NVIDIA Blackwell drives 4–10x reductions in AI token costs

NVIDIA · February 12, 2026 · ✓ verified

NVIDIA announces that its Blackwell platform and partner inference providers are delivering major reductions in cost per token for open-source and mixed-model inference.

  • Main announcement: NVIDIA Blackwell, including the GB200 NVL72 system, combined with partner inference stacks (Baseten, DeepInfra, Fireworks AI, Together AI), delivers multi-fold reductions in cost per token. Reported examples: up to 10x lower cost per token for reasoning MoE models on GB200 NVL72; a 4x reduction at DeepInfra (from $0.20 to $0.05 per million tokens); 2.5x better throughput per dollar at Baseten versus Hopper; and a 90% drop in inference cost (10x) for Sully.ai compared with its prior closed-source implementation.
  • Background / implementation details: Partners run open-source models on NVIDIA Blackwell GPUs using optimizations such as the NVFP4 low-precision format, TensorRT-LLM, NVIDIA Dynamo, speculative decoding, caching, and autoscaling. Reported operational outcomes include 30 million minutes returned to physicians (Sully.ai), 1.8 million waitlisted users in 24 hours and 5.6 million queries in a week (Sentient), and sub-400 ms voice response latency (Decagon). No new policy or regulatory announcement is made; the article summarizes vendor and customer deployments and performance results.
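
The summary mixes two ways of stating a cost reduction: a multiple (4x, 10x) and a percentage drop (90%). The sketch below shows how the two relate, using only the figures quoted above. The function names are hypothetical and chosen for illustration; nothing here comes from NVIDIA or its partners.

```python
def reduction_factor(old_cost: float, new_cost: float) -> float:
    """How many times cheaper the new cost is than the old one."""
    return old_cost / new_cost


def percent_drop_to_factor(percent_drop: float) -> float:
    """Convert a percentage cost drop (e.g. 90) into an 'Nx cheaper' multiple."""
    return 1.0 / (1.0 - percent_drop / 100.0)


def factor_to_percent_drop(factor: float) -> float:
    """Convert an 'Nx cheaper' multiple into a percentage cost drop."""
    return (1.0 - 1.0 / factor) * 100.0


# DeepInfra figure from the summary: $0.20 -> $0.05 per million tokens.
deepinfra = reduction_factor(0.20, 0.05)
print(f"DeepInfra: {deepinfra:.1f}x cheaper, "
      f"a {factor_to_percent_drop(deepinfra):.1f}% drop")

# Sully.ai figure from the summary: a 90% drop in inference cost.
print(f"Sully.ai: a 90% drop is {percent_drop_to_factor(90):.1f}x cheaper")
```

Note that the multiples grow faster than the percentages: moving from a 75% drop (4x) to a 90% drop (10x) more than doubles the multiple.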