NVIDIA Blackwell drives 4–10x reductions in AI token costs
NVIDIA
· February 12, 2026
NVIDIA announces that its Blackwell platform and partner inference providers are delivering major reductions in cost per token for open-source and mixed-model inference.
- Main announcement: NVIDIA Blackwell (including the GB200 NVL72 system), combined with partner inference stacks from Baseten, DeepInfra, Fireworks AI, and Together AI, delivers multi-fold reductions in cost per token. Reported examples include up to 10x lower cost per token for reasoning MoE models on GB200 NVL72; a 4x reduction at DeepInfra (from $0.20 to $0.05 per million tokens); 2.5x better throughput per dollar at Baseten compared with Hopper; and a 90% drop in inference cost (a 10x reduction) for Sully.ai compared with its prior closed-source implementation.
- Background / implementation details: Partners run open-source models on NVIDIA Blackwell GPUs using optimizations such as the NVFP4 low-precision format, TensorRT-LLM, NVIDIA Dynamo, speculative decoding, caching, and autoscaling. Reported operational outcomes include 30 million minutes returned to physicians (Sully.ai), 1.8 million waitlisted users in 24 hours and 5.6 million queries in a week (Sentient), and sub-400 ms voice response latency (Decagon). The article makes no new policy or regulatory announcement; it summarizes vendor and customer deployments and performance results.
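
The figures above mix two ways of stating the same thing: a fold reduction ("4x", "10x") and a percentage drop ("90%"). The short Python sketch below shows how the two relate, using only the prices quoted in the summary. The function names are illustrative and the 500-million-token workload is a hypothetical volume, not a figure from the announcement.

```python
def fold_reduction(old_cost: float, new_cost: float) -> float:
    """How many times cheaper the new cost is than the old one."""
    return old_cost / new_cost


def percent_drop_to_fold(percent_drop: float) -> float:
    """Convert a percentage cost drop (e.g. 90) into a fold reduction (e.g. 10x)."""
    return 1.0 / (1.0 - percent_drop / 100.0)


def fold_to_percent_drop(fold: float) -> float:
    """Convert a fold reduction (e.g. 4x) into a percentage cost drop (e.g. 75)."""
    return (1.0 - 1.0 / fold) * 100.0


def workload_cost(tokens: int, price_per_million: float) -> float:
    """Cost in dollars of processing `tokens` at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million


if __name__ == "__main__":
    # DeepInfra example from the summary: $0.20 -> $0.05 per million tokens.
    deepinfra_fold = fold_reduction(0.20, 0.05)
    print(f"DeepInfra: {deepinfra_fold:.1f}x cheaper, "
          f"a {fold_to_percent_drop(deepinfra_fold):.0f}% drop")

    # Sully.ai example from the summary: a 90% drop in inference cost.
    print(f"Sully.ai: 90% drop is a {percent_drop_to_fold(90):.1f}x reduction")

    # Hypothetical workload (assumed volume, for illustration only).
    tokens = 500_000_000
    print(f"Before: ${workload_cost(tokens, 0.20):.2f}, "
          f"after: ${workload_cost(tokens, 0.05):.2f}")
```

The same conversion gives a rough reading of the Baseten figure: if cost per token scales inversely with throughput per dollar, which is an assumption here, 2.5x throughput per dollar corresponds to about a 60% lower cost per token.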