NVIDIA software stack cuts token costs on Blackwell GPUs

NVIDIA · June 30, 2026 · ✓ verified

NVIDIA announces its inference software stack reduces token cost and increases throughput on Blackwell GPUs.

  • Main announcement: NVIDIA says its full-stack inference software for the Blackwell platform has cut token costs on the DeepSeek V4 model by up to 5x in about one month, and that combining system-level optimizations (disaggregated serving, large expert parallelism, NVFP4, multi-token prediction) can raise throughput by up to 20x. The blog cites partner results: Baseten reports up to 50% more tokens/sec, and DigitalOcean / Hippocratic AI report ~30% higher inference throughput while maintaining sub-half-second time-to-first-response. It also points to day-zero deployment recipes for vLLM and SGLang.
  • Background and implementation details: The announcement describes the stack as connecting three layers (Production Operation, Application Acceleration, and Infrastructure Access) and building on open source frameworks (PyTorch, vLLM, SGLang) and runtimes (TensorRT-LLM, NVIDIA Dynamo). It references specific features and optimizations (DFlash speculative decoding, cited at up to 15x throughput; the NVLink interconnect; NVFP4 precision) and notes that the improvements were observed in production-focused tests and partner deployments over about a month.
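
The link between the throughput and token-cost figures above can be made concrete with simple arithmetic. The sketch below is illustrative only: it assumes a fixed hourly GPU price and that all throughput is billable output, which the announcement does not state. The function names and the example price are hypothetical, not taken from NVIDIA or its partners.

```python
def cost_per_million_tokens(gpu_hour_usd: float, tokens_per_sec: float) -> float:
    """Dollars per one million generated tokens at a fixed hourly GPU price.

    Both inputs are assumptions supplied by the caller; nothing here is
    measured data from the announcement.
    """
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    tokens_per_hour = tokens_per_sec * 3600
    return gpu_hour_usd / tokens_per_hour * 1_000_000


def cost_reduction_from_speedup(speedup: float) -> float:
    """Fractional drop in cost per token when throughput rises by `speedup`
    while the hourly cost stays constant (cost per token scales as 1/speedup)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1 - 1 / speedup


if __name__ == "__main__":
    # Hypothetical price and baseline throughput, chosen only for round numbers.
    baseline = cost_per_million_tokens(gpu_hour_usd=3.60, tokens_per_sec=1000)
    faster = cost_per_million_tokens(gpu_hour_usd=3.60, tokens_per_sec=1500)
    print(f"baseline: ${baseline:.2f} per 1M tokens")
    print(f"with 50% more tokens/sec: ${faster:.2f} per 1M tokens")
    print(f"cost reduction: {cost_reduction_from_speedup(1.5):.0%}")
```

Under these assumptions, 50% more tokens/sec lowers cost per token by about a third, and a 5x cost reduction corresponds to 5x the throughput at the same hourly price. Real deployments also change hardware utilization, batch sizes, and latency targets, so the reported figures need not follow this simple ratio.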