NVIDIA GB300 NVL72 boosts agentic AI performance and efficiency
NVIDIA
· February 16, 2026
NVIDIA announces that the GB300 NVL72 (Blackwell Ultra) platform delivers major performance and cost improvements for low-latency and long-context AI workloads.
- Main announcement: NVIDIA claims that GB300 NVL72 (Blackwell Ultra) achieves up to 50x higher throughput per megawatt and up to 35x lower cost per token than the NVIDIA Hopper platform, with the largest gains at low latency. For long-context workloads (128,000-token inputs, 8,000-token outputs), NVIDIA states that GB300 NVL72 delivers up to 1.5x lower cost per token than GB200 NVL72. Software stack improvements (TensorRT-LLM, Dynamo, Mooncake, SGLang) are cited as contributing factors.
- Background and supporting details: The article cites third-party performance data (SemiAnalysis, Signal65) and notes that major cloud providers (Microsoft, CoreWeave, and Oracle's OCI) are deploying GB300 NVL72 in production for agentic coding and long-context workloads. It also references the future NVIDIA Rubin/Vera Rubin NVL72 platform, which NVIDIA says can deliver up to 10x higher throughput per megawatt than Blackwell and can train some mixture-of-experts (MoE) models with roughly one-quarter as many GPUs as Blackwell. The piece is primarily an announcement of NVIDIA's performance claims and product positioning that references external analyses; it is not a research paper or an independent audit.
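To make the "Nx" figures above concrete, the following is a minimal Python sketch of how such multipliers translate into absolute numbers. It assumes that "N x lower cost per token" means the baseline cost divided by N, and that "N x higher throughput per megawatt" means the baseline throughput multiplied by N. The function names and the baseline values are hypothetical and chosen only for illustration; they are not figures from NVIDIA or the cited analyses. Because every claim is stated as "up to", the results are best-case bounds, not expected values.

```python
def cost_after_reduction(baseline_cost: float, factor: float) -> float:
    """Read 'factor x lower cost' as baseline_cost / factor (an assumption)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return baseline_cost / factor


def throughput_after_gain(baseline_throughput: float, factor: float) -> float:
    """Read 'factor x higher throughput' as baseline_throughput * factor."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return baseline_throughput * factor


def percent_saving(factor: float) -> float:
    """Percentage cost saving implied by a 'factor x lower cost' claim."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return (1.0 - 1.0 / factor) * 100.0


if __name__ == "__main__":
    # Hypothetical baselines, for illustration only (not measured values).
    hopper_cost_per_million_tokens = 35.0    # arbitrary currency units
    hopper_tokens_per_sec_per_mw = 1_000.0   # arbitrary throughput units

    # Best case under the "up to 35x lower cost per token" claim.
    print(cost_after_reduction(hopper_cost_per_million_tokens, 35))
    # Best case under the "up to 50x higher throughput per megawatt" claim.
    print(throughput_after_gain(hopper_tokens_per_sec_per_mw, 50))
    # "Up to 1.5x lower cost" corresponds to at most about a one-third saving.
    print(round(percent_saving(1.5), 1))
```

Read this way, an "up to 1.5x lower cost per token" claim corresponds to a saving of at most about 33%, while "up to 35x" corresponds to at most about 97%, which is why the two figures are not directly comparable as simple multiples.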