Energy Efficiency Sweet Spots in Production LLM Inference

arXiv.org · February 06, 2026

Hiari Pizzini Cavagna and co-authors present an analytical model that identifies LLM inference energy-efficiency “Sweet Spots” and validates it empirically.

  • Main announcement: The authors introduce an analytical model, based on Transformer compute and memory-access complexity, and validate it using TensorRT-LLM on NVIDIA H100 GPUs. The validation covers models from 1B to 9B parameters (OPT, LLaMA, Gemma, Falcon, Qwen2, Granite) and input/output lengths from 64 to 4096 tokens, and reports a mean MAPE (mean absolute percentage error) of 1.79%. The paper was submitted on 5 Feb 2026 and is to appear at ICPE 2026.

  • Background and details: The evaluation uses production-oriented inference tooling (TensorRT-LLM) and characterizes empirical energy-consumption regimes, with peak efficiency at short-to-moderate inputs and medium-length outputs. The full text is available as PDF and HTML on arXiv, and the submission is released under CC BY 4.0. No operational deployment timeline or commercial contracts are announced.
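
For readers unfamiliar with the reported error metric, the following is a minimal sketch of how a mean absolute percentage error is typically computed between measured and model-predicted values. The function name `mape` and the energy figures are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's exact aggregation across models and sequence lengths may differ.

```python
def mape(measured, predicted):
    """Return the mean absolute percentage error, in percent.

    Each error term is |measured - predicted| / |measured|, so the
    measured values must be non-zero.
    """
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((m - p) / m) for m, p in zip(measured, predicted))
    return 100.0 * total / len(measured)


# Hypothetical energy readings in joules (illustrative only, not paper data).
measured = [100.0, 200.0, 400.0]
predicted = [98.0, 204.0, 400.0]

error = mape(measured, predicted)
print(f"MAPE: {error:.2f}%")  # prints "MAPE: 1.33%"
```

A "mean MAPE" such as the 1.79% figure would then usually be the average of such per-configuration or per-model values, although that reading is an assumption here.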
