CONCUR: Congestion-Based Concurrency Control for Agentic LLM Inference

arXiv.org · February 02, 2026

Qiaoling Chen et al. have introduced CONCUR, a congestion-based concurrency control layer for agentic batch LLM inference (arXiv submission v1, 30 Jan 2026).

  • Main announcement: The paper presents CONCUR, a lightweight control layer that applies agent-level admission control, inspired by congestion control, to bound aggregate GPU KV cache pressure, prevent middle-phase thrashing, and preserve execution continuity. Reported throughput improvements reach up to 4.09x on Qwen3-32B and 1.9x on DeepSeek-V3, evaluated across large models and real-world agent workloads.
  • Background and details: The authors identify middle-phase thrashing as a collapse in cache efficiency caused by long-lived agents accumulating state. CONCUR adapts a cache-aware control algorithm that dynamically adjusts the number of active agents based on runtime cache signals, and it is compatible with existing LLM serving systems. The paper is available on arXiv (arXiv:2601.22705) under CC BY 4.0.
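
To make the admission-control idea concrete, here is a minimal Python sketch of a congestion-style controller that caps the number of active agents based on a KV cache utilization signal. It is an illustration only, not the algorithm from the paper: the additive-increase/multiplicative-decrease (AIMD) rule, the class name `AgentAdmissionController`, the 0.90 watermark, and every other parameter value are assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class AgentAdmissionController:
    """Hypothetical AIMD-style cap on concurrently active agents.

    Illustrative only: the summary does not specify CONCUR's actual
    control rule, so the thresholds and update rule here are assumed.
    """

    window: int = 4                # current cap on active agents (assumed start)
    min_window: int = 1            # never drop below one active agent
    max_window: int = 64           # assumed upper bound
    high_watermark: float = 0.90   # assumed KV cache utilization threshold
    decrease_factor: float = 0.5   # assumed multiplicative back-off

    def update(self, kv_cache_utilization: float) -> int:
        """Adjust the cap from one runtime cache signal in [0, 1]."""
        if kv_cache_utilization > self.high_watermark:
            # Cache pressure is high: shrink the cap multiplicatively.
            self.window = max(self.min_window,
                              int(self.window * self.decrease_factor))
        else:
            # Cache has headroom: probe for more concurrency additively.
            self.window = min(self.max_window, self.window + 1)
        return self.window

    def can_admit(self, active_agents: int) -> bool:
        """Admit a new agent only while the active count is under the cap."""
        return active_agents < self.window


controller = AgentAdmissionController()
controller.update(0.50)   # headroom: cap grows from 4 to 5
controller.update(0.95)   # pressure: cap shrinks from 5 to 2
```

In this sketch a scheduler would call `update` once per control interval and consult `can_admit` before resuming or starting an agent, so that agents already running keep their cache state while new admissions are held back under pressure.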