SAIR: Cost-Efficient Multi-Stage ML Pipeline Autoscaling via LLM

arXiv.org · February 02, 2026 · ✓ verified

The authors (Jianchang Su et al.) introduce SAIR, an autoscaling framework for multi-stage ML inference pipelines.

  • Main announcement: SAIR uses an LLM as an in-context reinforcement learning controller, combining Pareto-dominance reward shaping, surprisal-guided experience retrieval, and fine-grained GPU rate control via user-space CUDA interception. Evaluated on four ML serving pipelines under three workload patterns, it reports up to 50% P99 latency improvement, up to 97% effective cost reduction (under GPU rate-control assumptions), and 86% bottleneck detection accuracy, and it requires no offline training.
  • Context and details: Submitted to arXiv (v1) on 29 Jan 2026 by Jianchang Su and six co-authors. The paper includes a regret analysis that decomposes error into retrieval coverage and LLM selection components. The full text is available as PDF, HTML, and TeX source and is licensed under CC BY 4.0.
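
To make the term "Pareto-dominance reward shaping" concrete, here is a minimal sketch of what a dominance check over two objectives could look like. It is not taken from the paper: the function names `dominates` and `pareto_reward`, the +1/0/-1 reward values, and the sample numbers are all assumptions chosen for illustration, and both objectives (P99 latency and cost) are assumed to be minimized.

```python
from typing import Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` when every objective is minimized.

    `a` dominates `b` if it is no worse on all objectives and strictly
    better on at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_reward(new: Sequence[float], old: Sequence[float]) -> float:
    """Hypothetical shaped reward (an assumption, not the paper's formula).

    +1.0 if the new configuration dominates the old one,
    -1.0 if the old one dominates the new one,
     0.0 if the two are incomparable (a trade-off) or identical.
    """
    if dominates(new, old):
        return 1.0
    if dominates(old, new):
        return -1.0
    return 0.0


# Objectives are (P99 latency in ms, cost in $/hour); values are illustrative only.
before = (120.0, 4.0)
after = (90.0, 3.5)      # better on both objectives
tradeoff = (90.0, 5.0)   # faster but more expensive

print(pareto_reward(after, before))     # 1.0
print(pareto_reward(tradeoff, before))  # 0.0
print(pareto_reward(before, after))     # -1.0
```

A reward of this shape only credits a scaling action when it improves one objective without degrading the other, which is one plausible reading of why dominance would be used instead of a weighted sum of latency and cost.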