AdaptCache KV-cache storage hierarchy for low-delay LLM serving

arXiv.org · January 19, 2026 · ✓ verified

The authors have published AdaptCache, a KV-cache-native storage hierarchy for LLM serving. The paper was accepted at the SOSP 2025 Big Memory (BigMem) workshop and revised on 15 Jan 2026.

  • Main announcement: The paper by Shaoting Feng et al. presents AdaptCache, a lossy KV-cache compression and placement system. For each KV entry it selects the compression algorithm, the compression rate, and the device placement, aiming to maximize DRAM hits and minimize loading delay while preserving generation quality. Reported results across three tasks show 1.43–2.4x delay savings at the same quality and 6–55% quality improvements at the same delay.
  • Background and details: The work addresses the high loading delays that arise when KV caches span DRAM and SSD, and notes that prior DRAM+SSD approaches suffer because most cache hits are served by slow SSD loads. The paper includes implementation and evaluation details. Version 1 was submitted on 28 Aug 2025 and version 2 on 15 Jan 2026; the paper was accepted at the SOSP 2025 BigMem workshop and is distributed via arXiv (DOI: 10.48550/arXiv.2509.00105).
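To make the per-entry decision described above more concrete, here is a minimal Python sketch of choosing a compression algorithm, a compression rate, and a device for each KV entry. It is not the paper's algorithm: the greedy policy, the names (Option, Entry, plan, estimated_delay), the option labels, the quality scores, and the bandwidth figures are all illustrative assumptions.

```python
from dataclasses import dataclass

# Assumed device read bandwidths in GB/s (illustrative, not from the paper).
BANDWIDTH_GBPS = {"dram": 50.0, "ssd": 2.0}


@dataclass(frozen=True)
class Option:
    algorithm: str   # hypothetical compression method label
    rate: float      # compressed size / original size, in (0, 1]
    quality: float   # assumed generation-quality score in [0, 1]


@dataclass(frozen=True)
class Entry:
    name: str
    size_gb: float   # uncompressed KV-cache size
    hits: int        # expected number of reuses


def plan(entries, options, dram_capacity_gb, min_quality):
    """Greedy toy policy: the most-reused entries get DRAM first."""
    ok = [o for o in options if o.quality >= min_quality]
    if not ok:
        raise ValueError("no compression option meets the quality floor")
    by_quality = sorted(ok, key=lambda o: o.quality, reverse=True)
    smallest = min(ok, key=lambda o: o.rate)
    free = dram_capacity_gb
    placements = {}
    for e in sorted(entries, key=lambda e: e.hits, reverse=True):
        # Highest-quality acceptable option that still fits in DRAM, if any.
        fit = next((o for o in by_quality if e.size_gb * o.rate <= free), None)
        if fit is not None:
            free -= e.size_gb * fit.rate
            placements[e.name] = (fit.algorithm, fit.rate, "dram")
        else:
            # Spill to SSD with the smallest acceptable size to cut load time.
            placements[e.name] = (smallest.algorithm, smallest.rate, "ssd")
    return placements


def estimated_delay(entries, placements):
    """Total expected loading time in seconds: hits * bytes / bandwidth."""
    total = 0.0
    for e in entries:
        _, rate, device = placements[e.name]
        total += e.hits * e.size_gb * rate / BANDWIDTH_GBPS[device]
    return total
```

With three 4 GB entries, 5 GB of DRAM, and a quality floor of 0.9, this sketch keeps the most-reused entry uncompressed in DRAM and spills the other two to SSD in compressed form. A real system would need to weigh quality against delay jointly rather than greedily, which is the trade-off the paper targets.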