AdaptCache: KV-cache storage hierarchy for low-delay LLM serving

arXiv.org · January 19, 2026

The authors have published AdaptCache, a KV-cache-native storage hierarchy for LLM serving. The paper was accepted at the Big Memory (BigMem) workshop at SOSP 2025 and revised on 15 Jan 2026.

Main announcement: The paper by Shaoting Feng et al. presents AdaptCache, a lossy KV-cache compression and placement system. For each KV-cache entry it selects the compression algorithm, the compression rate, and the storage device, aiming to maximize DRAM hits and minimize loading delay while preserving generation quality. Reported results across three tasks show 1.43–2.4× delay savings at the same quality and 6–55% quality improvements at the same delay.
Background and details: The work targets the high loading delays that arise when KV caches span DRAM and SSD. It notes that prior DRAM+SSD approaches suffer because most cache hits are served by slow SSD loads. The paper includes implementation and evaluation details. It was first submitted on 28 Aug 2025 (v1) and revised on 15 Jan 2026 (v2), was accepted at the SOSP 2025 BigMem workshop, and is distributed via arXiv (DOI: 10.48550/arXiv.2509.00105).
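
To make the per-entry decision concrete, here is a minimal Python sketch of a placement policy of this general kind. It is not the paper's algorithm: the greedy heuristic, the names `Option`, `Entry`, and `plan`, and the idea of a fixed quality floor are all assumptions made for illustration. Each entry, taken in order of expected hit frequency, gets the highest-quality compression option that still fits in the remaining DRAM; entries that do not fit go to SSD with the smallest acceptable encoding to shorten the load.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    """A hypothetical compression choice (not taken from the paper)."""
    algorithm: str   # illustrative method name
    rate: float      # compressed size / original size
    quality: float   # assumed estimate of retained quality, 0..1


@dataclass(frozen=True)
class Entry:
    """A hypothetical KV-cache entry."""
    key: str
    size_gb: float   # uncompressed size
    hit_freq: float  # expected hits per unit time


def plan(entries, options, dram_gb, min_quality):
    """Greedy sketch: map each entry key to (algorithm, rate, device)."""
    # Keep only options that meet the quality floor, best quality first.
    ok = sorted(
        (o for o in options if o.quality >= min_quality),
        key=lambda o: o.quality,
        reverse=True,
    )
    if not ok:
        raise ValueError("no compression option meets the quality floor")
    smallest = min(ok, key=lambda o: o.rate)

    free = dram_gb
    placement = {}
    # Serve the most frequently hit entries first so they land in DRAM.
    for e in sorted(entries, key=lambda e: e.hit_freq, reverse=True):
        for o in ok:
            size = e.size_gb * o.rate
            if size <= free:
                free -= size
                placement[e.key] = (o.algorithm, o.rate, "dram")
                break
        else:
            # Nothing fits in DRAM: store on SSD in the smallest
            # acceptable form to reduce the loading delay.
            placement[e.key] = (smallest.algorithm, smallest.rate, "ssd")
    return placement
```

A real system would also need to estimate the quality impact per entry and per task, and to revisit decisions as access patterns change; the sketch fixes both for brevity.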
