MatKV Trades Compute for Flash Storage in LLM Inference

arXiv.org · December 30, 2025 · ✓ verified

Kun-Woo Shin et al. (Seoul National University and Samsung Electronics) propose MatKV, a system that precomputes the key-value (KV) vectors of retrieval-augmented generation (RAG) objects and materializes them on flash storage, avoiding repeated GPU-based KV computation at inference time.

  • Main announcement: The authors introduce MatKV, which precomputes KVs, stores them on inexpensive, fast, power-efficient flash SSDs, and reuses them at inference. They claim that MatKV halves both inference time and power consumption for RAG workloads. Experiments use Hugging Face’s Transformers library across state-of-the-art GPUs and flash SSDs. (Paper submitted 20 Dec 2025; accepted for publication in ICDE 2026.)
  • Additional details / methods: The paper targets the prefill phase, making it more efficient by materializing KVs. It demonstrates two optimizations: (1) concurrent decoding and KV loading, in which a GPU decodes one instance while loading the materialized KVs for the next to reduce load latency; and (2) decoding on low-end GPUs once KVs are loaded into GPU memory, with reportedly minimal impact on throughput and QA accuracy.
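
The materialize-then-reuse idea and the overlap of decoding with KV loading can be illustrated with a toy sketch. This is not the authors' implementation: `compute_kv` and `decode` are hypothetical stand-ins for GPU prefill and decoding, a temporary directory stands in for the flash SSD, and `materialize`, `load_kv`, and `serve` are names invented for this example.

```python
import pickle
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical stand-in for the expensive GPU prefill: maps each token of a
# document to a (key, value) pair. A real system would run the model here.
def compute_kv(tokens):
    return [(float(len(t)), float(sum(map(ord, t)) % 97)) for t in tokens]

def materialize(store_dir, doc_id, tokens):
    """Precompute a document's KVs once and write them to storage."""
    path = Path(store_dir) / f"{doc_id}.kv"
    with open(path, "wb") as f:
        pickle.dump(compute_kv(tokens), f)
    return path

def load_kv(store_dir, doc_id):
    """Read materialized KVs back instead of recomputing them."""
    with open(Path(store_dir) / f"{doc_id}.kv", "rb") as f:
        return pickle.load(f)

# Hypothetical stand-in for decoding: consumes loaded KVs, returns a result.
def decode(kv):
    return sum(k * v for k, v in kv)

def serve(store_dir, doc_ids):
    """Decode each request while a background thread loads the next KVs."""
    results = []
    if not doc_ids:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(load_kv, store_dir, doc_ids[0])
        for next_id in doc_ids[1:] + [None]:
            kv = pending.result()        # wait for the current request's KVs
            if next_id is not None:      # start loading the next request's KVs
                pending = pool.submit(load_kv, store_dir, next_id)
            results.append(decode(kv))   # decoding overlaps with that load
    return results

# Offline step: materialize KVs for every document once.
docs = {"a": ["flash", "ssd"], "b": ["kv", "cache", "reuse"]}
store = tempfile.mkdtemp()
for doc_id, tokens in docs.items():
    materialize(store, doc_id, tokens)

# Online step: serve requests from storage, never calling compute_kv.
outputs = serve(store, list(docs))
```

In this sketch the prefetch depth is one request, so at most one load is in flight while the current request decodes. Whether that overlap hides the load latency in practice depends on SSD bandwidth and decode time, which the toy does not model.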