Selective KV-Cache Sharing Mitigates Timing Side-Channels in LLMs
arXiv.org
· February 11, 2026
Kexin Chu et al. have published SafeKV, a system that enforces privacy while still allowing selective KV-cache sharing in multi-tenant LLM inference (arXiv v2, revised 9 Feb 2026).
- Main announcement: SafeKV is introduced as a system-level co-design that integrates lightweight detection and isolation into the serving runtime. It eliminates cross-tenant reuse of sensitive KV-cache blocks while recovering most of the performance benefits of global sharing. Compared with full isolation, the paper reports reducing time-to-first-token (TTFT) overhead by up to 40.58% and improving throughput by up to 2.66x. The paper was first submitted to arXiv on 11 Aug 2025 (v1) and revised on 9 Feb 2026 (v2).
- Technical details & implementation: The system has three parts: (1) a three-tier asynchronous detection pipeline, in which privacy classification is decoupled from inference and supports streaming; (2) a unified radix-tree-based memory manager that uses path compression and sensitivity-aware eviction to isolate blocks selectively; and (3) a runtime safeguard guided by the Reuse Diversity Ratio (RDR) that detects and bounds residual leakage. The evaluation runs on large LLM backends and is the source of the performance figures above.
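To make the idea of selective sharing concrete, here is a minimal Python sketch of a prefix cache in which non-sensitive blocks are shared across tenants and sensitive blocks are reachable only by the tenant that created them. This is an illustration under stated assumptions, not SafeKV's implementation: the class and method names (`SelectivePrefixCache`, `insert`, `match`) are hypothetical, the tree is a plain uncompressed token trie rather than a path-compressed radix tree, the per-token sensitivity flags are supplied by the caller instead of an asynchronous detector, and eviction and the RDR safeguard are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Edge key: (token id, scope). scope is None for a shared block and a
# tenant id for a private block. (Hypothetical layout for illustration.)
EdgeKey = Tuple[int, Optional[str]]


@dataclass
class Node:
    children: Dict[EdgeKey, "Node"] = field(default_factory=dict)


class SelectivePrefixCache:
    """Toy prefix index: shared edges are reusable by anyone, private
    edges only by their owner. Everything below a private edge is also
    unreachable for other tenants, so isolation extends to all blocks
    that depend on sensitive context."""

    def __init__(self) -> None:
        self.root = Node()

    def insert(self, tenant: str, tokens: List[int], sensitive: List[bool]) -> None:
        if len(tokens) != len(sensitive):
            raise ValueError("tokens and sensitive flags must have equal length")
        node = self.root
        for tok, flag in zip(tokens, sensitive):
            key: EdgeKey = (tok, tenant if flag else None)
            node = node.children.setdefault(key, Node())

    def match(self, tenant: str, tokens: List[int]) -> int:
        """Return how many leading tokens this tenant may reuse.
        Greedy: prefers a shared edge, then the tenant's own private edge."""
        node = self.root
        matched = 0
        for tok in tokens:
            nxt = node.children.get((tok, None))
            if nxt is None:
                nxt = node.children.get((tok, tenant))
            if nxt is None:
                break
            node = nxt
            matched += 1
        return matched


if __name__ == "__main__":
    cache = SelectivePrefixCache()
    prompt = [1, 2, 3, 9, 9, 4]                      # tokens 9, 9 stand in for private data
    flags = [False, False, False, True, True, False]
    cache.insert("tenant_a", prompt, flags)
    print(cache.match("tenant_a", prompt))           # owner reuses the full prefix
    print(cache.match("tenant_b", prompt))           # other tenant stops at the shared part
```

In this sketch a second tenant sending the same prompt observes a cache miss at the first sensitive token, exactly as if the block had never been cached, which is the property that removes the cross-tenant timing signal for private content while the shared prefix is still reused.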