TRACE: Lossless Compression and Precision Scaling for CXL Bandwidth

arXiv.org · February 02, 2026 · ✓ verified

Rui Xie et al. (arXiv:2509.03377 v3) present TRACE, a device-internal layout and KV-specific transform that enables lossless compression and precision-proportional fetch to unlock effective CXL bandwidth for LLM inference.

  • Main announcement: TRACE leaves the CXL interface unmodified and changes only the device-internal representation: data is stored in a channel-major, disaggregated bit-plane layout, and a KV-specific transform is applied before compression. This enables precision-proportional fetch, in which only the required bit-planes are read. TRACE losslessly reduces the BF16 weight footprint by 25.2% and the BF16 KV footprint by 46.9%, with per-layer KV compression ratios of up to 2.69×.
  • Background and evaluation details: In the paper's system modeling, once the KV cache spills to CXL, GPT-OSS-120B-MXFP4 throughput at 128k tokens improves from 16.28 to 68.99 tok/s (4.24×). DRAMSim3 simulation shows up to 40.3% lower DRAM access energy under plane-aligned fetch. A 7 nm SystemVerilog implementation, evaluated at 2 GHz and 0.7 V, sustains 256 GB/s of device bandwidth; relative to a CXL controller with generic inline lossless compression, TRACE adds 7.2% area, 4.7% power, and 6.0% load-to-use latency.
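To make the idea of a bit-plane layout and precision-proportional fetch concrete, the sketch below splits a few BF16 values into 16 bit-planes and then rebuilds them from only the most significant planes. It illustrates the general technique only, not TRACE's device-internal format: the channel-major grouping, the KV-specific transform, and the compression stage are all omitted, and the names bf16_bits, to_bit_planes, and fetch are hypothetical helpers chosen for this example.

```python
import struct

WIDTH = 16  # BF16 word width in bits


def bf16_bits(x: float) -> int:
    """Return the BF16 bit pattern of x by truncating its float32 encoding.

    Assumption: plain truncation (no rounding), which is exact for the
    sample values used below.
    """
    return struct.unpack(">I", struct.pack(">f", x))[0] >> 16


def bits_to_float(b: int) -> float:
    """Expand a 16-bit BF16 pattern back to a Python float."""
    return struct.unpack(">f", struct.pack(">I", (b & 0xFFFF) << 16))[0]


def to_bit_planes(words: list[int]) -> list[list[int]]:
    """Split words into WIDTH planes; plane 0 holds every word's MSB."""
    return [[(w >> (WIDTH - 1 - p)) & 1 for w in words] for p in range(WIDTH)]


def fetch(planes: list[list[int]], n_planes: int) -> list[int]:
    """Rebuild words from the first n_planes planes; unread bits stay zero."""
    words = [0] * len(planes[0])
    for p in range(n_planes):
        shift = WIDTH - 1 - p
        for i, bit in enumerate(planes[p]):
            words[i] |= bit << shift
    return words


values = [1.0, -2.5, 0.15625, 3.140625]  # all exactly representable in BF16
words = [bf16_bits(v) for v in values]
planes = to_bit_planes(words)

# Reading all 16 planes reproduces the original bit patterns exactly.
full = [bits_to_float(w) for w in fetch(planes, 16)]

# Reading 12 of 16 planes touches 75% of the stored bits and drops the
# four least significant mantissa bits of every value.
reduced = [bits_to_float(w) for w in fetch(planes, 12)]
fraction_read = 12 / WIDTH

print(full)
print(reduced, fraction_read)
```

Because each plane is stored contiguously, a reader that needs less precision can skip the low-order planes entirely, so the bytes moved scale with the precision requested. Reading every plane returns the original values bit for bit.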