STAR spatial accelerator for sparse Transformer attention

arXiv.org · December 24, 2025 · ✓ verified

The authors propose STAR, an algorithm-hardware co-designed accelerator architecture for sparse attention in Transformer-based large language model inference under large-scale token parallelism (LTPP).

  • STAR introduces leading-zero-based sparsity prediction, distributed sorting, and a sorted updating FlashAttention mechanism with coordinated cross-stage tiling. Together these reduce redundant computation, memory access, and latency while improving compute and energy efficiency relative to existing dynamic sparsity accelerators and the NVIDIA A100.
  • A dedicated STAR accelerator and a multi-core spatial architecture, Spatial-STAR, are evaluated. Reported results are up to 9.2× speedup and 71.2× higher energy efficiency than the A100, up to 16.1× energy-efficiency and 27.1× area-efficiency gains over state-of-the-art accelerators, and a 20.1× throughput improvement for ultra-long sequence processing compared with a baseline spatial design.
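
The summary does not spell out how the leading-zero-based prediction works, so the following is only a rough sketch of the general idea behind such schemes, not the authors' design. It assumes integer-quantized queries and keys, rounds each value down to the power of two given by its leading one bit (so a multiply would reduce to a shift in hardware), uses the resulting cheap score estimates to pick the top-k keys, and then runs exact softmax attention on those keys only. The function names `leading_one_approx`, `predict_topk`, and `sparse_attention` are hypothetical.

```python
import numpy as np

def leading_one_approx(x):
    """Keep only the leading one bit of each integer: 13 -> 8, -3 -> -2, 0 -> 0."""
    x = np.asarray(x, dtype=np.int64)
    mag = np.abs(x)
    # frexp returns e with mag = m * 2**e and m in [0.5, 1), so the leading bit is e - 1.
    _, e = np.frexp(mag.astype(np.float64))
    pow2 = np.left_shift(np.int64(1), np.maximum(e.astype(np.int64) - 1, 0))
    return np.sign(x) * np.where(mag > 0, pow2, 0)

def predict_topk(q, K, k):
    """Estimate q.K scores from power-of-two operands and return the k best key indices."""
    est = leading_one_approx(K) @ leading_one_approx(q)
    return np.argsort(-est, kind="stable")[:k]

def sparse_attention(q, K, V, k):
    """Exact softmax attention restricted to the predicted top-k keys (assumed flow)."""
    idx = predict_topk(q, K, k)
    s = (K[idx] @ q).astype(np.float64) / np.sqrt(q.shape[0])
    w = np.exp(s - s.max())  # subtract the max for numerical stability
    w /= w.sum()
    return w @ V[idx]

# Toy data: one quantized query, three keys, three value vectors.
q = np.array([8, -3, 1, 0])
K = np.array([[7, 1, 0, 2], [-5, 2, 3, 1], [1, -6, 2, 0]])
V = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
selected = predict_topk(q, K, 2)
out = sparse_attention(q, K, V, 2)
```

On this toy input the estimated scores rank the keys in the same order as the exact dot products, so the two keys selected are the true top two. In general the estimate can mis-rank keys with close scores, which is the accuracy and cost trade-off any such predictor has to manage.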