TetriServe: Efficient Step-level DiT Serving for Heterogeneous Images
arXiv.org
· January 19, 2026
The authors (Runyu Lu et al.) introduce TetriServe, a DiT serving system that implements step-level sequence parallelism and a round-based scheduler to improve SLO attainment for heterogeneous image-generation workloads.
- Main announcement: TetriServe combines step-level sequence parallelism with a round-based scheduling mechanism that (a) discretizes time into fixed rounds so that deadline-aware scheduling becomes tractable, (b) adapts the degree of parallelism at the step level to minimize GPU-hour consumption, and (c) jointly packs requests to minimize late completions. The evaluation shows up to 32% higher SLO attainment than existing fixed-parallelism solutions, without degrading image quality.
- Background and details: The paper targets inefficiencies in existing serving systems, which use fixed-degree sequence parallelism and therefore perform poorly on heterogeneous workloads (mixed resolutions and deadlines). The authors evaluate TetriServe on state-of-the-art DiT models, report GPU-utilization and SLO metrics, and provide implementation details (round-based scheduler, step-level adaptation, packing policy).
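
The three ideas summarized above (fixed rounds, per-step choice of parallelism degree, and joint packing) can be illustrated with a small sketch. This is not the authors' algorithm: the latency table, the `Request` fields, the earliest-deadline-first ordering, and the greedy packing rule are all assumptions made for illustration. It assumes that a higher degree shortens each step but costs more GPU-seconds per step, so the cheapest feasible choice is the smallest degree that still meets the deadline.

```python
from dataclasses import dataclass

# Hypothetical per-step latency in seconds, keyed by (resolution, GPU degree).
# These numbers are invented for illustration, not measured.
STEP_LATENCY = {
    (512, 1): 0.10, (512, 2): 0.07, (512, 4): 0.06,
    (1024, 1): 0.40, (1024, 2): 0.22, (1024, 4): 0.13,
}
DEGREES = (1, 2, 4)  # candidate sequence-parallel degrees, smallest first


@dataclass
class Request:
    name: str
    resolution: int
    steps_left: int   # denoising steps still to run
    deadline: float   # absolute deadline in seconds


def min_degree(req, now):
    """Smallest degree that finishes the remaining steps by the deadline,
    or None if no candidate degree can."""
    budget = req.deadline - now
    for degree in DEGREES:
        if req.steps_left * STEP_LATENCY[(req.resolution, degree)] <= budget:
            return degree
    return None


def schedule_round(requests, now, num_gpus):
    """Greedy plan for one round: visit requests earliest deadline first and
    give each its cheapest feasible degree while GPUs remain."""
    plan, free = {}, num_gpus
    for req in sorted(requests, key=lambda r: r.deadline):
        degree = min_degree(req, now)
        if degree is None:
            degree = DEGREES[-1]  # assumption: best effort for late requests
        if degree <= free:
            plan[req.name] = degree
            free -= degree
    return plan


def advance(requests, plan, round_len):
    """Run one round: each scheduled request completes as many whole steps
    as fit in the round at its assigned degree."""
    for req in requests:
        degree = plan.get(req.name)
        if degree is None:
            continue  # not scheduled this round
        latency = STEP_LATENCY[(req.resolution, degree)]
        steps_done = int(round_len / latency + 1e-9)
        req.steps_left = max(0, req.steps_left - steps_done)
```

Calling `schedule_round` again at the start of every round, after `advance` has updated `steps_left`, is what lets the degree change between steps of the same request: a request that is ahead of its deadline can drop to fewer GPUs, and one that was skipped in an earlier round can be given more. A real scheduler would also need to account for the cost of switching degrees and for requests arriving mid-round, which this sketch ignores.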