RollArc: Scaling Agentic RL Training via Disaggregated Infrastructure

arXiv.org · December 30, 2025

Wei Gao and co-authors present RollArc, a distributed system to speed up agentic RL training by disaggregating workloads onto specialized hardware.

Main announcement: The paper introduces RollArc, which rests on three design principles: hardware-affinity workload mapping, which routes compute-bound and bandwidth-bound tasks to the best-fit GPUs; fine-grained asynchrony, which mitigates resource bubbles at the trajectory level; and statefulness-aware computation, which offloads stateless components (e.g., reward models) to serverless infrastructure. The authors report a 1.35–2.05× reduction in end-to-end training time compared with monolithic and synchronous baselines.
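To make the routing idea concrete, here is a minimal Python sketch of a placement policy in the spirit of the description above. It is an illustration under stated assumptions, not RollArc's implementation: the `Task` type, the `route` function, the pool names, and the classification of training as compute-bound and rollout generation as bandwidth-bound are all hypothetical and do not come from the paper or the ROLL codebase.

```python
from dataclasses import dataclass
from enum import Enum


class Bound(Enum):
    """Dominant resource constraint of a task (assumed taxonomy)."""
    COMPUTE = "compute"
    BANDWIDTH = "bandwidth"


@dataclass(frozen=True)
class Task:
    name: str
    bound: Bound
    stateful: bool


# Hypothetical pool names; the summary does not name specific hardware.
POOLS = {
    Bound.COMPUTE: "compute-optimized-gpus",
    Bound.BANDWIDTH: "bandwidth-optimized-gpus",
}
SERVERLESS = "serverless"


def route(task: Task) -> str:
    """Pick a destination for a task under the assumed policy."""
    # Stateless components (the summary gives reward models as an example)
    # are offloaded to serverless infrastructure.
    if not task.stateful:
        return SERVERLESS
    # Stateful work goes to the GPU pool matching its dominant constraint.
    return POOLS[task.bound]


# Assumed workload classification, for illustration only.
tasks = [
    Task("policy_training", Bound.COMPUTE, stateful=True),
    Task("rollout_generation", Bound.BANDWIDTH, stateful=True),
    Task("reward_model", Bound.COMPUTE, stateful=False),
]
placement = {t.name: route(t) for t in tasks}
```

In this sketch the statefulness check runs before the hardware-affinity lookup, so a stateless task never occupies a GPU pool slot regardless of its resource profile. A real scheduler would also need to handle the trajectory-level asynchrony the paper describes, which this example leaves out.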
Background and evaluation details: The system is evaluated by training a hundreds-of-billions-parameter MoE model for the Qoder product on an Alibaba cluster with more than 3,000 GPUs. The paper is 17 pages with 17 figures and was submitted to arXiv on 27 Dec 2025 (arXiv:2512.22560). Code is available at https://github.com/alibaba/ROLL.