RepetitionCurse: Router Imbalance in MoE LLMs under DoS Stress
arXiv.org · January 01, 2026
The paper by Ruixuan Huang et al. (arXiv:2512.23995, submitted 30 Dec 2025) identifies a model-agnostic denial-of-service vulnerability in Mixture-of-Experts (MoE) inference routing and demonstrates it with an attack called RepetitionCurse.
- Main announcement: The authors present RepetitionCurse, a low-cost black-box attack that uses simple repetitive token patterns to concentrate routing onto a few experts in MoE models. The resulting imbalance creates a computational bottleneck on the devices hosting those experts while the other devices sit idle. On Mixtral-8x7B, the attack increases end-to-end inference latency by 3.063x, leading to violations of service-level agreements for time to first token (TTFT).
- Background and details: The paper documents that out-of-distribution prompts can cause tokens to be routed to the same top-k experts, which turns routing imbalance into a denial-of-service attack vector. Submission metadata: arXiv:2512.23995 (v1), submitted Tue, 30 Dec 2025, license CC BY 4.0.
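The effect of routing concentration can be illustrated with a toy top-k router. The sketch below is not the authors' code. It assumes 8 experts with top-2 routing (Mixtral-8x7B's configuration), random unit-norm router weights, and hidden states that depend only on the token id; in a real model the hidden state at an MoE layer also depends on context, so repeated tokens only tend toward similar routing rather than identical routing. All names (`embed`, `top_k_experts`, `expert_loads`, `imbalance`) are hypothetical. Imbalance is measured as max expert load over mean load, on the assumption that with one expert per device a layer finishes only when the busiest device does.

```python
import math
import random

NUM_EXPERTS = 8  # Mixtral-8x7B: 8 experts per MoE layer
TOP_K = 2        # each token is routed to its top-2 experts
HIDDEN = 16      # toy hidden size (assumption; far smaller than the real model)


def _unit(v: list[float]) -> list[float]:
    """Scale a vector to unit length so no expert is favored by weight norm alone."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


_rng = random.Random(0)
# Hypothetical router weights: one unit-norm vector per expert.
ROUTER = [_unit([_rng.gauss(0.0, 1.0) for _ in range(HIDDEN)]) for _ in range(NUM_EXPERTS)]


def embed(token_id: int) -> list[float]:
    """Deterministic toy hidden state per token id (ignores context, unlike a real model)."""
    r = random.Random(token_id)
    return [r.gauss(0.0, 1.0) for _ in range(HIDDEN)]


def top_k_experts(hidden: list[float], k: int = TOP_K) -> list[int]:
    """Return the k experts with the highest router logits.
    Softmax is monotonic, so top-k on logits equals top-k on gate probabilities."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in ROUTER]
    return sorted(range(NUM_EXPERTS), key=lambda e: logits[e], reverse=True)[:k]


def expert_loads(token_ids: list[int]) -> list[int]:
    """Count how many token-to-expert assignments each expert receives."""
    loads = [0] * NUM_EXPERTS
    for t in token_ids:
        for e in top_k_experts(embed(t)):
            loads[e] += 1
    return loads


def imbalance(loads: list[int]) -> float:
    """Max load divided by mean load; 1.0 means perfectly balanced."""
    mean = sum(loads) / len(loads)
    return max(loads) / mean


if __name__ == "__main__":
    diverse = list(range(512))  # 512 distinct token ids
    repeated = [42] * 512       # one token id repeated 512 times
    d, r = expert_loads(diverse), expert_loads(repeated)
    print("diverse:", d, round(imbalance(d), 2))
    # Every repeated token picks the same 2 experts, so imbalance is
    # NUM_EXPERTS / TOP_K = 4.0: two experts do all the work, six sit idle.
    print("repeated:", r, imbalance(r))
```

The repeated-token case hits the worst case this toy allows, NUM_EXPERTS / TOP_K. That figure is a per-layer compute ratio for the toy model only and is not comparable to the paper's measured 3.063x end-to-end latency increase, which also includes attention, communication, and other overheads.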