Cloud and AI Infrastructure Cost Optimization: Strategies and Case Studies

arXiv.org · January 28, 2026

Saurabh Deochake has published a revised version (v2) of this arXiv paper, dated 27 Jan 2026, expanding its scope to AI/ML infrastructure and GPU cost optimization.

Main announcement: The author released Version 2 of the paper (arXiv:2307.12479v2) on 27 Jan 2026. The revision is significantly expanded to cover AI/ML infrastructure and GPU cost optimization, is updated with 2025 industry data, and adds new case studies on LLM inference costs. The title was changed to reflect the broader scope.
Background and key findings: The paper is a comprehensive review covering traditional cloud pricing models, resource allocation, and model optimization techniques (quantization, GPU instance selection, inference optimization). It reports that GPU compute represents 40-60% of technical budgets for AI-focused organizations, that LLM inference costs have decreased roughly 10x per year since 2021, and that organizations can achieve 50-90% cost savings. It includes case studies from Amazon Prime Video, Pinterest, Cloudflare, and Netflix. The entry also links to the PDF, HTML, and TeX versions and to the DOI.
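
To make the reported ranges concrete, the arithmetic behind them can be sketched in a few lines of Python. This is an illustration only and is not taken from the paper. The function names, the example budget, and the base unit cost are hypothetical, and the sketch assumes the roughly 10x annual decline compounds uniformly from year to year.

```python
def projected_unit_cost(base_cost: float, years: float,
                        annual_factor: float = 10.0) -> float:
    """Unit cost after `years`, assuming cost falls by `annual_factor` each year.

    The default of 10.0 mirrors the ~10x annual decline reported in the summary;
    treating it as a uniform compounding rate is an assumption.
    """
    return base_cost / (annual_factor ** years)


def gpu_spend_after_savings(tech_budget: float, gpu_share: float,
                            savings: float) -> float:
    """GPU spend that remains after applying a fractional saving.

    gpu_share and savings are fractions in [0, 1], e.g. 0.4 for 40%.
    """
    if not (0.0 <= gpu_share <= 1.0 and 0.0 <= savings <= 1.0):
        raise ValueError("gpu_share and savings must be fractions in [0, 1]")
    return tech_budget * gpu_share * (1.0 - savings)


if __name__ == "__main__":
    budget = 1_000_000.0  # hypothetical annual technical budget, not from the paper

    # Sweep the reported ranges: GPU share 40-60%, savings 50-90%.
    for share in (0.4, 0.6):
        for saving in (0.5, 0.9):
            remaining = gpu_spend_after_savings(budget, share, saving)
            print(f"share={share:.0%} saving={saving:.0%} "
                  f"remaining GPU spend={remaining:,.0f}")

    # Hypothetical base cost of 100 units in 2021, projected forward.
    for year in range(2021, 2026):
        cost = projected_unit_cost(100.0, year - 2021)
        print(f"{year}: unit cost = {cost:g}")
```

Sweeping both ends of each range shows how wide the outcomes are: the same budget yields very different remaining GPU spend depending on which end of the 40-60% and 50-90% ranges applies.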