NVIDIA unveils GPU fleet monitoring service for data centers
NVIDIA
· December 10, 2025
NVIDIA has announced an opt-in, customer-installed software service to visualize and monitor fleets of NVIDIA GPUs for data center and cloud operators.
- The service includes a read-only, open-source client agent that streams node-level GPU telemetry (usage, power, temperature, memory bandwidth, interconnect health, errors) to a portal hosted on NVIDIA NGC. Operators can view dashboards for the global fleet or by compute zone, detect hotspots and anomalies early, and generate detailed GPU fleet reports.
- NVIDIA emphasizes that its GPUs contain no hardware tracking, kill switches, or backdoors. The agent is customer-managed and cannot modify GPU configurations; it is intended as a transparent reference implementation that enterprises can integrate into their own monitoring solutions for AI data centers and critical compute clusters.
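
For readers integrating similar telemetry into their own monitoring, the per-zone dashboard and hotspot detection described above can be sketched roughly as follows. This is an illustrative sketch only, not NVIDIA's agent or portal API: the `GpuSample` record, the `summarize_by_zone` and `find_hotspots` helpers, the 85 °C threshold, and the sample data are all hypothetical, and a real deployment would read these values from the agent's telemetry stream.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class GpuSample:
    """One hypothetical node-level telemetry reading (read-only data)."""
    node: str
    zone: str
    temperature_c: float
    power_w: float


def summarize_by_zone(samples):
    """Group samples by compute zone and compute simple aggregates."""
    by_zone = defaultdict(list)
    for sample in samples:
        by_zone[sample.zone].append(sample)
    return {
        zone: {
            "gpus": len(group),
            "avg_temperature_c": mean(s.temperature_c for s in group),
            "total_power_w": sum(s.power_w for s in group),
        }
        for zone, group in by_zone.items()
    }


def find_hotspots(samples, max_temperature_c=85.0):
    """Return samples whose temperature exceeds an assumed threshold."""
    return [s for s in samples if s.temperature_c > max_temperature_c]


# Made-up readings for illustration; not real fleet data.
SAMPLES = [
    GpuSample("node-1", "zone-a", 60.0, 300.0),
    GpuSample("node-2", "zone-a", 70.0, 350.0),
    GpuSample("node-3", "zone-b", 91.0, 400.0),
]

if __name__ == "__main__":
    for zone, stats in summarize_by_zone(SAMPLES).items():
        print(zone, stats)
    for hot in find_hotspots(SAMPLES):
        print("hotspot:", hot.node, hot.temperature_c)
```

The sketch only reads and aggregates values, mirroring the read-only design the announcement describes; it never writes back to any GPU configuration.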