Prometheus-only observability for GPU clusters
TensorPool Recon
Ask what matters in a Kubernetes GPU cluster and get a briefing backed by PromQL, live chart specs, and validated evidence.
Live evidence: What should I care about in this cluster?
Warning: Idle GPUs while requests are pending (4 GPUs below 5%)
Context: Pod attribution is node-level only (DCGM labels)
Evidence: PromQL and chart artifacts persisted (validated)
[Chart: GPU utilization by node: 92%, 87%, 4%, 3%, 1%, 1%]
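The "idle GPUs while requests are pending" warning amounts to a threshold check on per-GPU utilization that only fires when GPU work is queued. What follows is a minimal Python sketch of that idea under stated assumptions, not Recon's implementation. The two PromQL strings assume dcgm-exporter's `DCGM_FI_DEV_GPU_UTIL` gauge and the kube-state-metrics v2 pod request and phase metrics. The 15-minute window, the helper name `idle_while_pending`, the `gpu-N` IDs, and the pending-request count are all hypothetical. The utilization values reuse the percentages from the chart.

```python
from typing import Mapping

# Assumed PromQL, not taken from Recon: per-GPU utilization from dcgm-exporter,
# averaged over an arbitrary 15m window. Label names vary by exporter config.
GPU_UTIL_QUERY = "avg by (Hostname, gpu) (avg_over_time(DCGM_FI_DEV_GPU_UTIL[15m]))"

# Assumed PromQL: GPUs requested by pods stuck in Pending (kube-state-metrics v2 names).
PENDING_GPU_QUERY = (
    'sum(kube_pod_container_resource_requests{resource="nvidia_com_gpu"}'
    " * on (namespace, pod) group_left()"
    ' (kube_pod_status_phase{phase="Pending"} == 1))'
)


def idle_while_pending(
    util_by_gpu: Mapping[str, float],
    pending_gpu_requests: float,
    threshold_pct: float = 5.0,
) -> list[str]:
    """Hypothetical helper: return GPUs strictly below threshold_pct, but only
    when at least one GPU request is pending (otherwise idleness is not flagged)."""
    if pending_gpu_requests <= 0:
        return []
    return sorted(gpu for gpu, util in util_by_gpu.items() if util < threshold_pct)


if __name__ == "__main__":
    # Percentages from the chart; device IDs and the pending count are made up.
    sample = {"gpu-0": 92, "gpu-1": 87, "gpu-2": 4, "gpu-3": 3, "gpu-4": 1, "gpu-5": 1}
    idle = idle_while_pending(sample, pending_gpu_requests=2)
    print(f"{len(idle)} GPUs below 5%: {idle}")
```

The pending-request gate is what turns ordinary idleness into a warning. On a cluster with no queued GPU work, low utilization may simply mean spare capacity.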