
Supported Metrics

The Simplismart metrics endpoint exposes metrics in Prometheus format covering Kubernetes infrastructure health, GPU utilization, inference engine performance, and request lifecycle.
When querying a metric, you only need to filter by one of the listed labels.
Deployment namespace and deployment slug refer to the same value. Learn how to find it here.

Kubernetes Deployment Metrics

Track deployment health and replica status.

kube_deployment_spec_replicas

The number of desired replicas for a deployment, as specified in the deployment spec.
Type: gauge
Labels:
  • namespace: Deployment namespace (deployment slug).
  • deployment: Deployment name.
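The desired replica count is most useful when compared against what is actually available. A sketch query, using a placeholder deployment slug:

```promql
# Gap between desired and available replicas (0 when the deployment is fully healthy)
kube_deployment_spec_replicas{namespace="your-namespace"}
  - kube_deployment_status_replicas_available{namespace="your-namespace"}
```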

kube_deployment_status_replicas

The number of observed replicas for a deployment.
Type: gauge
Labels:
  • namespace: Deployment namespace.
  • deployment: Deployment name.

kube_deployment_status_replicas_available

Number of replicas that are available (ready for at least minReadySeconds).
Type: gauge
Labels:
  • namespace: Deployment namespace.
  • deployment: Deployment name.

kube_deployment_status_replicas_ready

Number of replicas that have passed their readiness probes.
Type: gauge
Labels:
  • namespace: Deployment namespace.
  • deployment: Deployment name.
For inference servers, readiness means “model is loaded and accepting requests.” A container running but not ready consumes resources without serving traffic.
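To watch readiness directly, a minimal query sketch (the namespace value is a placeholder):

```promql
# Replicas currently able to serve inference traffic
kube_deployment_status_replicas_ready{namespace="your-namespace"}
```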

kube_deployment_status_replicas_unavailable

Number of replicas that are not yet available.
Type: gauge
Labels:
  • namespace: Deployment namespace.
  • deployment: Deployment name.
A non-zero value means capacity is degraded — pods may be crashing, stuck in image pull, or failing health checks.
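A sustained non-zero value is a natural alerting condition. A sketch, with a placeholder namespace:

```promql
# True while any replica is unavailable; pair with a `for:` clause in an alert rule
kube_deployment_status_replicas_unavailable{namespace="your-namespace"} > 0
```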

Kubernetes Pod Metrics

Pod-level visibility into lifecycle and health.

kube_pod_info

Pod metadata, including the node, IP address, and phase.
Type: gauge
Labels:
  • namespace: Pod namespace.
  • pod: Pod name.
  • node: Node where the pod is scheduled.
  • pod_ip: Pod IP address.

kube_pod_status_phase

Current phase of the pod.
Type: gauge
Labels:
  • namespace: Pod namespace.
  • pod: Pod name.
  • phase: Pod phase (Pending, Running, Succeeded, Failed, Unknown).
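Because the metric emits one series per phase, with value 1 for the active phase, you can list pods stuck outside Running. Sketch, with a placeholder namespace:

```promql
# Pods that are Pending, Failed, or Unknown
kube_pod_status_phase{namespace="your-namespace", phase=~"Pending|Failed|Unknown"} == 1
```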

kube_pod_container_status_ready

Whether the container is ready (1) or not (0).
Type: gauge
Labels:
  • namespace: Pod namespace.
  • pod: Pod name.
  • container: Container name.

kube_pod_container_status_running

Whether the container is running (1) or not (0).
Type: gauge
Labels:
  • namespace: Pod namespace.
  • pod: Pod name.
  • container: Container name.

kube_pod_container_status_restarts_total

Cumulative count of container restarts.
Type: counter
Labels:
  • namespace: Pod namespace.
  • pod: Pod name.
  • container: Container name.
Frequent restarts indicate OOM kills, crash loops, or probe failures. For GPU inference pods, restarts are expensive because model loading can take minutes.
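Since this is a counter, query it with increase() over a window rather than reading the raw value. Sketch, with a placeholder namespace:

```promql
# Containers that restarted in the last hour
increase(kube_pod_container_status_restarts_total{namespace="your-namespace"}[1h]) > 0
```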

kube_pod_container_status_waiting_reason

The reason the container is in the waiting state (e.g., ContainerCreating, CrashLoopBackOff, ErrImagePull).
Type: gauge
Labels:
  • namespace: Pod namespace.
  • pod: Pod name.
  • container: Container name.
  • reason: Reason for the waiting state.
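A common use is surfacing crash loops. Sketch, with a placeholder namespace:

```promql
# Containers currently stuck in CrashLoopBackOff
kube_pod_container_status_waiting_reason{namespace="your-namespace", reason="CrashLoopBackOff"} == 1
```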

Container Resource Metrics

Track actual resource consumption.

container_cpu_usage_seconds_total

Cumulative CPU time consumed by the container, in core-seconds.
Type: counter
Labels:
  • namespace: Pod namespace.
  • pod: Pod name.
  • container: Container name.
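Because the metric counts core-seconds, its rate() gives average core usage. Sketch; the container!="" filter drops the pod-level aggregate series that cAdvisor typically also emits:

```promql
# Average CPU cores consumed per container over the last 5 minutes
rate(container_cpu_usage_seconds_total{namespace="your-namespace", container!=""}[5m])
```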

container_memory_working_set_bytes

Current working set memory of the container, in bytes. This is the value the OOM killer uses for eviction decisions.
Type: gauge
Labels:
  • namespace: Pod namespace.
  • pod: Pod name.
  • container: Container name.
When this approaches the container’s memory limit, OOM kills become imminent.
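To quantify OOM headroom, compare working set against the configured limit. A sketch that joins the two metrics on their shared labels (their full label sets differ, since they come from different exporters):

```promql
# Fraction of the memory limit currently in use; values approaching 1.0 risk OOM kills
container_memory_working_set_bytes{namespace="your-namespace", container!=""}
  / on(namespace, pod, container)
    kube_pod_container_resource_limits{namespace="your-namespace", resource="memory"}
```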

kube_pod_container_resource_requests

Resource requests configured for the container.
Type: gauge
Labels:
  • namespace: Pod namespace.
  • pod: Pod name.
  • container: Container name.
  • resource: Resource type (cpu, memory, nvidia_com_gpu).
  • unit: Unit of measurement (core, byte, etc.).

kube_pod_container_resource_limits

Resource limits configured for the container.
Type: gauge
Labels:
  • namespace: Pod namespace.
  • pod: Pod name.
  • container: Container name.
  • resource: Resource type (cpu, memory, nvidia_com_gpu).
  • unit: Unit of measurement (core, byte, etc.).
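Requests can be summed to see how much of a resource a namespace has reserved. Sketch, with a placeholder namespace:

```promql
# Total GPUs requested by all containers in the namespace
sum(kube_pod_container_resource_requests{namespace="your-namespace", resource="nvidia_com_gpu"})
```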

GPU Metrics

GPU metrics for Simplismart’s GPU-accelerated inference workloads.

DCGM_FI_DEV_GPU_UTIL

GPU utilization as a percentage (0–100). Measures what fraction of time the GPU’s streaming multiprocessors are active.
Type: gauge
Labels:
  • gpu: GPU index.
  • UUID: GPU UUID.
  • device: Device identifier.
  • modelName: GPU model name (e.g., “NVIDIA H100”).
  • Hostname: Host name.
  • exported_namespace: Deployment namespace.
  • pod: Pod name.
Example:
DCGM_FI_DEV_GPU_UTIL{Hostname="simplismart-dell-001",exported_namespace="your-namespace",pod="nvidia-dcgm-exporter-xyz"}
Low utilization often points to inefficient batching or request starvation; sustained high utilization (>90%) indicates the GPU is at capacity.
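A fleet-level view averages utilization across all GPUs. Sketch, with a placeholder namespace:

```promql
# Average GPU utilization across all GPUs serving a deployment
avg(DCGM_FI_DEV_GPU_UTIL{exported_namespace="your-namespace"})
```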

DCGM_FI_DEV_FB_USED

Frame buffer (GPU VRAM) memory used, in MiB.
Type: gauge
Labels:
  • gpu: GPU index.
  • Hostname: Host name.
  • exported_namespace: Deployment namespace.
  • pod: Pod name.
Example:
DCGM_FI_DEV_FB_USED{Hostname="simplismart-dell-001",exported_namespace="your-namespace",pod="nvidia-dcgm-exporter-xyz"}
GPU memory is the primary constraint for model serving. When VRAM is exhausted, inference requests fail with OOM errors.
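VRAM usage can be expressed as a percentage by combining this metric with DCGM_FI_DEV_FB_FREE, assuming both series carry identical label sets so vector matching succeeds:

```promql
# Percentage of GPU memory in use
100 * DCGM_FI_DEV_FB_USED
  / (DCGM_FI_DEV_FB_USED + DCGM_FI_DEV_FB_FREE)
```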

DCGM_FI_DEV_FB_FREE

Frame buffer (GPU VRAM) memory free, in MiB.
Type: gauge
Labels:
  • Hostname: Host name.
  • exported_namespace: Deployment namespace.
  • pod: Pod name.
Example:
DCGM_FI_DEV_FB_FREE{Hostname="simplismart-dell-001",exported_namespace="your-namespace",pod="nvidia-dcgm-exporter-xyz"}

Ingress Metrics

External traffic entering the Simplismart platform.

nginx_ingress_controller_requests

Total number of requests handled by the NGINX ingress controller.
Type: counter
Labels:
  • ingress: Ingress name.
  • exported_namespace: Namespace.
  • service: Service name.
  • status: HTTP status code.
  • method: HTTP method (GET, POST, etc.).
  • path: Request path.
Available metrics:
  • nginx_ingress_controller_requests - Total request count
  • nginx_ingress_controller_request_duration_seconds_bucket - Request duration histogram buckets
  • nginx_ingress_controller_request_duration_seconds_sum - Total request duration sum
  • nginx_ingress_controller_request_duration_seconds_count - Total request count for duration
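The status label on the request counter supports error-rate calculations. A sketch, filtering by a hypothetical service name:

```promql
# Fraction of requests returning 5xx over the last 5 minutes
sum(rate(nginx_ingress_controller_requests{service="your-service", status=~"5.."}[5m]))
  / sum(rate(nginx_ingress_controller_requests{service="your-service"}[5m]))
```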

nginx_ingress_controller_request_duration_seconds

End-to-end request duration as observed by the ingress controller.
Type: histogram (exposes _bucket, _sum, and _count time series)
Labels:
  • ingress: Ingress name.
  • exported_namespace: Namespace.
  • service: Service name.
  • status: HTTP status code.
  • method: HTTP method (GET, POST, etc.).
  • path: Request path.
Example query (p99):
histogram_quantile(0.99, rate(nginx_ingress_controller_request_duration_seconds_bucket[5m]))
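A per-service breakdown stays statistically correct by summing buckets before taking the quantile. Sketch:

```promql
# p95 latency per backend service
histogram_quantile(0.95,
  sum by (le, service) (rate(nginx_ingress_controller_request_duration_seconds_bucket[5m])))
```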