# Supported Metrics

> Reference for all metrics available from the Simplismart metrics endpoint

The Simplismart metrics endpoint exposes metrics in Prometheus format covering Kubernetes infrastructure health, GPU utilization, inference engine performance, and request lifecycle.

For each metric, you only need to provide one of the listed parameters. Deployment namespace and deployment slug refer to the same value. Learn how to find it [here](/images/observability/overview/1-find-slug.png).

## Kubernetes Deployment Metrics

Track deployment health and replica status.

### `kube_deployment_spec_replicas`

The number of desired replicas for a deployment, as specified in the deployment spec.

Type: `gauge`

Labels:

* Deployment namespace (deployment slug).
* Deployment name.

***

### `kube_deployment_status_replicas`

The number of observed replicas for a deployment.

Type: `gauge`

Labels:

* Deployment namespace.
* Deployment name.

***

### `kube_deployment_status_replicas_available`

Number of replicas that are available (ready for at least `minReadySeconds`).

Type: `gauge`

Labels:

* Deployment namespace.
* Deployment name.

***

### `kube_deployment_status_replicas_ready`

Number of replicas that have passed their readiness probes.

Type: `gauge`

Labels:

* Deployment namespace.
* Deployment name.

For inference servers, readiness means "model is loaded and accepting requests." A container running but not ready consumes resources without serving traffic.

***

### `kube_deployment_status_replicas_unavailable`

Number of replicas that are not yet available.

Type: `gauge`

Labels:

* Deployment namespace.
* Deployment name.

A non-zero value means capacity is degraded: pods may be crashing, stuck in image pull, or failing health checks.

***

## Kubernetes Pod Metrics

Pod-level visibility into lifecycle and health.
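
As a rough sketch of how the pod metrics in this section can be combined, the queries below look for pods that are not running and for containers that restarted recently. The label names `namespace` and `phase` are assumptions based on the usual kube-state-metrics conventions, and `your-namespace` is a placeholder; check the labels actually returned by the endpoint before relying on them.

```promql
# Pods whose current phase is anything other than Running.
# kube_pod_status_phase reports 1 for the phase a pod is in and 0 for the others.
kube_pod_status_phase{namespace="your-namespace", phase!="Running"} == 1

# Containers that restarted at least once in the last hour.
increase(kube_pod_container_status_restarts_total{namespace="your-namespace"}[1h]) > 0
```
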
### `kube_pod_info`

Pod metadata, including the node and IP address.

Type: `gauge`

Labels:

* Pod namespace.
* Pod name.
* Node name where the pod is scheduled.
* Pod IP address.

***

### `kube_pod_status_phase`

Current phase of the pod (Pending, Running, Succeeded, Failed, Unknown).

Type: `gauge`

Labels:

* Pod namespace.
* Pod name.
* Pod phase (Pending, Running, Succeeded, Failed, Unknown).

***

### `kube_pod_container_status_ready`

Whether the container is ready (1) or not (0).

Type: `gauge`

Labels:

* Pod namespace.
* Pod name.
* Container name.

***

### `kube_pod_container_status_running`

Whether the container is running (1) or not (0).

Type: `gauge`

Labels:

* Pod namespace.
* Pod name.
* Container name.

***

### `kube_pod_container_status_restarts_total`

Cumulative count of container restarts.

Type: `counter`

Labels:

* Pod namespace.
* Pod name.
* Container name.

Frequent restarts usually indicate OOM kills, crash loops, or probe failures. For GPU inference pods, restarts are expensive because model loading can take minutes.

***

### `kube_pod_container_status_waiting_reason`

The reason the container is in a waiting state (e.g., ContainerCreating, CrashLoopBackOff, ErrImagePull).

Type: `gauge`

Labels:

* Pod namespace.
* Pod name.
* Container name.
* Reason for waiting state.

***

## Container Resource Metrics

Track actual resource consumption.

### `container_cpu_usage_seconds_total`

Cumulative CPU time consumed by the container, in core-seconds.

Type: `counter`

Labels:

* Pod namespace.
* Pod name.
* Container name.

***

### `container_memory_working_set_bytes`

Current working set memory of the container in bytes. This is what the OOM killer uses for eviction decisions.

Type: `gauge`

Labels:

* Pod namespace.
* Pod name.
* Container name.

When this approaches the container's memory limit, OOM kills become imminent.

***

### `kube_pod_container_resource_requests`

Resource requests configured for the container.

Type: `gauge`

Labels:

* Pod namespace.
* Pod name.
* Container name.
* Resource type (`cpu`, `memory`, `nvidia_com_gpu`).
* Unit of measurement (core, byte, etc.).

***

### `kube_pod_container_resource_limits`

Resource limits configured for the container.

Type: `gauge`

Labels:

* Pod namespace.
* Pod name.
* Container name.
* Resource type (`cpu`, `memory`, `nvidia_com_gpu`).
* Unit of measurement (core, byte, etc.).

***

## GPU Metrics

GPU metrics for Simplismart's GPU-accelerated inference workloads.

### `DCGM_FI_DEV_GPU_UTIL`

GPU utilization as a percentage (0–100). Measures what fraction of time the GPU's streaming multiprocessors are active.

Type: `gauge`

Labels:

* GPU index.
* GPU UUID.
* Device identifier.
* GPU model name (e.g., "NVIDIA H100").
* Host name.
* Deployment namespace.
* Pod name.

**Example:**

```promql
DCGM_FI_DEV_GPU_UTIL{Hostname="simplismart-dell-001",exported_namespace="your-namespace",pod="nvidia-dcgm-exporter-xyz"}
```

Low utilization often points to inefficient batching; sustained high utilization (>90%) indicates the GPU is at capacity.

***

### `DCGM_FI_DEV_FB_USED`

Frame buffer (GPU VRAM) memory used, in MiB.

Type: `gauge`

Labels:

* GPU index.
* Host name.
* Deployment namespace.
* Pod name.

**Example:**

```promql
DCGM_FI_DEV_FB_USED{Hostname="simplismart-dell-001",exported_namespace="your-namespace",pod="nvidia-dcgm-exporter-xyz"}
```

GPU memory is the primary constraint for model serving. When VRAM is exhausted, inference requests fail with OOM errors.

***

### `DCGM_FI_DEV_FB_FREE`

Frame buffer (GPU VRAM) memory free, in MiB.

Type: `gauge`

Labels:

* Host name.
* Deployment namespace.
* Pod name.

**Example:**

```promql
DCGM_FI_DEV_FB_FREE{Hostname="simplismart-dell-001",exported_namespace="your-namespace",pod="nvidia-dcgm-exporter-xyz"}
```

***

## Ingress Metrics

External traffic entering the Simplismart platform.

### `nginx_ingress_controller_requests`

Total number of requests handled by the NGINX ingress controller.

Type: `counter`

Labels:

* Ingress name.
* Namespace.
* Service name.
* HTTP status code.
* HTTP method (GET, POST, etc.).
* Request path.


**Available metrics:**

* `nginx_ingress_controller_requests` - Total request count
* `nginx_ingress_controller_request_duration_seconds_bucket` - Request duration histogram buckets
* `nginx_ingress_controller_request_duration_seconds_sum` - Sum of all observed request durations
* `nginx_ingress_controller_request_duration_seconds_count` - Number of requests observed by the duration histogram

***

### `nginx_ingress_controller_request_duration_seconds`

End-to-end request duration as observed by the ingress controller.

Type: `histogram` (exposes `_bucket`, `_sum`, and `_count` time series)

Labels:

* Ingress name.
* Namespace.
* Service name.
* HTTP status code.
* HTTP method (GET, POST, etc.).
* Request path.

**Example query (p99):**

```promql
histogram_quantile(0.99, rate(nginx_ingress_controller_request_duration_seconds_bucket[5m]))
```

***
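
The p99 query above returns one quantile per label combination (status, method, path, and so on). As a minimal sketch, the variants below aggregate the buckets into a single latency figure for one deployment and derive a 5xx error ratio from the request counter. The label names `namespace` and `status` are assumptions based on the usual NGINX ingress controller conventions, and `your-namespace` is a placeholder.

```promql
# Single p99 latency across all paths, methods, and status codes.
# The le label must be kept so histogram_quantile can read the bucket bounds.
histogram_quantile(
  0.99,
  sum by (le) (
    rate(nginx_ingress_controller_request_duration_seconds_bucket{namespace="your-namespace"}[5m])
  )
)

# Fraction of requests answered with a 5xx status over the last 5 minutes.
sum(rate(nginx_ingress_controller_requests{namespace="your-namespace", status=~"5.."}[5m]))
/
sum(rate(nginx_ingress_controller_requests{namespace="your-namespace"}[5m]))
```

Aggregating with `sum by (le)` before `histogram_quantile` gives one number suitable for a dashboard panel or alert; add further labels to the `by` clause to break the quantile down per service or path.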