Steps for Using the Deployed Model and Monitoring Its Performance

Once the model is successfully deployed, you can follow these steps to begin inference:

  • Go to the API tab of your model

  • Find the Endpoint URL and the pre-generated inference script

  • Copy the script, replace placeholder values, and execute it to call the model
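The steps above can be sketched in Python. The endpoint URL, payload schema, and `MODEL_API_KEY` variable name below are assumptions; replace them with the Endpoint URL and the placeholder values shown in your model's API tab.

```python
import json
import os
import urllib.request

# Assumed placeholder values -- copy the real ones from the API tab.
ENDPOINT_URL = "https://example.com/v1/models/my-model/infer"
API_KEY = os.environ.get("MODEL_API_KEY", "<your-api-key>")

# The request body schema is model-specific; this is only illustrative.
payload = {"inputs": "Hello, model!"}

req = urllib.request.Request(
    ENDPOINT_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    method="POST",
)

# Uncomment to actually call the deployed model:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read()))
```

The pre-generated script in the API tab may differ in payload shape and authentication header; treat this as a starting point, not the exact script.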

How to access your API Key

Go to Account Settings and select API Key
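Rather than hard-coding the key you copy from Account Settings into the inference script, it can be read from an environment variable. The variable name `MODEL_API_KEY` here is an assumption, not a requirement of the platform.

```python
import os

# Export the key first, e.g. `export MODEL_API_KEY=...` in your shell;
# the variable name MODEL_API_KEY is an assumed convention.
api_key = os.environ.get("MODEL_API_KEY", "<paste-your-key-here>")
```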


The Monitor tab provides an overview of your deployment’s performance.

  1. Monitor Real-Time Status:
  • Pod Info: Status and count of active pods

  • Throughput & Latency: Requests per second and processing time

  • Success & Failure Rates: Percentage of successful and failed inferences


  2. Resource Monitoring: Track system-level metrics and load information, such as the number of nodes, the number of pods, and CPU/GPU usage
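As a minimal sketch of how the success and failure percentages in the Monitor tab are derived from raw request counts (the counts below are made up for illustration):

```python
# Hypothetical request counts over a monitoring window.
succeeded = 970
failed = 30

total = succeeded + failed
success_rate = 100 * succeeded / total  # percentage of successful inferences
failure_rate = 100 * failed / total     # percentage of failed inferences

print(f"success: {success_rate:.1f}%  failure: {failure_rate:.1f}%")
# → success: 97.0%  failure: 3.0%
```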