Manage deployments using the client.deployments attribute or convenience methods.
create_deployment
Creates a deployment for a model repo.
Read the model repo UUID and organization ID from environment variables (e.g. MODEL_REPO_ID, ORG_ID) rather than hardcoding them; never hardcode secrets.
import os
from dotenv import load_dotenv
load_dotenv()
from simplismart import DeploymentCreate, Simplismart
client = Simplismart()
deployment = client.create_deployment(
    DeploymentCreate(
        model_repo=os.getenv("MODEL_REPO_ID", "model-repo-uuid"),
        org=os.getenv("ORG_ID"),
        gpu_id="nvidia-h100",
        name="vision-private-deploy",
        min_pod_replicas=1,
        max_pod_replicas=2,
        autoscale_config={"targets": [{"metric": "gpu", "target": 80}]},
        env_variables={"KEY": "value"},
        healthcheck={"path": "/", "port": 8000},
        ports={"http": {"port": 8000}},
        metrics_path=["/v1/chat/completions"],
        fast_scaleup=True,
        deployment_tag="v1.0",
    )
)
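Because os.getenv returns None when a variable is unset, a missing ORG_ID in the example above reaches DeploymentCreate as None. A minimal sketch of a fail-fast lookup follows; `require_env` is a hypothetical helper for illustration, not part of the SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise if it is unset or empty."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value


# Usage (variable names taken from the example above):
# model_repo = require_env("MODEL_REPO_ID")
# org = require_env("ORG_ID")
```

Call it after load_dotenv() so that values from a .env file are visible.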
DeploymentCreate
| Parameter | Type | Description | Required |
|---|---|---|---|
| model_repo | str | Model repository UUID | Yes |
| org | str | Organization UUID (org_id) | Yes |
| gpu_id | str | GPU type identifier. Examples: nvidia-h100, nvidia-a10, nvidia-l4 | Yes |
| name | str | Deployment name (3-60 chars) | Yes |
| min_pod_replicas | int | Minimum pod replicas (≥ 1) | Yes |
| max_pod_replicas | int | Maximum pod replicas (≥ 1) | Yes |
| autoscale_config | AutoscaleConfig | Autoscaling configuration | Yes |
| env_variables | dict \| None | Environment variables | No |
| deployment_custom_configuration | dict \| None | Custom deployment config | No |
| healthcheck | dict \| None | Health check configuration | No |
| ports | dict \| None | Port mappings | No |
| metrics_path | list \| None | Metrics paths | No |
| persistent_volume_claims | dict \| list \| None | PVC configurations | No |
| fast_scaleup | bool \| None | Enable fast scale up | No |
| deployment_tag | str \| None | Deployment tag label | No |
AutoscaleConfig
autoscale_config = {
    "targets": [
        {
            "metric": "gpu",  # Required
            "target": 80,  # Required (number)
            # "percentile": 95,  # Optional; only valid when metric is "latency"
        }
    ]
}
| Metric Option | Description |
|---|---|
| concurrency | Number of concurrent requests |
| cpu | CPU utilization percentage |
| gpu | GPU utilization percentage |
| gram | GPU memory utilization |
| latency | Request latency (supports percentiles 50, 75, 90, 95) |
| ram | RAM utilization |
| throughput | Requests per second |
The percentile field is only supported when metric is set to latency.
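The rules above (known metric names, a numeric target, percentile only with latency and only 50, 75, 90, or 95) can be checked locally before sending a request. The following is a sketch, not SDK behavior: `validate_autoscale_config` is a hypothetical helper, the requirement that targets be a non-empty list is an assumption, and the API remains the authority on what is accepted.

```python
# Metric names and percentiles as listed in the table above.
ALLOWED_METRICS = {"concurrency", "cpu", "gpu", "gram", "latency", "ram", "throughput"}
ALLOWED_PERCENTILES = {50, 75, 90, 95}


def validate_autoscale_config(config: dict) -> list:
    """Return a list of problems found in an autoscale_config dict (empty if none)."""
    targets = config.get("targets")
    # Assumption: at least one target is required.
    if not isinstance(targets, list) or not targets:
        return ["'targets' must be a non-empty list"]

    problems = []
    for i, target in enumerate(targets):
        metric = target.get("metric")
        if metric not in ALLOWED_METRICS:
            problems.append(f"targets[{i}]: unknown metric {metric!r}")

        value = target.get("target")
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"targets[{i}]: 'target' must be a number")

        if "percentile" in target:
            percentile = target["percentile"]
            if metric != "latency":
                problems.append(f"targets[{i}]: 'percentile' is only valid for latency")
            elif percentile not in ALLOWED_PERCENTILES:
                problems.append(f"targets[{i}]: unsupported percentile {percentile}")
    return problems
```

An empty return value means no problems were found by these local checks.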
list_deployments
Lists deployments with optional filtering.
import os
from dotenv import load_dotenv
load_dotenv()
from simplismart import Simplismart
client = Simplismart()
deployments = client.list_deployments(
    model_repo_id=os.getenv("MODEL_REPO_ID"),  # Optional
    status="DEPLOYED",
    offset=0,
    count=20,
)
print(deployments)
Expected output — list of deployment summary objects:
[
  {
    "deployment_id": "deployment-uuid",
    "deployment_name": "speechbrain-v3",
    "model_repo_id": "model-repo-uuid",
    "model_repo_name": "speechbrain",
    "model_type": "unknown",
    "accelerator_type": ["nvidia-l40s"],
    "accelerator_count": 1,
    "status": "DEPLOYED"
  }
]
Deployment Status Options
| Value | Description |
|---|---|
| DEPLOYED | Deployment is running |
| PENDING | Deployment is being created |
| FAILED | Deployment failed |
| STOPPED | Deployment is stopped |
| DELETED | Deployment has been deleted |
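The offset and count parameters of list_deployments suggest offset-based paging. Below is a minimal sketch of walking every page. It assumes that list_deployments returns a list, that offset counts items rather than pages, and that a page shorter than count marks the end; none of these is confirmed by this reference. `iter_deployments` is a hypothetical helper, not an SDK method.

```python
from typing import Any, Iterator


def iter_deployments(client: Any, page_size: int = 20, **filters: Any) -> Iterator[Any]:
    """Yield deployments one by one, fetching pages via offset/count."""
    offset = 0
    while True:
        page = client.list_deployments(offset=offset, count=page_size, **filters)
        yield from page
        # Assumption: a page shorter than page_size means there are no more results.
        if len(page) < page_size:
            break
        offset += page_size


# Usage, with a filter from the status table above:
# for summary in iter_deployments(client, status="DEPLOYED"):
#     print(summary)
```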
list_model_deployments
Lists all model deployments for an organization.
deployments = client.list_model_deployments(org_id="org-uuid")
get_model_deployment
Gets deployment details by ID. Set DEPLOYMENT_ID in env or use an id from list_deployments.
deployment = client.get_model_deployment(deployment_id=os.getenv("DEPLOYMENT_ID", "deployment-uuid"))
print(deployment)
Expected output — deployment object with uuid, name, status, model_repo, org, autoscale_config, healthcheck, ports, min_pod_replicas, max_pod_replicas, etc.
get_deployment
Gets deployment details by ID.
deployment = client.get_deployment(deployment_id=os.getenv("DEPLOYMENT_ID", "deployment-uuid"))
update_deployment
Updates deployment configuration.
updated = client.update_deployment(
    deployment_id=os.getenv("DEPLOYMENT_ID", "deployment-uuid"),
    payload={
        "min_pod_replicas": 1,
        "max_pod_replicas": 2,
        "autoscale_config": {"targets": [{"metric": "gpu", "target": 80}]},
    },
)
stop_deployment
Stops a running deployment.
result = client.stop_deployment(deployment_id=os.getenv("DEPLOYMENT_ID", "deployment-uuid"))
start_deployment
Starts a stopped deployment.
result = client.start_deployment(deployment_id=os.getenv("DEPLOYMENT_ID", "deployment-uuid"))
restart_deployment
Restarts a deployment.
result = client.restart_deployment(
    deployment_id=os.getenv("DEPLOYMENT_ID", "deployment-uuid"),
)
fetch_deployment_health
Gets deployment health status.
health = client.fetch_deployment_health(deployment_id=os.getenv("DEPLOYMENT_ID", "deployment-uuid"))
print(health)
Expected output
{
  "data": "Healthy",
  "messages": [
    {
      "message": "Ready to use, the model is running and available for inference.",
      "severity": "info"
    }
  ],
  "pods": { "ready": 1, "not_ready": 0 }
}
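After creating or starting a deployment, a caller may want to block until it is ready. Below is a sketch of a polling loop built on fetch_deployment_health. It assumes the response is dict-like and that readiness is signalled by "data" being "Healthy", as in the output above. `wait_until_healthy` is a hypothetical helper, and its timeout and interval defaults are arbitrary.

```python
import time
from typing import Any, Callable


def wait_until_healthy(
    client: Any,
    deployment_id: str,
    timeout: float = 600.0,
    interval: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_deployment_health until it reports "Healthy" or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        health = client.fetch_deployment_health(deployment_id=deployment_id)
        # Assumption: "Healthy" in the "data" field means the deployment is ready.
        if health.get("data") == "Healthy":
            return health
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Deployment {deployment_id} not healthy after {timeout}s: {health}"
            )
        sleep(interval)
```

The sleep parameter exists only so the loop can be exercised in tests without real delays.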
update_deployment_autoscaling
Updates deployment autoscaling configuration.
result = client.update_deployment_autoscaling(
    deployment_id=os.getenv("DEPLOYMENT_ID", "deployment-uuid"),
    min_replicas=1,
    max_replicas=3,
)
delete_deployment
Deletes a deployment.
result = client.delete_deployment(
    deployment_id=os.getenv("DEPLOYMENT_ID", "deployment-uuid"),
)
Error Handling
The SDK raises SimplismartError for all API errors.
from simplismart import Simplismart, SimplismartError
client = Simplismart()
try:
    # A UUID that does not correspond to any deployment triggers a 404.
    deployment = client.get_deployment(deployment_id="00000000-0000-0000-0000-000000000000")
except SimplismartError as e:
    print("Status:", e.status_code)
    print("Message:", e)
    print("Payload:", e.payload)
Expected output (for an invalid or missing deployment):
Status: 404
Message: No ModelDeployment matches the given query. (status=404)
Payload: {'detail': 'No ModelDeployment matches the given query.'}
SimplismartError Attributes
| Attribute | Type | Description |
|---|---|---|
| status_code | int | HTTP status code |
| payload | dict | Full error response payload |
| message | str | Error message from backend |
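The status_code attribute makes it possible to treat "not found" differently from other failures. The sketch below returns None on a 404 and re-raises everything else. `get_deployment_or_none` is a hypothetical helper. It inspects status_code on a generic exception only so that it runs without the SDK installed; in real code, catch SimplismartError directly.

```python
from typing import Any, Optional


def get_deployment_or_none(client: Any, deployment_id: str) -> Optional[Any]:
    """Return the deployment, or None if the API answers 404; re-raise anything else."""
    try:
        return client.get_deployment(deployment_id=deployment_id)
    except Exception as exc:
        # Duck-typed for illustration; prefer `except SimplismartError` in real code.
        if getattr(exc, "status_code", None) == 404:
            return None
        raise
```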
BYOC Deployment
Create a BYOC deployment with a payload (cluster, nodegroup, etc.). See Bring your own compute and Deploy on imported cluster.
import json
from simplismart import Simplismart
client = Simplismart()
# Load the BYOC deployment payload (cluster, nodegroup, etc.) from a JSON file.
with open("byoc-create.json") as f:
    payload = json.load(f)
deployment = client.create_byoc_deployment(payload=payload)