
Initiate New Deployment

From the main menu, select the Deployments tab.
  • Click the Create button to start a new deployment.
    • Enter Deployment Name: Provide a unique name for your deployment.
    • Select Model: Choose the model you want to deploy from the dropdown. You can also add a new model in the Models section.
    • Select Cloud: Choose Simplismart Cloud to deploy as a Dedicated Endpoint, or BYOC to deploy the model on your own cluster. These fields are summarized in the sketch after this list.
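
If you keep deployment settings in version control, the same three fields can be captured in a small config object. This is only an illustrative sketch: the field names and the create_deployment helper are hypothetical, not a documented Simplismart API.

```python
# Illustrative only: these keys mirror the form fields above, but the dict
# layout and the create_deployment() helper are hypothetical, not a
# documented Simplismart API.
deployment_config = {
    "name": "my-llm-endpoint",       # Deployment Name: must be unique
    "model": "my-registered-model",  # Select Model: a model registered in the Models section
    "cloud": "simplismart",          # "simplismart" for a Dedicated Endpoint, "byoc" for your own cluster
}

# create_deployment(deployment_config)  # hypothetical helper; the UI flow above is the supported path
```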

Dedicated Deployment

For dedicated deployments, please ensure you have enough quota for the desired resources.

1. Choose Accelerator

Choose the desired accelerator type based on your model size and SLA requirements.
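
A quick way to sanity-check the accelerator choice is a back-of-the-envelope memory estimate: the weights need roughly parameter count times bytes per parameter, plus headroom for the KV cache and runtime buffers. The sketch below is a generic rule of thumb, not a Simplismart sizing formula, and the example model sizes are assumptions.

```python
# Back-of-the-envelope VRAM estimate for picking an accelerator.
# This is a generic rule of thumb, not a Simplismart sizing formula.

def estimate_vram_gib(num_params_billion: float, bytes_per_param: float = 2.0,
                      overhead_factor: float = 1.2) -> float:
    """Approximate GPU memory (GiB) for the model weights plus ~20% overhead.

    bytes_per_param: 2.0 for FP16/BF16, 1.0 for INT8, 0.5 for 4-bit quantization.
    The overhead factor loosely covers the KV cache and runtime buffers; real
    usage also depends on batch size and sequence length.
    """
    weights_gib = num_params_billion * 1e9 * bytes_per_param / (1024 ** 3)
    return weights_gib * overhead_factor

# Example: a 7B model in FP16 needs roughly 16 GiB, so a 24 GiB GPU may
# suffice, while a 70B model in FP16 points toward multiple
# A100/H100-class accelerators.
print(round(estimate_vram_gib(7), 1))   # ~15.6
print(round(estimate_vram_gib(70), 1))  # ~156.5
```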

2. Add Scaling Metrics

  • Specify the scaling metrics that will be used to auto-scale your deployment.
  • Set the threshold values for each metric to trigger scaling actions.
  • You can choose multiple scaling metrics based on your load patterns; the sketch after this list shows how a threshold drives the replica count.
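
For intuition on how a threshold translates into scaling actions, the sketch below applies the target-tracking rule used by common autoscalers (desired replicas = ceil(current replicas × current metric ÷ target)). It is a generic illustration assuming a concurrency-style metric, not a description of Simplismart's internal autoscaler.

```python
import math

def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Target-tracking scaling rule used by common autoscalers (e.g. Kubernetes HPA).

    Example: at 3 replicas averaging 180 concurrent requests each against a
    target of 100, scale to ceil(3 * 180 / 100) = 6 replicas.
    """
    desired = math.ceil(current_replicas * current_metric / target_metric)
    return max(min_replicas, min(max_replicas, desired))

print(desired_replicas(3, 180, 100))  # 6
print(desired_replicas(4, 40, 100))   # 2 (scales down when load drops)
```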

3. Deploy

  • Click the Deploy Model button to initiate the deployment process.
  • Check the panel on the right side of the screen for the creation status of your deployment.
  • Monitor the deployment status; once it shows Deployed, your model is ready for use. If you prefer to watch the status from a script, see the sketch after this list.
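
A simple polling loop is one way to watch the status programmatically. Everything below (the endpoint URL, the token header, and the status field) is a hypothetical placeholder; check the Simplismart API reference for the real paths and response shape.

```python
import time
import requests

# Hypothetical endpoint and response shape, shown only to illustrate polling;
# the actual Simplismart API paths and fields may differ.
STATUS_URL = "https://api.example.com/v1/deployments/my-llm-endpoint"
HEADERS = {"Authorization": "Bearer <YOUR_API_TOKEN>"}

def wait_until_deployed(poll_seconds: int = 30, timeout_seconds: int = 1800) -> str:
    """Poll the deployment until its status leaves the in-progress states."""
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        status = requests.get(STATUS_URL, headers=HEADERS, timeout=10).json().get("status")
        if status not in ("creating", "pending"):
            return status  # e.g. "deployed" or "failed"
        time.sleep(poll_seconds)
    raise TimeoutError("deployment did not finish within the timeout")
```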

BYOC Deployment

For BYOC deployments, it is mandatory to have a linked cloud account and an active cluster with the required resources.

1. Cloud Details

Select the cluster and the required node group based on your model.
  • Cluster: Select the target cluster.
  • Node Group: Select the node group based on the GPU type and compute specs required by your model (e.g., A100, H100, T4).
This ensures compatibility and optimal resource allocation during deployment.

2. Resource Details

Choose the appropriate CPU and memory resources based on the selected node group.
Configure Resource Requirements
Set the resource requests and limits for your deployment (a Kubernetes-based sketch follows this list):
CPU Request & Limit
  • CPU Request: Minimum guaranteed CPU for the container.
  • CPU Limit: Maximum CPU the container can use. Throttled if exceeded.
Memory Request & Limit
  • Memory Request: Minimum guaranteed memory.
  • Memory Limit: Maximum memory allowed. Exceeding it results in termination (OOM error).
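
These request/limit semantics follow the standard Kubernetes resource model, so if your BYOC cluster runs Kubernetes, the same values map directly onto a container resource spec. The sketch below uses the official Kubernetes Python client only to show the shape; the numbers are placeholders to size for your own model and node group.

```python
from kubernetes import client

# Illustrative Kubernetes container resource spec mirroring the fields above.
# The values are placeholders; size them for your model and node group.
resources = client.V1ResourceRequirements(
    requests={"cpu": "4", "memory": "16Gi"},  # minimum guaranteed to the container
    limits={"cpu": "8", "memory": "32Gi"},    # CPU above the limit is throttled,
                                              # memory above it triggers an OOM kill
)
```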

3. Add Scaling Metrics

  • Specify the scaling metrics that will be used to auto-scale your deployment.
  • Set the threshold values for each metric to trigger scaling actions.
  • You can choose multiple scaling metrics based on your load patterns.

4. Deploy

  • Click the Deploy Model button to initiate the deployment process.
  • Check the panel on the right side of the screen for the creation status of your deployment.
  • Monitor the deployment status; once it shows Deployed, your model is ready for use.