Documentation home page
Search...
⌘K
Ask AI
Support
Sign Up
Sign Up
Search...
Navigation
Types of Inference
Private Endpoint
Documentation
API Reference
Guides
FAQ
Configuration References
Blog
Get Started
Overview
Quickstart
Types of Inference
Shared Endpoint
Private Endpoint
BYOC
Playground
Large Language Models
Transcription Models
Image Generation Models
Model Compilation
Optimise a Model
Adding a Custom Model
Deployment
Creating a Deployment
Deploying a Custom Model
Deploy on an Imported Cluster
Inference & Monitoring
Benchmarking
Deploying NIM
Training
Introduction
LLM/VLM (New)
LLM/VLM (Legacy)
Flux
Deploy Fine-Tuned model
Settings
General Settings
Your Organisation
API Keys
Usage
Billing
References
Terminology Guide
On this page
Benefits of using a private endpoint
Deploying your model on a Private Endpoint
Model Optimisation
Model Deployment
Inferencing
Types of Inference
Private Endpoint
Private endpoints provide dedicated infrastructure for your model deployments, ensuring better performance and reliability.
Benefits of using a private endpoint
Dedicated resources
: No sharing of compute resources with other users.
Enhanced performance
: Improved response times and throughput.
Higher reliability
: Reduced risk of downtime and performance degradation.
Deploying your model on a Private Endpoint
To deploy your model on a private endpoint, follow these outlined processes. Each step includes a link for detailed instructions, ensuring a smooth launch and use of your deployed model.
Model Optimisation
You can either choose an available model from our
model marketplace
, or add your own.
Optimise your model for deployment by visiting the
Models
page and using our optimisation tools.
Model Deployment
You can deploy your optimised model on a private endpoint by selecting your cloud provider as simplismart, click here for detailed
deployment
steps.
Inferencing
You can invoke your deployed models from the
API tab
of the model deployment.
Want the deployment to run in your own cluster? Here’s
how
.
Shared Endpoint
Overview
Assistant
Responses are generated using AI and may contain mistakes.