Prerequisites
- A Simplismart account (Sign up here if you haven’t already)
- Basic familiarity with the Simplismart platform
- An API key (Generate one here if needed)
Deployment Process
1. Select a Model
- Navigate to the Marketplace section in the left sidebar
- Search for “Qwen 2.5 VL 7B” in the search bar
- Click on the model card to view details
- Click the Deploy button to begin the deployment process
2. Configure Model Settings
In the deployment configuration screen, you’ll need to complete three main sections:
Model Configuration
- Base Model: Ensure “Qwen 2.5 VL 7B” is selected in the dropdown
- Deployment Type: Select “Simplismart Infrastructure” for managed hosting, or connect your own cloud infrastructure here instead
Resource Allocation
- GPU Type: Select “A10G” from the dropdown menu
- Number of GPUs: Leave at default (1) for this quickstart
3. Set Up Auto-Scaling
Auto-scaling allows your deployment to adapt to varying workloads automatically.
- In the Scaling section, you’ll see a default metric for CPU Utilization already configured
- You can add additional metrics by clicking “Add Metric”
- Add a new scaling metric, GPU Utilization, and set its threshold to 80%, so the deployment scales up when GPU usage reaches 80%
For production deployments, consider adding custom metrics based on your specific workload patterns. See our model deployment docs for more details.
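To make the threshold semantics concrete, here is an illustrative sketch of how a utilization-based autoscaler typically turns a target (such as the 80% GPU Utilization metric above) into a replica count. This is not Simplismart’s actual scaling logic, just the classic proportional rule (the same spirit as Kubernetes’ Horizontal Pod Autoscaler):

```python
import math

def desired_replicas(current_replicas: int,
                     current_util_pct: float,
                     target_util_pct: float = 80.0) -> int:
    """Proportional autoscaling rule (illustrative, not Simplismart's implementation).

    Scales the replica count by the ratio of observed utilization to the
    target, rounding up, and never going below one replica.
    """
    return max(1, math.ceil(current_replicas * current_util_pct / target_util_pct))

# One replica at 95% GPU utilization exceeds the 80% target, so scale to 2.
print(desired_replicas(1, 95.0))   # -> 2
# Two replicas at 40% utilization are underused, so scale back down to 1.
print(desired_replicas(2, 40.0))   # -> 1
```

In practice the platform also applies cooldown windows and min/max replica bounds, which is why observed scaling may lag behind this simple formula.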
4. Add Metadata
Proper metadata helps organize and identify your deployments:
- Deployment Name: Enter a unique, descriptive name (e.g., “qwen-vl-quickstart”)
- Deployment Tags: Add the following tag:
  - Key: env
  - Value: quickstart
5. Deploy the Model
- Review all your configuration settings
- Click the Deploy button in the top-right corner
- A confirmation dialog will appear showing deployment details
- Click Confirm to start the deployment process
For step-by-step instructions, check out the detailed Model Deployment Guide.
6. Test Your Deployment
Once deployed, you can test your model using the API:
- Navigate to the API tab in your deployment details
- Select cURL from the language dropdown
- Copy the provided code snippet
- Replace YOUR-ENDPOINT-HERE with your actual endpoint URL (found in the Details tab)
- Add your API key to the Authorization header
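The copied snippet will look roughly like the sketch below. The request path and body follow the standard OpenAI chat-completions format; the model id shown here is an assumption, so use the snippet the API tab gives you as the source of truth:

```shell
curl https://YOUR-ENDPOINT-HERE/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "qwen-2.5-vl-7b",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
        ]
      }
    ],
    "max_tokens": 256
  }'
```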
Remember to replace YOUR-ENDPOINT-HERE and YOUR_API_KEY with your actual values before running the command.
7. Clean Up Resources
To avoid unnecessary charges, clean up your deployment when you’re done testing:
- Navigate to the Deployments section in the left sidebar
- Find and select your deployment
- Click the three-dots menu in the top-right corner
- Select Delete
- Enter the deployment name to confirm deletion
- Click Delete to complete the process
This step is crucial to free up GPU resources and prevent unexpected charges.
Understanding Your Deployment
Your deployed model provides an OpenAI-compatible endpoint, meaning you can use it with any client library that supports the OpenAI API format. The endpoint supports:
- Text and image inputs (for multimodal models like Qwen VL)
- Streaming and non-streaming responses
- Standard OpenAI API parameters like temperature and max_tokens
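Because the endpoint follows the OpenAI API format, any HTTP client works. Below is a minimal stdlib-only Python sketch that builds such a request; the endpoint URL, API key, and model id are placeholders you must replace with your deployment’s actual values:

```python
import json
import urllib.request

ENDPOINT = "https://YOUR-ENDPOINT-HERE/v1/chat/completions"  # from the Details tab
API_KEY = "YOUR_API_KEY"

# Standard OpenAI-format request body; temperature, max_tokens, and
# stream are the usual OpenAI API parameters mentioned above.
payload = {
    "model": "qwen-2.5-vl-7b",  # assumed model id; check your deployment details
    "messages": [{"role": "user", "content": "Describe this image format in one line."}],
    "temperature": 0.2,
    "max_tokens": 128,
    "stream": False,  # set True for streaming responses
}

request = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)
# Uncomment once the placeholders are filled in with real values:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```

Since the format is OpenAI-compatible, the official OpenAI client libraries should also work by pointing their base URL at your endpoint.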
Monitoring and Management
After deployment, you can:
- Monitor Performance: Track usage metrics, latency, and throughput in the Monitoring tab
- Adjust Resources: Scale your deployment up or down based on actual usage patterns
- View Logs: Access detailed logs to troubleshoot any issues
- Manage API Keys: Create and revoke API keys for secure access