Prerequisites
- A Simplismart account (Sign up here if you haven’t already)
- Basic familiarity with the Simplismart platform
- An API key (Generate one here if needed)
Deployment Process
1. Select a Model
- Navigate to the Marketplace section in the left sidebar
- Search for “Qwen 2.5 VL 7B” in the search bar
- Click on the model card to view details
- Click the Deploy button to begin the deployment process
2. Configure Model Settings
In the deployment configuration screen, you’ll need to complete three main sections:
Model Configuration
- Base Model: Ensure “Qwen 2.5 VL 7B” is selected in the dropdown
- Deployment Type: Select “Simplismart Infrastructure” for managed hosting, or connect your own cloud infrastructure here instead
Resource Allocation
- GPU Type: Select “A10G” from the dropdown menu
- Number of GPUs: Leave at default (1) for this quickstart
3. Set Up Auto-Scaling
Auto-scaling allows your deployment to adapt to varying workloads automatically.
- In the Scaling section, you’ll see a default metric for CPU Utilization already configured
- You can add additional metrics by clicking “Add Metric”
- Add a new scaling metric, GPU Utilization, and set its threshold to 80%, so the deployment scales up when GPU usage reaches 80%
For production deployments, consider adding custom metrics based on your specific workload patterns. See our model deployment docs for more details.
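To make the threshold semantics concrete, here is an illustrative sketch of how a utilization-based autoscaler typically turns a target (such as the 80% GPU Utilization metric above) into a replica count. This is not Simplismart’s actual scaling logic, just the classic proportional rule (the same spirit as Kubernetes’ Horizontal Pod Autoscaler):

```python
import math

def desired_replicas(current_replicas: int,
                     current_util_pct: float,
                     target_util_pct: float = 80.0) -> int:
    """Proportional autoscaling rule (illustrative, not Simplismart's implementation).

    Scales the replica count by the ratio of observed utilization to the
    target, rounding up, and never going below one replica.
    """
    return max(1, math.ceil(current_replicas * current_util_pct / target_util_pct))

# One replica at 95% GPU utilization exceeds the 80% target, so scale to 2.
print(desired_replicas(1, 95.0))   # -> 2
# Two replicas at 40% utilization are underused, so scale back down to 1.
print(desired_replicas(2, 40.0))   # -> 1
```

In practice the platform also applies cooldown windows and min/max replica bounds, which is why observed scaling may lag behind this simple formula.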
4. Add Metadata
Proper metadata helps organize and identify your deployments:
- Deployment Name: Enter a unique, descriptive name (e.g., “qwen-vl-quickstart”)
- Deployment Tags: Add the following tag:
  - Key: env
  - Value: quickstart
5. Deploy the Model
- Review all your configuration settings
- Click the Deploy button in the top-right corner
- A confirmation dialog will appear showing deployment details
- Click Confirm to start the deployment process
For step-by-step instructions, check out the detailed Model Deployment Guide.
6. Test Your Deployment
Once deployed, you can test your model using the API:
- Navigate to the API tab in your deployment details
- Select cURL from the language dropdown
- Copy the provided code snippet
- Replace YOUR-ENDPOINT-HERE with your actual endpoint URL (found in the Details tab)
- Add your API key to the Authorization header
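The copied snippet will look roughly like the sketch below. The request path and body follow the standard OpenAI chat-completions format; the model id shown here is an assumption, so use the snippet the API tab gives you as the source of truth:

```shell
curl https://YOUR-ENDPOINT-HERE/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -d '{
    "model": "qwen-2.5-vl-7b",
    "messages": [
      {
        "role": "user",
        "content": [
          {"type": "text", "text": "What is in this image?"},
          {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}
        ]
      }
    ],
    "max_tokens": 256
  }'
```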
Remember to replace YOUR-ENDPOINT-HERE and YOUR_API_KEY with your actual values before running the command.
7. Clean Up Resources
To avoid unnecessary charges, clean up your deployment when you’re done testing:
- Navigate to the Deployments section in the left sidebar
- Find and select your deployment
- Click the three-dots menu in the top-right corner
- Select Delete
- Enter the deployment name to confirm deletion
- Click Delete to complete the process
This step is crucial to free up GPU resources and prevent unexpected charges.
Understanding Your Deployment
Your deployed model provides an OpenAI-compatible endpoint, meaning you can use it with any client library that supports the OpenAI API format. The endpoint supports:
- Text and image inputs (for multimodal models like Qwen VL)
- Streaming and non-streaming responses
- Standard OpenAI API parameters like temperature and max_tokens
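Because the endpoint follows the OpenAI API format, any HTTP client works. Below is a minimal stdlib-only Python sketch that builds such a request; the endpoint URL, API key, and model id are placeholders you must replace with your deployment’s actual values:

```python
import json
import urllib.request

ENDPOINT = "https://YOUR-ENDPOINT-HERE/v1/chat/completions"  # from the Details tab
API_KEY = "YOUR_API_KEY"

# Standard OpenAI-format request body; temperature, max_tokens, and
# stream are the usual OpenAI API parameters mentioned above.
payload = {
    "model": "qwen-2.5-vl-7b",  # assumed model id; check your deployment details
    "messages": [{"role": "user", "content": "Describe this image format in one line."}],
    "temperature": 0.2,
    "max_tokens": 128,
    "stream": False,  # set True for streaming responses
}

request = urllib.request.Request(
    ENDPOINT,
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": f"Bearer {API_KEY}",
    },
)
# Uncomment once the placeholders are filled in with real values:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response)["choices"][0]["message"]["content"])
```

Since the format is OpenAI-compatible, the official OpenAI client libraries should also work by pointing their base URL at your endpoint.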
Monitoring and Management
After deployment, you can:
- Monitor Performance: Track usage metrics, latency, and throughput in the Monitoring tab
- Adjust Resources: Scale your deployment up or down based on actual usage patterns
- View Logs: Access detailed logs to troubleshoot any issues
- Manage API Keys: Create and revoke API keys for secure access