This guide walks you through deploying your first AI model on Simplismart. As an example, we’ll use Qwen 2.5 VL 7B, a multimodal model that can process both text and images.

Prerequisites

  • A Simplismart account (Sign up here if you haven’t already)
  • Basic familiarity with the Simplismart platform
  • An API key (Generate one here if needed)

Deployment Process

Step 1: Select a Model

  1. Navigate to the Marketplace section in the left sidebar
  2. Search for “Qwen 2.5 VL 7B” in the search bar
  3. Click on the model card to view details
  4. Click the Deploy button to begin the deployment process
![Model selection from marketplace](ADD IMG HERE)
Step 2: Configure Model Settings

In the deployment configuration screen, you’ll need to complete three main sections:

Model Configuration

  • Base Model: Ensure “Qwen 2.5 VL 7B” is selected in the dropdown
  • Deployment Type: Select “Simplismart Infrastructure” for managed hosting, or connect your own cloud infrastructure here.

Resource Allocation

  • GPU Type: Select “A10G” from the dropdown menu
  • Number of GPUs: Leave at default (1) for this quickstart
![Resource configuration screen](ADD IMG HERE)
Step 3: Set Up Auto-Scaling

Auto-scaling allows your deployment to adapt to varying workloads automatically.
  1. In the Scaling section, you’ll see default metrics for CPU Utilization already configured
  2. You can add additional metrics by clicking “Add Metric”
  3. Add a new scaling metric, GPU Utilization, with a target of 80%. The deployment will then scale up when GPU usage reaches 80%
For production deployments, consider adding custom metrics based on your specific workload patterns. See our model deployment docs for more details.
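Conceptually, utilization-based auto-scaling usually follows a proportional rule like the one Kubernetes’ Horizontal Pod Autoscaler uses: scale the replica count by how far the observed metric sits from its target. The sketch below illustrates that rule; the function is illustrative only, not Simplismart’s actual implementation.

```python
import math

def desired_replicas(current_replicas, current_gpu_util, target_gpu_util=80):
    """HPA-style rule: scale proportionally to the distance from the target."""
    return max(1, math.ceil(current_replicas * current_gpu_util / target_gpu_util))

# With the 80% target configured above, one replica running at 95% GPU
# utilization scales up to two:
# desired_replicas(1, 95) -> 2
```

This is why the target matters: a lower target (say, 60%) leaves more headroom and scales up earlier, at the cost of running more replicas on average.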
Step 4: Add Metadata

Proper metadata helps organize and identify your deployments:
  1. Deployment Name: Enter a unique, descriptive name (e.g., “qwen-vl-quickstart”)
  2. Deployment Tags: Add the following tag:
    • Key: env
    • Value: quickstart
Tags help filter and organize deployments, especially as your usage grows.
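As a small illustration of why key/value tags pay off, filtering a deployment list by tag is a one-liner; the deployment records below are made up for the example.

```python
deployments = [
    {"name": "qwen-vl-quickstart", "tags": {"env": "quickstart"}},
    {"name": "qwen-vl-prod", "tags": {"env": "production"}},
]

def by_tag(items, key, value):
    # Keep only deployments whose tags contain the given key/value pair.
    return [d for d in items if d.get("tags", {}).get(key) == value]

print([d["name"] for d in by_tag(deployments, "env", "quickstart")])
# prints ['qwen-vl-quickstart']
```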
Step 5: Deploy the Model

  1. Review all your configuration settings
  2. Click the Deploy button in the top-right corner
  3. A confirmation dialog will appear showing deployment details
  4. Click Confirm to start the deployment process
The deployment typically takes 30-60 seconds to complete. You’ll be redirected to the deployment details page when finished.
For step-by-step instructions, check out the detailed Model Deployment Guide.
Step 6: Test Your Deployment

Once deployed, you can test your model using the API:
  1. Navigate to the API tab in your deployment details
  2. Select cURL from the language dropdown
  3. Copy the provided code snippet
  4. Replace YOUR-ENDPOINT-HERE with your actual endpoint URL (found in the Details tab)
  5. Add your API key to the Authorization header
```shell
curl --location 'YOUR-ENDPOINT-HERE' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--header 'Content-Type: application/json' \
--header 'id: 931e246d-add5-4d33-9e71-3c5edec19425' \
--data '{
    "model": "qwen2-vl",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "Describe this image in one sentence."
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                    }
                }
            ]
        }
    ],
    "max_tokens": 1024,
    "stream": false
}'
```
Remember to replace YOUR-ENDPOINT-HERE and YOUR_API_KEY with your actual values before running the command.
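If you prefer Python, the same request can be built with the standard library alone. The sketch below mirrors the cURL snippet above; the endpoint URL and API key are the same placeholders, which you must replace before sending.

```python
import json
import urllib.request

def build_request(endpoint, api_key, image_url, prompt, max_tokens=1024, stream=False):
    """Build an OpenAI-style chat-completions request mirroring the cURL example."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # Mirrors the 'id' header from the cURL snippet above.
        "id": "931e246d-add5-4d33-9e71-3c5edec19425",
    }
    payload = {
        "model": "qwen2-vl",
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
        "max_tokens": max_tokens,
        "stream": stream,
    }
    return urllib.request.Request(
        endpoint, data=json.dumps(payload).encode(), headers=headers, method="POST"
    )

req = build_request(
    "https://YOUR-ENDPOINT-HERE",  # replace with your endpoint URL
    "YOUR_API_KEY",                # replace with your API key
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
    "Describe this image in one sentence.",
)
# Uncomment to send once the placeholders are filled in:
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```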
Step 7: Clean Up Resources

To avoid unnecessary charges, clean up your deployment when you’re done testing:
  1. Navigate to the Deployments section in the left sidebar
  2. Find and select your deployment
  3. Click the three-dots menu in the top-right corner
  4. Select Delete
  5. Enter the deployment name to confirm deletion
  6. Click Delete to complete the process
This step is crucial to free up GPU resources and prevent unexpected charges.

Understanding Your Deployment

Your deployed model provides an OpenAI-compatible endpoint, meaning you can use it with any client library that supports the OpenAI API format. The endpoint supports:
  • Text and image inputs (for multimodal models like Qwen VL)
  • Streaming and non-streaming responses
  • Standard OpenAI API parameters like temperature and max_tokens

Monitoring and Management

After deployment, you can:

Monitor Performance

Track usage metrics, latency, and throughput in the Monitoring tab

Adjust Resources

Scale your deployment up or down based on actual usage patterns

View Logs

Access detailed logs to troubleshoot any issues

Manage API Keys

Create and revoke API keys for secure access

Next Steps

Now that you’ve successfully deployed your first model, consider these next steps: