Inference

This guide will walk you through making your first API call to Simplismart’s pre-deployed models. Before starting, ensure you have signed up for a Simplismart account.

Prerequisites

A Simplismart account
Basic Python knowledge
Python 3.8+ installed on your system

Step-by-Step Guide

Access the Playground

Log in to your Simplismart account
From the left sidebar, click on Playground
In the model dropdown, select Gemma 3 1B. (Gemma 3 1B is used as the example here; any other LLM works as well.)
You’ll see an interactive chat interface where you can test the model directly

Get API Details

In the Playground, click on Get API details in the left sidebar
You’ll be redirected to a page with ready-to-use code snippets
Note that both Python (OpenAI client) and cURL examples are provided
Copy the provided code snippet, or use the one given below

The API is compatible with any OpenAI-compliant client library, not just the official Python SDK.
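To illustrate what "OpenAI-compliant" means in practice, here is a minimal sketch that builds the same kind of request using only the Python standard library. It assumes the endpoint follows the usual OpenAI layout (a `/chat/completions` path appended to the base URL, with the key sent as a bearer token); the `build_chat_request` helper is hypothetical and not part of any Simplismart SDK.

```python
import json
import urllib.request


def build_chat_request(base_url, api_key, request_id, model, prompt):
    """Build (but do not send) an OpenAI-style chat completion request."""
    # Assumption: the endpoint uses the standard OpenAI path layout.
    url = base_url.rstrip("/") + "/chat/completions"
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,
    }
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        "id": request_id,  # same request identifier header the Python script sets
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send the request; the JSON response should then carry the reply under `choices[0].message.content`, as with any OpenAI-style API.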

Create Your Python Script

Create a new file named inference.py with the following code:

# inference.py
from openai import OpenAI

# Replace with your actual API key from Settings > API Keys
simplismart_api_key = "YOUR_API_KEY"

# Replace with your endpoint for the Gemma-3-1B model from the Model details page 
endpoint_url = "YOUR_MODEL_ENDPOINT"

# Request identifier: replace with your actual ID from Playground > API details > Get API details > Python code
request_id = "YOUR_REQUEST_ID"

try:
    # Initialize the OpenAI client with Simplismart endpoint
    client = OpenAI(
        api_key=simplismart_api_key,
        base_url=endpoint_url,        
        default_headers={"id": request_id}
    )
    
    # Define model and prompt
    MODEL_NAME = "gemma-3"
    PROMPT = "What is quantization in GenAI models?"    

    print(f"User: {PROMPT}\n")
    print("AI Assistant: ", end="", flush=True)

    # Create a streaming completion request
    stream = client.chat.completions.create(
        model=MODEL_NAME,
        messages=[
            {
                "role": "system",
                "content": "You are a helpful AI assistant."
            },
            {
                "role": "user", 
                "content": PROMPT
            }
        ],
        max_tokens=512,  # Response length limit
        stream=True,     # Enable streaming for faster first token
    )

    # Process the streamed response
    for chunk in stream:        
        text_delta = chunk.choices[0].delta.content
        if text_delta:            
            print(text_delta, end="", flush=True)
    print()  # Add newline after response

except Exception as e:    
    print(f"An unexpected error occurred: {e}")

Remember to replace "YOUR_API_KEY", "YOUR_MODEL_ENDPOINT", and the request ID placeholder with your actual values. The endpoint and request ID come from the Get API details page; the API key is generated in the next step.
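If you want the full response as a single string instead of printing it piece by piece, the streaming loop can be wrapped in a small helper. The sketch below is one way to do that; `collect_stream` and `fake_chunk` are hypothetical names, and the fake chunks only imitate the shape of the objects the OpenAI client yields so the example runs without a network call.

```python
from types import SimpleNamespace


def collect_stream(stream):
    """Join the text deltas of a streamed chat completion into one string."""
    parts = []
    for chunk in stream:
        # Some chunks may carry no choices or an empty delta; skip them.
        if not chunk.choices:
            continue
        text_delta = chunk.choices[0].delta.content
        if text_delta:
            parts.append(text_delta)
    return "".join(parts)


def fake_chunk(text):
    """Stand-in for a streamed chunk, used only for this demonstration."""
    delta = SimpleNamespace(content=text)
    return SimpleNamespace(choices=[SimpleNamespace(delta=delta)])


demo_stream = [
    fake_chunk("Hello"),
    fake_chunk(None),
    SimpleNamespace(choices=[]),
    fake_chunk(", world"),
]
print(collect_stream(demo_stream))  # prints "Hello, world"
```

In inference.py, the same helper could be called as `collect_stream(stream)` in place of the `for chunk in stream:` loop.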

Generate an API Key

Navigate to Settings > API Keys from the main sidebar
Click Generate New Key
Provide a descriptive name for your key (must be unique)
Set an appropriate expiration date
Copy the generated API key (you won’t be able to see it again)

Keep your API key secure and never expose it in client-side code or public repositories.
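One common way to keep the key out of your source code is to read it from an environment variable. The following is a minimal sketch; the variable name `SIMPLISMART_API_KEY` and the `load_api_key` helper are assumptions made for illustration, not names defined by Simplismart.

```python
import os


def load_api_key(env_var: str = "SIMPLISMART_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The default variable name is a hypothetical choice for this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return api_key
```

In inference.py, the hard-coded string could then be replaced with `simplismart_api_key = load_api_key()`, after exporting the variable in your shell.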

Run Your Script

Install the OpenAI Python client if you haven’t already:

pip install openai

Run your script:

python inference.py

You should see the model’s response to your query streaming in your terminal!

Congratulations! 🎉 You’ve successfully made your first API call to a Simplismart model.

Understanding Shared vs. Dedicated Endpoints

In this quickstart, you used a shared endpoint: a pre-deployed model that’s available to all Simplismart users. Shared endpoints are convenient for testing and development, but they have some limitations compared with dedicated endpoints:

Shared Endpoints

Quick to get started and no deployment required
Easy switching between different models
Pay-as-you-go pricing
Limited customization options

Dedicated Endpoints

Private to your organisation and optimised for your needs
A wide range of customisation options to choose from
Deploy and scale your proprietary model hassle-free
Better control over latency, throughput, and costs

Next Steps

Ready to take your AI implementation further? Try these next steps:

Deploy your own dedicated model for better performance and customization
Fine-tune a model on your own data for improved accuracy
Explore the API reference for advanced integration options
