1. Navigate to My Models and click the Add a Model button in the top-right corner.
  2. Enter Model Details
    • Provide a Model Name.
    • Select the Model Source: choose between Hugging Face and an AWS/GCP bucket.
    • Enter the Model Path.
    • Select the Linked Cloud Credentials.
      Upload your trained model to AWS S3 or GCP GCS and share the access credentials; the platform will compile the model and prepare it for deployment. Models built to your specifications are integrated seamlessly into the platform. A minimal upload sketch follows this step.
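    Before adding the model, the artifact must already be in your bucket. The snippet below is a minimal sketch of uploading with boto3; the bucket name, file names, and object key are hypothetical placeholders for your own values.

      import boto3

      s3 = boto3.client("s3")  # reads credentials from your AWS environment/config

      # Upload the trained model artifact so it can be referenced as the Model Path.
      s3.upload_file(
          Filename="my_model/model.tar.gz",            # local artifact (hypothetical)
          Bucket="my-model-bucket",                    # hypothetical bucket name
          Key="models/my-custom-model/model.tar.gz",   # object key in the bucket
      )

      # The Model Path entered above would then look like:
      # s3://my-model-bucket/models/my-custom-model/model.tar.gz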
  3. Configure Model Class
    • Under Model Class, choose Custom Pipeline.
  4. Select Infrastructure
    • Choose Simplismart Infrastructure for optimized deployment.
    • Select a GPU type based on your model’s size and compute requirements.
    • The Machine Type can be left as the default Simplismart configuration.
  5. Edit Pipeline Configuration (Optional)
    • After selecting the infrastructure, you can configure the pipeline further based on your model's runtime needs.
    • Use the Pipeline Config Editor to customize key deployment parameters.
    • Available fields include:
      • workers_per_device:
        Number of parallel workers assigned per device; a higher value yields higher inference throughput.
        Type: int
        Default value: 1
      • device:
        Specifies the hardware (CPU, GPU) used for model inference.
        Type: string
        Default value: "cpu"

        Possible Values:
        • "cpu" — For CPU hardware
        • "cuda" — For GPU hardware
      • endpoint:
        User-specified URL path at which the model receives and processes requests.
        Type: string
        Default value: "/predict"
      • type:
        Model type used by the compilation engine to create the deployment configuration.
        Type: string
        Default: none; must be specified
        Possible values:
        • "custom" — for custom serving
        • "whisper" — for Whisper serving
        • "llm" — for LLM serving
        • "sd" — for Stable Diffusion serving
        Recommended setting: use "custom" for serving your own custom models.
    • These fields can be entered under the Extra Params section in key-value format.
Example Configuration:
  • model.py:
    class Model:
      def __init__(self):
        self.model = None  # loaded lazily in load()
      def load(self):
        # Initialize or load model weights here
        self.model = "Model Initialization"
      def preprocess(self, request): # Can be any pydantic base model, dict, or string
        # Do some preprocessing
        # ....
        return request
      def predict(self, request):
        # Do prediction
        # ....
        output = self.model.predict()
        return output
      def postprocess(self, request):
        # Do postprocessing here
        # ....
        return request
    
  • config.yaml:
    python_version: "3.10"
    environment_variables: {}
    requirements:
      - accelerate==0.20.3
      - bitsandbytes==0.39.1
      - peft==0.3.0
      - protobuf==4.23.3
      - sentencepiece==0.1.99
      - torch==2.0.1
      - transformers==4.30.2
    system_packages:
      - wget
      - curl
    custom_setup_script: "script.sh"
    
  • Pipeline Config:
{
    "type": "custom",
    "extra_params": {
        "workers_per_device": 2,
        "device": "cuda",
		"endpoint": "/predict"
    }
}
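  To sanity-check the Model class before uploading, you can exercise its lifecycle locally. The sketch below assumes the serving pipeline calls load once at startup and then preprocess, predict, and postprocess per request; the stub model and sample request are illustrative assumptions, not platform APIs.

    # Local smoke test for the Model lifecycle (illustrative only).
    class _StubModel:
        def predict(self):
            return {"label": "positive", "score": 0.97}  # hypothetical output

    class Model:
        def __init__(self):
            self.model = None
        def load(self):
            self.model = _StubModel()  # a real load() would initialize weights
        def preprocess(self, request):
            return request
        def predict(self, request):
            return self.model.predict()
        def postprocess(self, output):
            return output

    if __name__ == "__main__":
        m = Model()
        m.load()                                # called once, at startup
        req = m.preprocess({"text": "hello"})   # per-request preprocessing
        print(m.postprocess(m.predict(req)))    # inference + postprocessing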
  6. Add the Model
    • Once all configurations are complete, click Add Model to start the model compilation.
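  Once compilation and deployment succeed, you can send requests to the endpoint you configured. A minimal sketch with Python requests; the deployment URL, auth header, and payload shape are assumptions to be replaced with the values shown for your deployment.

    import requests

    # Hypothetical placeholders for your deployment URL and API key.
    url = "https://<your-deployment-url>/predict"  # path matches the "endpoint" field
    headers = {"Authorization": "Bearer <your-api-key>"}

    resp = requests.post(url, json={"text": "hello"}, headers=headers, timeout=30)
    resp.raise_for_status()
    print(resp.json())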