1. Navigate to My Models and click the Add a Model button in the top-right corner.
  2. Enter Model Details
    • Provide a Model Name.
    • Select the Model Source: choose between Hugging Face and an AWS/GCP bucket.
    • Enter the Model Path.
    • Select the Linked Cloud Credentials.
      Upload your trained model to AWS S3 or GCP GCS and share the access credentials; the platform will compile the model and prepare it for deployment. Models built to your specifications are integrated seamlessly into the platform. A minimal upload sketch follows this step.
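    Before adding the model, the artifact must already be in your bucket. The snippet below is a minimal sketch of uploading with boto3; the bucket name, file names, and object key are hypothetical placeholders for your own values.

      import boto3

      s3 = boto3.client("s3")  # reads credentials from your AWS environment/config

      # Upload the trained model artifact so it can be referenced as the Model Path.
      s3.upload_file(
          Filename="my_model/model.tar.gz",            # local artifact (hypothetical)
          Bucket="my-model-bucket",                    # hypothetical bucket name
          Key="models/my-custom-model/model.tar.gz",   # object key in the bucket
      )

      # The Model Path entered above would then look like:
      # s3://my-model-bucket/models/my-custom-model/model.tar.gz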
  3. Configure Model Class
    • Under Model Class, choose Custom Pipeline.
  4. Select Infrastructure
    • Choose Simplismart Infrastructure for optimized deployment.
    • Select a GPU type based on your model’s size and compute requirements.
    • The Machine Type can be left as the default Simplismart configuration.
  5. Edit Pipeline Configuration (Optional)
    • After selecting the infrastructure, you can configure the pipeline further based on your model's runtime needs.
    • Use the Pipeline Config Editor to customize key deployment parameters.
    • Available fields include:
      • workers_per_device:
        Number of parallel workers assigned per device; a higher value yields higher inference throughput.
        Type: int
        Default value: 1
      • device:
        Specifies the hardware (CPU, GPU) used for model inference.
        Type: string
        Default value: "cpu"

        Possible Values:
        • "cpu" — For CPU hardware
        • "cuda" — For GPU hardware
      • endpoint:
        User-specified URL path at which the model receives and processes requests.
        Type: string
        Default value: "/predict"
      • type:
        Model type used by the compilation engine to create the deployment configuration.
        Type: string
        Default: none; must be specified
        Possible values:
        • "custom" — for custom serving
        • "whisper" — for Whisper serving
        • "llm" — for LLM serving
        • "sd" — for Stable Diffusion serving
        Recommended setting: use "custom" for serving your own custom models.
    • These fields can be entered under the Extra Params section in key-value format.
Example Configuration:
  • model.py:
    class Model:
      def __init__(self):
        self.model = None  # loaded lazily in load()
      def load(self):
        # Initialize or load model weights here
        self.model = "Model Initialization"
      def preprocess(self, request): # Can be any pydantic base model, dict, or string
        # Do some preprocessing
        # ....
        return request
      def predict(self, request):
        # Do prediction
        # ....
        output = self.model.predict()
        return output
      def postprocess(self, request):
        # Do postprocessing here
        # ....
        return request
    
  • config.yaml:
    python_version: "3.10"
    environment_variables: {}
    requirements:
      - accelerate==0.20.3
      - bitsandbytes==0.39.1
      - peft==0.3.0
      - protobuf==4.23.3
      - sentencepiece==0.1.99
      - torch==2.0.1
      - transformers==4.30.2
    system_packages:
      - wget
      - curl
    custom_setup_script: "script.sh"
    
  • Pipeline Config:
{
    "type": "custom",
    "extra_params": {
        "workers_per_device": 2,
        "device": "cuda",
		"endpoint": "/predict"
    }
}
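  To sanity-check the Model class before uploading, you can exercise its lifecycle locally. The sketch below assumes the serving pipeline calls load once at startup and then preprocess, predict, and postprocess per request; the stub model and sample request are illustrative assumptions, not platform APIs.

    # Local smoke test for the Model lifecycle (illustrative only).
    class _StubModel:
        def predict(self):
            return {"label": "positive", "score": 0.97}  # hypothetical output

    class Model:
        def __init__(self):
            self.model = None
        def load(self):
            self.model = _StubModel()  # a real load() would initialize weights
        def preprocess(self, request):
            return request
        def predict(self, request):
            return self.model.predict()
        def postprocess(self, output):
            return output

    if __name__ == "__main__":
        m = Model()
        m.load()                                # called once, at startup
        req = m.preprocess({"text": "hello"})   # per-request preprocessing
        print(m.postprocess(m.predict(req)))    # inference + postprocessing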
  6. Add the Model
    • Once all configurations are complete, click Add Model to start the model compilation.
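  Once compilation and deployment succeed, you can send requests to the endpoint you configured. A minimal sketch with Python requests; the deployment URL, auth header, and payload shape are assumptions to be replaced with the values shown for your deployment.

    import requests

    # Hypothetical placeholders for your deployment URL and API key.
    url = "https://<your-deployment-url>/predict"  # path matches the "endpoint" field
    headers = {"Authorization": "Bearer <your-api-key>"}

    resp = requests.post(url, json={"text": "hello"}, headers=headers, timeout=30)
    resp.raise_for_status()
    print(resp.json())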