After training the model, you can deploy the LoRA model with Simplismart. To deploy your fine-tuned model, follow the steps below, which guide you through optimizing, configuring, and completing the deployment so that your model is ready for use.

Merge with Base Model

Click Compile to merge the LoRA adapter back into the base model, creating a fine-tuned model. This takes you to the Add Model page. Follow the next steps to create an optimized version of the model, ready to be deployed via the Simplismart Model Suite.
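For intuition, merging a LoRA adapter is commonly described as adding the scaled low-rank update to each adapted weight matrix: W_merged = W + (alpha / rank) * B * A. The sketch below illustrates that arithmetic on a toy matrix in plain Python. It is not Simplismart's implementation; the names `matmul`, `merge_lora`, `lora_a`, `lora_b`, `alpha`, and `rank` are hypothetical and chosen only for this example.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product, kept dependency-free for illustration."""
    inner = len(b)
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(a))
    ]


def merge_lora(w: Matrix, lora_a: Matrix, lora_b: Matrix,
               alpha: float, rank: int) -> Matrix:
    """Return W + (alpha / rank) * (B @ A), the standard LoRA merge formula."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)  # shape (d_out, d_in), same as W
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]


# Toy example (illustrative values only): 2x2 base weight, rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 2.0]]       # shape (rank, d_in)
B = [[0.5], [0.25]]    # shape (d_out, rank)
merged = merge_lora(W, A, B, alpha=2.0, rank=1)
print(merged)  # [[2.0, 2.0], [0.5, 2.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the merged model can be optimized and deployed like any other standalone model.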

Optimize the Fine-Tuned Model

While compiling the LoRA with the base model, you will have the option to optimize the model for deployment.

Enter Model Details

Provide a name for your fine-tuned model.

Select Optimizing Infrastructure

Choose the optimization infrastructure based on the size of the base model, specifically the GPU memory required to run the model at a given precision or quantization level.
For example, a Llama 3.1 8B model can run on a T4 GPU with 4-bit quantization but may run into CUDA out-of-memory (OOM) errors at FP16 precision.
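The T4 example above can be checked with a rough back-of-the-envelope estimate. The sketch below computes the memory needed for the weights alone and adds a headroom factor for activations, KV cache, and the CUDA context. The 20% headroom, the rounded 8.0e9 parameter count, and the function names are assumptions made for illustration, not values taken from Simplismart; actual requirements depend on the runtime, batch size, and context length.

```python
def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory for the model weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


def fits_on_gpu(num_params: float, bits_per_weight: int, gpu_memory_gb: float,
                overhead_fraction: float = 0.2) -> bool:
    """Rough check: weights plus an assumed fractional overhead vs. GPU memory.

    overhead_fraction is a hypothetical allowance for activations, KV cache,
    and runtime buffers; tune it for your workload.
    """
    needed = weight_memory_gb(num_params, bits_per_weight) * (1 + overhead_fraction)
    return needed <= gpu_memory_gb


PARAMS_8B = 8.0e9      # approximate parameter count of an 8B model
T4_MEMORY_GB = 16.0    # nominal T4 memory; usable memory is somewhat lower

for label, bits in [("FP16", 16), ("4-bit", 4)]:
    gb = weight_memory_gb(PARAMS_8B, bits)
    ok = fits_on_gpu(PARAMS_8B, bits, T4_MEMORY_GB)
    print(f"{label}: ~{gb:.1f} GB of weights, fits on T4: {ok}")
```

Under these assumptions, the FP16 weights alone (about 16 GB) already consume the T4's nominal memory, while the 4-bit weights (about 4 GB) leave ample headroom.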

Update Optimization Configuration

Modify the optimization settings as needed, and select the desired quantization for your optimized model. If you are unsure about the rest of the optimization configuration, leave it at the default values.
Do not change the model configuration in this step.

Add the Model

Click Add Model to save your fine-tuned model to the My Models section.

Deploy the Model

Once the model has been successfully optimized and saved to your repository, you can deploy it via the Simplismart Model Suite. You can refer to the deployment steps here.