Introduction

What’s New?

This feature lets you dynamically load multiple LoRA (Low-Rank Adaptation) adapters into a single model deployment, so you can tailor a model to diverse tasks and domains without creating a separate deployment for each one. By serving multiple LoRAs from one deployment, you can optimize model performance, reduce inference time, and streamline your model management workflows.

Available Flags & Options

| Flag Name | Type | Default | Description |
|---|---|---|---|
| loras | list | [] | List of LoRA configurations to load. See Example Pipeline Configuration for the schema. |
| lora_repo | dict | null | Cloud storage path (e.g. S3, GCP) containing multiple LoRAs to load dynamically. |
| load_lora_dynamic | boolean | false | Enables dynamic loading of LoRAs into the base model. If false, all provided LoRAs are merged. |
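
The defaults in the table can be applied and type-checked before a config is submitted. The following is a minimal sketch in Python, not part of the platform: the helper name `apply_lora_defaults` and the `LORA_FLAGS` mapping are hypothetical, and only the three flag names, types, and defaults come from the table above.

```python
from copy import deepcopy

# Flag name -> (expected Python type, default), taken from the table above.
LORA_FLAGS = {
    "loras": (list, []),
    "lora_repo": (dict, None),
    "load_lora_dynamic": (bool, False),
}


def apply_lora_defaults(config: dict) -> dict:
    """Return a copy of `config` with missing LoRA flags set to their defaults.

    Hypothetical helper for illustration; raises TypeError on a wrong type.
    """
    result = deepcopy(config)
    for name, (expected_type, default) in LORA_FLAGS.items():
        if name not in result:
            result[name] = deepcopy(default)
            continue
        value = result[name]
        # `lora_repo` may be null (None); every other flag must match its type.
        if value is None and default is None:
            continue
        if not isinstance(value, expected_type):
            raise TypeError(
                f"{name} must be {expected_type.__name__}, got {type(value).__name__}"
            )
    return result
```

Calling `apply_lora_defaults({"type": "llm"})` returns a config with an empty `loras` list, a null `lora_repo`, and `load_lora_dynamic` set to false.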

Example Pipeline Configuration

Using loras list:

When specifying LoRAs manually:

{
  "type": "llm",
  "loras": [
    {
      "id": "lora_id_0",
      "source": {
        "path": "s3://simplismart-model-repository/dobby-test-loras/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora",
        "type": "s3",
        "secret": { "type": "aws" }
      }
    },
    {
      "id": "lora_id_1",
      "source": {
        "path": "s3://simplismart-model-repository/dobby-test-loras/llama3.1_text2sql_instruct_tuned",
        "type": "s3",
        "secret": { "type": "aws" }
      }
    },
    {
      "id": "lora_id_2",
      "source": {
        "path": "raaec/llama3.1-8b-instruct-lora-model",
        "type": "hf",
        "secret": { "type": "hf" }
      }
    }
  ],
  "lora_repo": {
    "type": "",
    "path": "",
    "ownership": "",
    "secret": { "type": "" }
  },
  "quantized_model_path": {
    "type": "",
    "path": "",
    "ownership": "",
    "secret": { "type": "" }
  },
  "load_lora_dynamic": false
}
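
When many adapters are involved, the loras list can be generated rather than written by hand. The following is a sketch that assumes the entry schema shown in the example above (id, source.path, source.type, source.secret.type). The function `make_lora_entry` is hypothetical, and the mapping of `s3` to the `aws` secret type and of `hf` to `hf` simply mirrors that example; other source types would need their own mapping.

```python
import json

# Secret type used for each source type, mirroring the example above.
# Any other source type is an assumption and is rejected in this sketch.
SECRET_TYPE_FOR_SOURCE = {"s3": "aws", "hf": "hf"}


def make_lora_entry(lora_id: str, path: str, source_type: str) -> dict:
    """Build one entry of the `loras` list (hypothetical helper)."""
    if source_type not in SECRET_TYPE_FOR_SOURCE:
        raise ValueError(f"unsupported source type in this sketch: {source_type}")
    return {
        "id": lora_id,
        "source": {
            "path": path,
            "type": source_type,
            "secret": {"type": SECRET_TYPE_FOR_SOURCE[source_type]},
        },
    }


loras = [
    make_lora_entry(
        "lora_id_0",
        "s3://simplismart-model-repository/dobby-test-loras/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora",
        "s3",
    ),
    make_lora_entry("lora_id_2", "raaec/llama3.1-8b-instruct-lora-model", "hf"),
]

# Print the resulting fragment so it can be compared with the example above.
print(json.dumps({"type": "llm", "loras": loras}, indent=2))
```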

Using lora_repo:

When pulling LoRAs dynamically from a cloud directory:

{
  "pipeline_config": {
    "type": "llm",
    "loras": [],
    "lora_repo": {
      "type": "s3",
      "path": "s3://simplismart-model-repository/dobby-test-loras",
      "ownership": "",
      "secret": { "type": "aws" }
    },
    "quantized_model_path": {
      "type": "",
      "path": "",
      "ownership": "",
      "secret": { "type": "" }
    },
    "load_lora_dynamic": false
  }
}

Important Notes

✅ If load_lora_dynamic is false but the loras list contains more than one LoRA, load_lora_dynamic is automatically set to true.
✅ The id specified for each LoRA is used as the model name at inference time.
✅ When using a lora_repo, each subfolder inside the specified path becomes a separate model.
✅ Depending on the load_lora_dynamic flag, LoRAs are either merged or loaded dynamically at inference time.
✅ To understand how to structure secrets, refer to the Secret Management documentation. Here is a sample AWS secret for LoRAs:

{
  "type": "aws",
  "access_key_id": "<access_key>",
  "secret_access_key": "<secret_key>"
}
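
The first note above, on the automatic override of load_lora_dynamic, can be expressed as a small rule. The following is a sketch in Python that assumes the override depends only on the length of the loras list, as stated in that note. The function name `effective_load_lora_dynamic` is hypothetical, and the platform's actual resolution logic may take more into account.

```python
def effective_load_lora_dynamic(config: dict) -> bool:
    """Return the value of load_lora_dynamic after the documented override."""
    requested = bool(config.get("load_lora_dynamic", False))
    loras = config.get("loras") or []
    # Documented rule: more than one LoRA in `loras` forces dynamic loading.
    if not requested and len(loras) > 1:
        return True
    return requested
```

Under this rule, a config with three entries in loras and load_lora_dynamic set to false behaves as if the flag were true, while a config with a single entry keeps the value it was given.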

Recommendations

  • Performance: Set load_lora_dynamic to true if you want the system to load LoRAs on demand and minimize startup time.
  • Organizational Structure: When using a lora_repo, name the directories intuitively, as those names will serve as model identifiers.
  • Security: Configure the appropriate secret for S3, GCP, HF, or any supported cloud source to ensure proper authentication and authorization.
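
To preview which model identifiers a lora_repo would produce, the subfolder names can be derived from a listing of object keys. The following is a sketch that assumes each immediate subfolder under the repo path maps to one model, as described in the notes. The function `model_ids_from_keys` is hypothetical, the keys are made-up sample data, and fetching the listing from cloud storage is left out.

```python
def model_ids_from_keys(repo_prefix: str, keys: list[str]) -> list[str]:
    """Return the sorted immediate subfolder names under `repo_prefix`."""
    prefix = repo_prefix.rstrip("/") + "/"
    ids = set()
    for key in keys:
        if not key.startswith(prefix):
            continue
        remainder = key[len(prefix):]
        # Only keys nested inside a subfolder contribute an identifier;
        # files sitting directly in the repo root are ignored.
        if "/" in remainder:
            ids.add(remainder.split("/", 1)[0])
    return sorted(ids)


# Made-up object keys for illustration only.
keys = [
    "dobby-test-loras/math-tutor/adapter_config.json",
    "dobby-test-loras/math-tutor/adapter_model.safetensors",
    "dobby-test-loras/text2sql/adapter_config.json",
    "dobby-test-loras/README.md",
]
print(model_ids_from_keys("dobby-test-loras", keys))  # ['math-tutor', 'text2sql']
```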