Dynamic LoRA Compilation

Introduction

What’s New?

This feature lets you dynamically load multiple LoRAs (Low-Rank Adaptations) into a single model deployment. Instead of creating a separate deployment for each task or domain, you serve several adapters on top of one base model. This lets you tailor the model to diverse use cases, reduce the number of deployments you run, and keep model management in one place.

Available Flags & Options

Flag Name	Type	Default	Description
`loras`	list	`[]`	List of LoRA configurations to load. See Example Pipeline Configuration for the schema.
`lora_repo`	dict	`null`	Cloud storage location (e.g., S3, GCP) containing multiple LoRAs, one per subfolder, to load dynamically.
`load_lora_dynamic`	boolean	`false`	Enables dynamic loading of LoRAs into the base model. When `false`, all provided LoRAs are merged into the base model.
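When generating configurations from code, it can help to mirror these three flags in a typed structure so that the defaults stay consistent with the table. The following is a minimal Python sketch. The class name `LoraFlags` and its `to_dict` method are hypothetical and not part of any published SDK. Only the field names, types, and defaults come from the table.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class LoraFlags:
    """Hypothetical container mirroring the LoRA flags table (illustrative only)."""

    # List of LoRA configurations, each shaped like {"id": ..., "source": {...}}.
    # default_factory gives every instance its own empty list.
    loras: list[dict[str, Any]] = field(default_factory=list)
    # Cloud storage location holding one LoRA per subfolder; None serializes to null.
    lora_repo: Optional[dict[str, Any]] = None
    # False merges the provided LoRAs into the base model.
    load_lora_dynamic: bool = False

    def to_dict(self) -> dict[str, Any]:
        """Return the flags as a plain dict ready for json.dumps."""
        return {
            "loras": self.loras,
            "lora_repo": self.lora_repo,
            "load_lora_dynamic": self.load_lora_dynamic,
        }
```

Passing `LoraFlags().to_dict()` to `json.dumps` emits `null` for the unset `lora_repo`, which matches the table's default.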

Example Pipeline Configuration

Using the `loras` list:

When specifying LoRAs manually:

```json
{
  "type": "llm",
  "loras": [
    {
      "id": "lora_id_0",
      "source": {
        "path": "s3://simplismart-model-repository/dobby-test-loras/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora",
        "type": "s3",
        "secret": { "type": "aws" }
      }
    },
    {
      "id": "lora_id_1",
      "source": {
        "path": "s3://simplismart-model-repository/dobby-test-loras/llama3.1_text2sql_instruct_tuned",
        "type": "s3",
        "secret": { "type": "aws" }
      }
    },
    {
      "id": "lora_id_2",
      "source": {
        "path": "raaec/llama3.1-8b-instruct-lora-model",
        "type": "hf",
        "secret": { "type": "hf" }
      }
    }
  ],
  "lora_repo": {
    "type": "",
    "path": "",
    "ownership": "",
    "secret": { "type": "" }
  },
  "quantized_model_path": {
    "type": "",
    "path": "",
    "ownership": "",
    "secret": { "type": "" }
  },
  "load_lora_dynamic": false
}
```
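If you generate this configuration from a script, you can build the `loras` array programmatically instead of writing it by hand. The following minimal Python sketch reproduces only the entry shape shown in the example (`id` plus a `source` with `path`, `type`, and `secret`). The helper name `lora_entry` is hypothetical.

```python
import json


def lora_entry(lora_id: str, path: str, source_type: str, secret_type: str) -> dict:
    """Build one `loras` entry in the shape used by the example configuration."""
    return {
        "id": lora_id,  # used as the model name at inference time
        "source": {
            "path": path,
            "type": source_type,  # "s3" or "hf" in the example
            "secret": {"type": secret_type},  # "aws" or "hf" in the example
        },
    }


loras = [
    lora_entry(
        "lora_id_0",
        "s3://simplismart-model-repository/dobby-test-loras/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora",
        "s3",
        "aws",
    ),
    lora_entry("lora_id_2", "raaec/llama3.1-8b-instruct-lora-model", "hf", "hf"),
]

# Print the array so it can be pasted into the "loras" field of a pipeline config.
print(json.dumps(loras, indent=2))
```

The output covers only the `loras` field. The remaining keys (`type`, `lora_repo`, `quantized_model_path`, `load_lora_dynamic`) still come from the full example.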

Using `lora_repo`:

When pulling LoRAs dynamically from a cloud directory:

```json
{
  "pipeline_config": {
    "type": "llm",
    "loras": [],
    "lora_repo": {
      "type": "s3",
      "path": "s3://simplismart-model-repository/dobby-test-loras",
      "ownership": "",
      "secret": { "type": "aws" }
    },
    "quantized_model_path": {
      "type": "",
      "path": "",
      "ownership": "",
      "secret": { "type": "" }
    },
    "load_lora_dynamic": false
  }
}
```
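With `lora_repo`, each subfolder under the configured path becomes its own model, and the subfolder name serves as the model identifier. The following Python sketch shows that mapping for a list of object URIs. It assumes the subfolder name is used verbatim as the id. The function name `model_ids_from_repo` and the object listing, including the file names inside each subfolder, are hypothetical.

```python
def model_ids_from_repo(repo_path: str, object_uris: list[str]) -> list[str]:
    """Return the sorted names of the immediate subfolders under repo_path.

    Assumes each subfolder name is used verbatim as a model id.
    """
    prefix = repo_path.rstrip("/") + "/"
    names = set()
    for uri in object_uris:
        if not uri.startswith(prefix):
            continue  # object lives outside the repo path
        rest = uri[len(prefix):]
        if "/" in rest:  # only objects inside a subfolder define a model
            names.add(rest.split("/", 1)[0])
    return sorted(names)


repo = "s3://simplismart-model-repository/dobby-test-loras"
# Hypothetical listing; real adapter file names may differ.
objects = [
    f"{repo}/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora/adapter_config.json",
    f"{repo}/Llama-3.1-8B-Instruct-GRPO-gsm8k-ft-lora/adapter_model.safetensors",
    f"{repo}/llama3.1_text2sql_instruct_tuned/adapter_config.json",
    f"{repo}/README.md",  # a top-level file, not a subfolder
]
print(model_ids_from_repo(repo, objects))
```

The top-level `README.md` is ignored because it does not sit inside a subfolder, so only the two adapter directories are reported.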

Important Notes

✅ If `load_lora_dynamic` is `false` but the `loras` list contains more than one LoRA, `load_lora_dynamic` is automatically set to `true`.
✅ The `id` of each LoRA is used as the model name at inference time.
✅ When using `lora_repo`, each subfolder inside the specified path becomes a separate model.
✅ The `load_lora_dynamic` flag determines whether LoRAs are merged into the base model (`false`) or loaded dynamically at inference time (`true`).
✅ For details on structuring secrets, refer to the Secret Management documentation. Here is a sample secret for LoRAs stored in S3:

```json
{
    "type": "aws",
    "access_key_id": "<access_key>",
    "secret_access_key": "<secret_key>"
}
```
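The notes about `load_lora_dynamic` can be summarized as a small decision function. The following Python sketch covers only the `loras` list rule stated in the notes. The notes do not say how `lora_repo` affects the flag, so that case is not modeled. The function name `resolve_lora_mode` is hypothetical.

```python
def resolve_lora_mode(config: dict) -> str:
    """Return "dynamic" or "merged" following the rule for the `loras` list.

    More than one LoRA forces dynamic loading even if the flag is false.
    The lora_repo case is intentionally not handled here.
    """
    requested = bool(config.get("load_lora_dynamic", False))
    lora_count = len(config.get("loras", []))
    if requested or lora_count > 1:
        return "dynamic"  # adapters loaded at inference time, each served under its id
    return "merged"  # the provided LoRA is merged into the base model


print(resolve_lora_mode({"loras": [{"id": "lora_id_0"}], "load_lora_dynamic": False}))
```

Under this rule, the first example configuration on this page, which lists three LoRAs with `"load_lora_dynamic": false`, resolves to dynamic loading.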

Recommendations

Performance: Set `load_lora_dynamic` to `true` if you want the system to load LoRAs on demand and minimize startup time.
Organizational Structure: When using a lora_repo, name the directories intuitively, as those names will serve as model identifiers.
Security: Configure the appropriate secret for S3, GCP, HF, or any supported cloud source to ensure proper authentication and authorization.
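Before deploying, you can run a quick pre-flight check that every LoRA source carries a secret of the expected type. The following Python sketch uses only the source-to-secret pairs that appear in the examples on this page (`s3` with `aws`, `hf` with `hf`). Other sources, such as GCP, are reported as unknown rather than guessed. The names `OBSERVED_SECRET_TYPES` and `check_lora_secrets` are hypothetical.

```python
# Source-type -> secret-type pairs observed in this page's examples only.
OBSERVED_SECRET_TYPES = {"s3": "aws", "hf": "hf"}


def check_lora_secrets(loras: list[dict]) -> list[str]:
    """Return warnings for LoRA entries whose secret looks missing or mismatched."""
    warnings = []
    for lora in loras:
        lora_id = lora.get("id", "<no id>")
        source = lora.get("source", {})
        src_type = source.get("type", "")
        secret_type = source.get("secret", {}).get("type", "")
        expected = OBSERVED_SECRET_TYPES.get(src_type)
        if not secret_type:
            warnings.append(f"{lora_id}: no secret type set")
        elif expected is None:
            warnings.append(f"{lora_id}: no known secret mapping for source '{src_type}'")
        elif secret_type != expected:
            warnings.append(
                f"{lora_id}: source '{src_type}' is paired with '{expected}' in the examples, got '{secret_type}'"
            )
    return warnings


print(check_lora_secrets([{"id": "lora_id_0", "source": {"type": "s3", "secret": {"type": "hf"}}}]))
```

The checker only warns and never blocks, because the examples do not cover every supported source.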
