Introduction
What’s New?
This feature lets you dynamically load multiple LoRAs (Low-Rank Adaptations) into a single model deployment. You can tailor one deployment to diverse tasks and domains instead of creating a separate deployment for each. Serving several LoRAs from the same deployment can help you optimize model performance, reduce inference time, and streamline your model management workflows.
Available Flags & Options
| Flag Name | Type | Default | Description |
|---|---|---|---|
| `loras` | list | `[]` | List of LoRA configurations to load. See Example Pipeline Configuration for the schema. |
| `lora_repo` | dict | `null` | Cloud storage path (e.g., S3 or GCP) containing multiple LoRAs to load dynamically. |
| `load_lora_dynamic` | boolean | `false` | Enables dynamic loading of LoRAs into the base model. When `false`, all provided LoRAs are merged into the base model. |
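To make the table concrete, the following is a minimal Python sketch that models these options as dataclasses. The class names and the `path` field are hypothetical; only the flag names, types, defaults, and the per-LoRA `id` field come from this page. It is an illustration of the option shapes, not the platform's actual configuration schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LoraSpec:
    """One entry in the `loras` list (hypothetical shape)."""
    id: str    # documented: becomes the model name at inference time
    path: str  # hypothetical: location of the adapter weights


@dataclass
class LoraOptions:
    """Hypothetical container mirroring the flags table."""
    # Default []: default_factory gives each instance its own empty list.
    loras: List[LoraSpec] = field(default_factory=list)
    # Default null: a dict describing the cloud path (its keys are not specified here).
    lora_repo: Optional[dict] = None
    # Default false: provided LoRAs are merged instead of loaded dynamically.
    load_lora_dynamic: bool = False
```

Using `field(default_factory=list)` rather than a literal `[]` matters in Python: a shared mutable default would leak LoRA entries between separate option objects.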
Example Pipeline Configuration
Using the `loras` list:
When specifying LoRAs manually:
Using `lora_repo`:
When pulling LoRAs dynamically from a cloud directory:
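With `lora_repo`, each subfolder of the given path becomes its own model, named after the folder. The sketch below illustrates that mapping under one loud assumption: a local directory stands in for the cloud path (S3, GCP, etc.), since the actual storage client is not described here. The function name `discover_repo_loras` is hypothetical.

```python
from pathlib import Path
from typing import Dict


def discover_repo_loras(repo_root: Path) -> Dict[str, Path]:
    """Map each immediate subfolder of repo_root to a model ID.

    Assumption: a local directory stands in for the cloud path given in
    `lora_repo`; a real deployment would list objects in S3, GCP, etc.
    """
    models: Dict[str, Path] = {}
    # Sort for a deterministic order of model IDs.
    for entry in sorted(repo_root.iterdir()):
        if entry.is_dir():
            # The subfolder name becomes the model identifier;
            # loose files at the top level are ignored.
            models[entry.name] = entry
    return models
```

Because folder names become model identifiers directly, renaming a folder in the repository changes the name clients must request at inference time.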
Important Notes
✅ If `load_lora_dynamic` is `false` but the `loras` list contains more than one LoRA, `load_lora_dynamic` is automatically set to `true`.
✅ The `id` specified in each LoRA becomes the model's name during inference.
✅ When using a `lora_repo`, each subfolder inside the specified path becomes a separate model.
✅ LoRAs are either merged into the base model or loaded dynamically, depending on the `load_lora_dynamic` flag.
✅ To understand how to structure secrets, refer to the Secret Management documentation. Here are some sample secrets for LoRAs.
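The first two notes describe simple rules that can be expressed in a few lines. This Python sketch applies them to a `loras` list represented as plain dicts; the function names are hypothetical, and it covers only the `loras` case, since this page does not say how a `lora_repo` interacts with `load_lora_dynamic`.

```python
from typing import Any, Dict, List


def effective_load_lora_dynamic(loras: List[Dict[str, Any]], load_lora_dynamic: bool) -> bool:
    """Return the value of load_lora_dynamic that actually takes effect."""
    # Documented rule: more than one LoRA forces dynamic loading on.
    if not load_lora_dynamic and len(loras) > 1:
        return True
    return load_lora_dynamic


def model_names(loras: List[Dict[str, Any]]) -> List[str]:
    """Return the model names clients use at inference time."""
    # Documented rule: each LoRA's `id` is its model name.
    return [lora["id"] for lora in loras]
```

For example, passing two LoRAs with `load_lora_dynamic=False` yields `True`, while a single LoRA keeps the flag at `False` and is merged into the base model.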
Recommendations
- Performance: Use `load_lora_dynamic = true` if you want the system to load LoRAs on demand and minimize startup time.
- Organizational Structure: When using a `lora_repo`, name the directories intuitively, since those names serve as model identifiers.
- Security: Configure the appropriate `secret` for S3, GCP, HF (Hugging Face), or any other supported cloud source to ensure proper authentication and authorization.