Dynamic LoRA Compilation
Introduction
What’s New?
This feature enables users to dynamically load multiple LoRAs (Low-Rank Adaptations) into a single model deployment. With this enhancement, you can tailor your models to diverse tasks and domains without creating multiple separate deployments. By leveraging multiple LoRAs simultaneously, you can optimize model performance, reduce inference time, and streamline your model management workflows.
Available Flags & Options
Flag Name | Type | Default | Description |
---|---|---|---|
loras | list | [] | List of LoRA configurations to load. See Example Configuration for the schema. |
lora_repo | dict | null | Cloud storage path (e.g. S3, GCP) containing multiple LoRAs to load dynamically. |
load_lora_dynamic | boolean | false | Enables dynamic loading of LoRAs into the base model. When false, all provided LoRAs are merged into the base model instead. |
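
To make the types and defaults in the table concrete, here is a minimal Python sketch of a container for the three flags. The class name `LoraOptions` and its `validate` method are hypothetical and are not part of the product; only the flag names, types, and defaults come from the table above.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class LoraOptions:
    """Hypothetical container mirroring the three flags in the table."""

    loras: List[Dict[str, Any]] = field(default_factory=list)  # default: []
    lora_repo: Optional[Dict[str, Any]] = None                 # default: null
    load_lora_dynamic: bool = False                            # default: false

    def validate(self) -> None:
        """Check only the types listed in the table; the real schema may be stricter."""
        if not isinstance(self.loras, list):
            raise TypeError("loras must be a list")
        if self.lora_repo is not None and not isinstance(self.lora_repo, dict):
            raise TypeError("lora_repo must be a dict or null")
        if not isinstance(self.load_lora_dynamic, bool):
            raise TypeError("load_lora_dynamic must be a boolean")


# With no arguments, every flag takes its documented default.
opts = LoraOptions()
opts.validate()
```

The sketch checks types only. The contents of each `loras` entry and of the `lora_repo` dict are defined by the schema in Example Configuration, which is not reproduced here.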
Example Pipeline Configuration
Using the loras list:
When specifying LoRAs manually:
Using lora_repo:
When pulling LoRAs dynamically from a cloud directory:
Important Notes
✅ If load_lora_dynamic is false but the loras list contains more than one LoRA, then load_lora_dynamic will automatically be set to true.
✅ The id specified for each LoRA is used as the model name at inference time.
✅ When using a lora_repo, each subfolder inside the specified path will become a separate model.
✅ Depending on the load_lora_dynamic flag, LoRAs are either merged into the base model (false) or loaded dynamically at inference time (true).
✅ To understand how to structure secrets, refer to the Secret Management documentation. Here are some sample secrets for LoRAs.
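
The first two notes above can be sketched in Python as follows. This is an illustration of the documented behavior, not the platform's implementation: the function names and the sample `id` values are hypothetical, and each LoRA entry is assumed to be a dict carrying at least an `id` key.

```python
from typing import Any, Dict, List


def effective_load_lora_dynamic(loras: List[Dict[str, Any]], load_lora_dynamic: bool) -> bool:
    """Apply the documented rule: more than one LoRA forces dynamic loading."""
    if not load_lora_dynamic and len(loras) > 1:
        return True
    return load_lora_dynamic


def model_names(loras: List[Dict[str, Any]]) -> List[str]:
    """Each LoRA's id is the name a client would use at inference time."""
    return [lora["id"] for lora in loras]


# Hypothetical entries; only the "id" key is taken from the notes above.
loras = [{"id": "support-bot"}, {"id": "legal-summarizer"}]

# Two LoRAs with the flag left at false: the flag is promoted to true.
print(effective_load_lora_dynamic(loras, load_lora_dynamic=False))

# A single LoRA leaves the flag as configured, so it is merged.
print(effective_load_lora_dynamic(loras[:1], load_lora_dynamic=False))

print(model_names(loras))
```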
Recommendations
- Performance: Use load_lora_dynamic = true if you want the system to load LoRAs on demand and minimize startup time.
- Organizational Structure: When using a lora_repo, name the directories intuitively, as those names will serve as model identifiers.
- Security: Configure the appropriate secret for S3, GCP, HF, or any supported cloud source to ensure proper authentication and authorization.
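
To preview which model identifiers a lora_repo layout would produce, the subfolder rule can be sketched against a local directory. This is an assumption-laden illustration: a local path stands in for the cloud storage path, the helper `model_ids_from_repo` is hypothetical, and the folder names are made up.

```python
import tempfile
from pathlib import Path
from typing import List


def model_ids_from_repo(root: Path) -> List[str]:
    """Return one identifier per immediate subfolder; plain files are ignored."""
    return sorted(p.name for p in root.iterdir() if p.is_dir())


# Build a throwaway directory tree that mimics a lora_repo layout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("customer-support-v2", "sql-generation"):
        (root / name).mkdir()
    (root / "README.txt").write_text("not a LoRA folder")

    # Each subfolder name is what would serve as a model identifier.
    print(model_ids_from_repo(root))
```

Running a check like this before uploading makes it easy to spot directory names that would be awkward as model identifiers.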