Enter Model Details
- Model name: Provide a name for your model.
- Source: Specify the source from which the model will be fetched. You can choose from:
  - HuggingFace Model Hub – Provide the repository name (e.g., creator/model-slug).
  - AWS S3 – Enter the S3 bucket path.
  - GCP GCS – Enter the Google Cloud Storage bucket path.
  - Public URL – Provide a publicly accessible model download link.
- Path: Enter the path to the model file or directory (e.g., openai/whisper-large-v3-turbo).
Cloud credentials (Required for AWS S3 or GCP GCS)
Provide your cloud credentials (Secret) to enable secure access to private storage buckets.
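If you want to confirm that the credentials you plan to store as a Secret can actually reach your bucket, a minimal sketch using boto3 is shown below (the bucket name, prefix, and key values are placeholders, not values from this guide); a similar check applies to GCS with the google-cloud-storage client.

```python
# Minimal sketch: verify that S3 credentials can list the model bucket.
# Requires `pip install boto3`; all values below are placeholders.
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",          # placeholder
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",  # placeholder
)

# List a few objects under the prefix where the model is stored.
resp = s3.list_objects_v2(Bucket="your-model-bucket", Prefix="models/", MaxKeys=5)
for obj in resp.get("Contents", []):
    print(obj["Key"])
```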

Getting your model path from HuggingFace
- Visit huggingface.co.
- Use the search bar to find the desired model (e.g., “whisper-large”).
- Click on the model you want from the search results (e.g., openai/whisper-large-v3-turbo).
- Copy the model path displayed at the top of the page (e.g., openai/whisper-large-v3-turbo) for later use.
The model path on HuggingFace follows the format: creator/model-slug.
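To double-check that a copied path resolves to a real repository before entering it, a minimal sketch using the huggingface_hub Python client is shown below (an optional, illustrative check, not part of the platform's workflow):

```python
# Minimal sketch: confirm that a HuggingFace model path (creator/model-slug) resolves.
# Requires `pip install huggingface_hub`; the repo id is only an example.
from huggingface_hub import HfApi

repo_id = "openai/whisper-large-v3-turbo"

try:
    info = HfApi().model_info(repo_id)
    print(f"Found {info.id} ({info.downloads} downloads)")
except Exception as err:
    print(f"Could not resolve {repo_id}: {err}")
```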
Note: Only instruct-style models are supported in the model compilation step for LLMs. These are typically chat-optimized models and are often identified by the -Instruct suffix in their names (e.g., meta-llama/Llama-3.2-3B-Instruct). Base models such as meta-llama/Llama-3.2-3B (without the -Instruct suffix) are not supported.
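If you are unsure whether a model is instruct-style, one informal heuristic (an assumption on our part, not the platform's own check) is to load its tokenizer with the transformers library and look for a chat template:

```python
# Minimal sketch: heuristic check for an instruct/chat-style model.
# Requires `pip install transformers`; gated repos (e.g., meta-llama) may also
# need `huggingface-cli login`. The repo id below is only an example.
from transformers import AutoTokenizer

repo_id = "meta-llama/Llama-3.2-3B-Instruct"

tokenizer = AutoTokenizer.from_pretrained(repo_id)
if getattr(tokenizer, "chat_template", None):
    print(f"{repo_id} ships a chat template, so it is likely instruct-style.")
else:
    print(f"{repo_id} has no chat template and may be a base model.")
```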

Optimizing Infrastructure
- Configure the infrastructure to optimize the model’s performance, such as selecting the appropriate compute resources and optimization techniques.

Configuration
- Select the desired quantization format, FP16 or AWQ, based on your performance and resource requirements:
- FP16 (Half-Precision): Offers higher precision and accuracy, but requires more GPU memory and compute power.
- AWQ (Activation-aware Weight Quantization): Reduces model size and memory usage with minimal impact on accuracy, making it suitable for resource-constrained environments (a rough memory comparison is sketched after this list).
- The optimization, model, and pipeline configurations are auto-filled based on the details provided earlier. You may modify them if required to suit your deployment needs.
- Finalize the model’s configuration by setting any additional parameters or preferences required for deployment.
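To help choose between FP16 and AWQ, a rough back-of-the-envelope memory estimate is sketched below (the bytes-per-parameter figures and the ~10% AWQ overhead are approximations we assume, not platform guarantees; activations and KV cache are excluded):

```python
# Minimal sketch: rough GPU memory needed for model weights alone.
# Assumptions: FP16 stores ~2 bytes per parameter; AWQ (4-bit weights) stores
# ~0.5 bytes per parameter plus ~10% overhead for scales/zero-points.

def estimate_weight_memory_gb(num_params_billions: float) -> dict:
    params = num_params_billions * 1e9
    return {
        "fp16_gb": params * 2 / 1024**3,
        "awq_4bit_gb": params * 0.5 * 1.1 / 1024**3,
    }

# Example: a 3B-parameter model (roughly the size of Llama-3.2-3B-Instruct).
print(estimate_weight_memory_gb(3.0))  # ~5.6 GB for FP16, ~1.5 GB for AWQ
```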
