Enter Model Details

  • Model name: Provide a name for your model.
  • Source: Specify the source from which the model will be fetched. You can choose from:
    • HuggingFace Model Hub – Provide the repository name (e.g., creator/model-slug).
    • AWS S3 – Enter the S3 bucket path.
    • GCP GCS – Enter the Google Cloud Storage bucket path.
    • Public URL – Provide a publicly accessible model download link.
  • Path: Enter the path to the model file or directory (e.g., openai/whisper-large-v3-turbo).
Cloud credentials (Required for AWS S3 or GCP GCS)
Provide your cloud credentials (Secret) to enable secure access to private storage buckets.
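For AWS S3, the credentials Secret typically corresponds to an access key pair. The helper below is a hypothetical sketch (the function name and Secret shape are illustrative, not this platform's API) showing how such credentials might be gathered from standard AWS environment variables before being stored as a Secret:

```python
import os

def collect_s3_credentials(env=os.environ):
    """Gather AWS-style access keys (hypothetical Secret payload) from an
    environment mapping, failing fast if either key is absent."""
    secret = {
        "AWS_ACCESS_KEY_ID": env.get("AWS_ACCESS_KEY_ID"),
        "AWS_SECRET_ACCESS_KEY": env.get("AWS_SECRET_ACCESS_KEY"),
    }
    missing = [name for name, value in secret.items() if not value]
    if missing:
        raise ValueError(f"Missing credentials: {', '.join(missing)}")
    return secret

# Example with an explicit mapping instead of the real environment:
creds = collect_s3_credentials(
    {"AWS_ACCESS_KEY_ID": "AKIA-EXAMPLE", "AWS_SECRET_ACCESS_KEY": "example-secret"}
)
```

GCP GCS credentials follow a different shape (usually a service-account JSON key) but serve the same purpose: granting read access to a private bucket.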
  • Visit huggingface.co.
  • Use the search bar to find the desired model (e.g., “whisper-large”).
  • Click on the model you want from the search results (e.g., openai/whisper-large-v3-turbo).
  • Copy the model path displayed at the top of the page (e.g., openai/whisper-large-v3-turbo).
The model path on HuggingFace follows the format: creator/model-slug.
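A quick sanity check on the creator/model-slug format can catch common copy-paste mistakes (such as pasting only the model name without the creator). The helper below is a hypothetical illustration of that format, not part of this platform:

```python
import re

# Two non-empty segments separated by exactly one slash, using the
# characters typical of HuggingFace repository IDs.
MODEL_PATH_RE = re.compile(r"^[\w.-]+/[\w.-]+$")

def is_valid_model_path(path: str) -> bool:
    """Return True if `path` matches the creator/model-slug shape."""
    return bool(MODEL_PATH_RE.match(path))

print(is_valid_model_path("openai/whisper-large-v3-turbo"))  # True
print(is_valid_model_path("whisper-large"))                  # False: creator missing
```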
Note: Only instruct-style models are supported in the model compilation step for LLMs. These are typically chat-optimized models and are often identified by the suffix -Instruct in their names (e.g., meta-llama/Llama-3.2-3B-Instruct). Base models such as meta-llama/Llama-3.2-3B (without the -Instruct suffix) are not supported.
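The naming convention above can be turned into a rough pre-check. This is a heuristic sketch only: the -Instruct suffix is the convention named here, while the -it and -chat suffixes are assumptions about related naming schemes, and the authoritative signal remains the model card itself.

```python
def looks_like_instruct_model(repo_id: str) -> bool:
    """Heuristic name-based check for chat/instruct-tuned models.
    Suffixes other than -Instruct are assumed conventions, not guarantees."""
    slug = repo_id.split("/")[-1].lower()
    return slug.endswith(("-instruct", "-it", "-chat"))

print(looks_like_instruct_model("meta-llama/Llama-3.2-3B-Instruct"))  # True
print(looks_like_instruct_model("meta-llama/Llama-3.2-3B"))           # False
```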

Optimizing Infrastructure

  • Configure the infrastructure to optimize the model’s performance, such as selecting the appropriate compute resources and optimization techniques.

Configuration

  • Select the desired quantization format, FP16 or AWQ, based on your performance and resource requirements:
    • FP16 (Half-Precision): Offers higher precision and accuracy, but requires more GPU memory and compute power.
    • AWQ (Activation-aware Weight Quantization): Reduces model size and memory usage with minimal impact on accuracy, making it suitable for resource-constrained environments.
  • The optimization, model, and pipeline configurations are auto-filled based on the details provided earlier. You may modify them if required to suit your deployment needs.
  • Finalize the model’s configuration by setting any additional parameters or preferences required for deployment.
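As a back-of-envelope guide to the FP16 vs. AWQ trade-off, weight memory can be estimated from the parameter count: roughly 2 bytes per parameter for FP16 and about 0.5 bytes per parameter for 4-bit AWQ. The sketch below ignores activation and KV-cache overhead, so treat the numbers as lower bounds:

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Rough weight-only memory estimate in GiB; excludes activations
    and KV cache, which add further GPU memory on top."""
    return num_params * bytes_per_param / 1024**3

params = 3e9  # e.g., a 3B-parameter model
fp16 = weight_memory_gb(params, 2.0)   # FP16: 2 bytes per parameter
awq = weight_memory_gb(params, 0.5)    # 4-bit AWQ: ~0.5 bytes per parameter

print(f"FP16 ~ {fp16:.1f} GiB, AWQ ~ {awq:.1f} GiB")
```

For a 3B-parameter model this works out to roughly 5.6 GiB of weights in FP16 versus about 1.4 GiB under 4-bit AWQ, which is why AWQ suits resource-constrained environments.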