# Compilation Configurations

> Reference guide for model compilation configurations, including optimization settings, quantization options for different hardware environments, and model configuration.

## Optimization Configuration

```json
{
  "warmups": {
    "enabled": true,
    "iterations": 5,
    "sample_input_data": []
  },
  "backend": {
    "name": "auto",
    "version": "latest",
    "extra_params": {}
  },
  "optimisations": {
    "speculative_decoding": {
      "enabled": false,
      "type": "auto",
      "extra_params": {}
    },
    "attention_caching": {
      "enabled": false,
      "type": "auto",
      "extra_params": {}
    }
  },
  "tensor_parallel_size": 1,
  "quantization": "float16"
}
```
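You can also build this configuration programmatically. The following minimal Python sketch assumes you assemble the JSON yourself before submitting it. It deep-copies the defaults shown above and overrides a few fields. The helper `build_optimization_config` is hypothetical and not part of any Simplismart SDK. Only the keys and default values come from the template.

```python
import copy
import json

# Default values copied from the optimization configuration template.
DEFAULT_OPTIMIZATION_CONFIG = {
    "warmups": {"enabled": True, "iterations": 5, "sample_input_data": []},
    "backend": {"name": "auto", "version": "latest", "extra_params": {}},
    # The key is spelled "optimisations" in the template; keep it exactly.
    "optimisations": {
        "speculative_decoding": {"enabled": False, "type": "auto", "extra_params": {}},
        "attention_caching": {"enabled": False, "type": "auto", "extra_params": {}},
    },
    "tensor_parallel_size": 1,
    "quantization": "float16",
}


def build_optimization_config(tensor_parallel_size=1, quantization="float16",
                              attention_caching=False):
    """Hypothetical helper: return a copy of the defaults with a few overrides."""
    if not isinstance(tensor_parallel_size, int) or tensor_parallel_size < 1:
        raise ValueError("tensor_parallel_size must be a positive integer")
    # Deep copy so nested dicts in the shared defaults are never mutated.
    config = copy.deepcopy(DEFAULT_OPTIMIZATION_CONFIG)
    config["tensor_parallel_size"] = tensor_parallel_size
    config["quantization"] = quantization
    config["optimisations"]["attention_caching"]["enabled"] = attention_caching
    return config


config = build_optimization_config(tensor_parallel_size=2, attention_caching=True)
print(json.dumps(config, indent=2))
```

The deep copy keeps the shared defaults unchanged when one deployment enables an optimization. As a result, settings do not leak between configurations built for different models.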

***

## Quantization Types

1. **Float 32 (FP32)**
   * Full precision (32 bits per value).
   * Highest accuracy.
   * Highest memory usage.
2. **Float 16 (FP16)**
   * Half precision (16 bits per value).
   * Minimal accuracy loss compared with FP32.
   * Recommended for most use cases because it balances performance and accuracy.
3. **Float 8 (FP8)**
   * Lower precision (8 bits per value).
   * **Hardware limitations**
     * Not supported on the **A100** GPU architecture.
     * Only available on **H100 GPUs**.
4. **INT4 Quantization**
   * Aggressive compression (4-bit integers).
   * Substantially reduces memory usage.
   * Causes noticeable accuracy degradation.
5. **AWQ (Activation-aware Weight Quantization)**
   * Advanced compression technique that uses activation statistics to protect the most important weights.
   * Maintains model quality with minimal accuracy loss.
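Each format's bit width gives a rough estimate of weight memory. FP32 uses 4 bytes per weight, FP16 uses 2, FP8 uses 1, and INT4 uses 0.5. The Python sketch below multiplies parameter count by bytes per weight and applies the FP8 hardware restriction listed above.

The sketch is illustrative only:

* The labels `"float32"`, `"fp8"`, and `"int4"` are assumptions. Only `"float16"` appears in the configuration template.
* AWQ is omitted because its storage depends on the chosen bit width.
* The estimate covers weights only. It excludes activations, caches, and quantization metadata.

```python
# Bytes needed to store one weight in each format.
# Keys are illustrative labels; only "float16" is confirmed by the template.
BYTES_PER_WEIGHT = {
    "float32": 4.0,
    "float16": 2.0,
    "fp8": 1.0,
    "int4": 0.5,
}

# Per the hardware notes: FP8 is only available on H100 GPUs.
FP8_SUPPORTED_GPUS = {"H100"}


def estimate_weight_memory_gb(num_params, quantization):
    """Weights only: ignores activations, caches, and quantization metadata."""
    return num_params * BYTES_PER_WEIGHT[quantization] / 1e9


def check_hardware(quantization, gpu):
    """Raise if the requested format is not supported on the given GPU."""
    if quantization == "fp8" and gpu not in FP8_SUPPORTED_GPUS:
        raise ValueError(f"FP8 is not supported on {gpu}; use an H100 GPU")


# Example: a 7-billion-parameter model.
for fmt in BYTES_PER_WEIGHT:
    print(f"{fmt}: {estimate_weight_memory_gb(7e9, fmt):.1f} GB")
# float32: 28.0 GB
# float16: 14.0 GB
# fp8: 7.0 GB
# int4: 3.5 GB

check_hardware("fp8", "H100")  # passes
```

Calling `check_hardware("fp8", "A100")` raises a `ValueError`, which mirrors the A100 restriction listed above.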

***

## Model Configuration

```json
{
  "type": "llm",
  "loras": [],
  "lora_repo": {
    "type": "",
    "path": "",
    "ownership": "",
    "secret": {
      "type": ""
    }
  },
  "quantized_model_path": {
    "type": "",
    "path": "",
    "ownership": "",
    "secret": {
      "type": ""
    }
  }
}
```
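Most fields in this template are empty strings that you fill in for a specific deployment. The minimal Python sketch below lists every empty-string field in a model configuration, so you can see what is still blank before submitting it. The helper `empty_fields` is hypothetical and not part of any Simplismart SDK. The embedded JSON is the template above, unchanged.

```python
import json

# The model configuration template, copied verbatim.
MODEL_CONFIG_TEMPLATE = """
{
  "type": "llm",
  "loras": [],
  "lora_repo": {
    "type": "",
    "path": "",
    "ownership": "",
    "secret": {"type": ""}
  },
  "quantized_model_path": {
    "type": "",
    "path": "",
    "ownership": "",
    "secret": {"type": ""}
  }
}
"""


def empty_fields(node, prefix=""):
    """Return the dotted path of every empty-string value in a nested dict."""
    paths = []
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects such as "lora_repo" and "secret".
            paths.extend(empty_fields(value, path))
        elif value == "":
            paths.append(path)
    return paths


config = json.loads(MODEL_CONFIG_TEMPLATE)
for path in empty_fields(config):
    print(path)
```

This section does not say which of these fields are required or which values `type` and `ownership` accept. Treat the output as a checklist rather than a validation result.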
