Compilation Configurations
Reference guide for model compilation configurations, covering optimization settings and quantization options for different hardware environments.
Optimization Configuration
Quantization Types
- Float 32 (FP32)
  - Full precision.
  - Highest accuracy.
  - Maximum memory usage.
- Float 16 (FP16)
  - Reduced precision.
  - Minimal accuracy loss.
  - Balances performance and accuracy.
  - Recommended for most use cases.
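To illustrate the precision trade-off of FP16, here is a minimal sketch using Python's `struct` module, whose `'e'` format encodes IEEE 754 half precision. Round-tripping a value through 16-bit storage shows the rounding error that FP16 introduces while using half the bytes of FP32:

```python
import struct

def to_fp16(x: float) -> float:
    """Round-trip a Python float through IEEE 754 half precision ('e' format)."""
    return struct.unpack('e', struct.pack('e', x))[0]

value = 3.14159265
half = to_fp16(value)
error = abs(value - half)

# FP16 keeps roughly 3 significant decimal digits in 2 bytes per value,
# versus roughly 7 digits in 4 bytes for FP32 -- halving weight memory
# at the cost of a small per-value rounding error.
print(value, half, error)
```

For most model weights, which cluster in a narrow numeric range, this per-value error is small relative to the weights themselves, which is why FP16 typically loses little accuracy.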
- Float 8 (FP8)
  - Advanced reduced precision.
  - Hardware limitations:
    - Not supported on the A100 (Ampere architecture).
    - Available on H100-class (Hopper) GPUs.
- INT4 Quantization
  - Extreme compression.
  - Substantial memory reduction.
  - Noticeable accuracy degradation.
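As a concrete sketch of where INT4's memory savings and accuracy loss come from, the hypothetical example below (symmetric per-tensor quantization, not any specific library's implementation) maps floating-point weights onto the signed 4-bit grid [-8, 7] with a single scale factor:

```python
def quantize_int4(weights):
    """Map floats onto the signed 4-bit grid [-8, 7] with one shared scale."""
    scale = max(abs(w) for w in weights) / 7.0 or 1.0
    q = [max(-8, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

weights = [0.82, -0.41, 0.05, -1.30, 0.99]
q, scale = quantize_int4(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))

# Each weight now needs 4 bits instead of 32 (an 8x reduction),
# at the cost of a rounding error of up to scale/2 per weight.
print(q, scale, max_err)
```

With only 15 usable levels across the whole tensor, the rounding error per weight is much larger than in FP16, which is why INT4 shows noticeable accuracy degradation without further tricks.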
- AWQ (Activation-aware Weight Quantization)
  - Low-bit weight quantization guided by activation statistics.
  - Protects salient weight channels, maintaining model performance.
  - Minimal accuracy loss compared to plain INT4.
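To make the "activation-aware" idea concrete, here is a toy numeric sketch (a hypothetical illustration of the principle, not the published AWQ algorithm): channels whose typical activations are large get scaled up before 4-bit rounding, so their weights keep more relative precision, and the inverse scale is folded back afterwards.

```python
def quantize_int4(ws):
    # Symmetric per-tensor INT4: one scale, values rounded onto [-8, 7],
    # returned already dequantized for easy comparison.
    scale = max(abs(w) for w in ws) / 7.0
    return [max(-8, min(7, round(w / scale))) * scale for w in ws]

weights = [0.02, 0.9, -0.03, 1.1]   # per-input-channel weights (made up)
act_mag = [8.0, 0.5, 6.0, 0.4]      # observed mean |activation| per channel (made up)

# Naive: quantize the weights directly.
naive = quantize_int4(weights)

# Activation-aware: scale salient channels up before rounding,
# then divide the inverse scale back out.
s = [m ** 0.5 for m in act_mag]     # mild scaling; the exponent is a tunable knob
aware = [q / si for q, si in
         zip(quantize_int4([w * si for w, si in zip(weights, s)]), s)]

# Compare output error against full precision, with the input set to
# its typical magnitude per channel.
x = act_mag
def out(w):
    return sum(wi * xi for wi, xi in zip(w, x))

print(abs(out(naive) - out(weights)), abs(out(aware) - out(weights)))
```

In this toy setup the activation-aware variant produces a smaller output error than naive INT4: the small weights on high-activation channels (which naive rounding crushes to zero) survive quantization, which is the intuition behind AWQ's minimal accuracy loss.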