Optimization techniques

Adaptive rounding

Uses a small amount of training data to learn, for each weight, whether rounding up or down better preserves accuracy, improving on naïve rounding-to-nearest.
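
The sketch below shows one way to apply AdaRound with aimet_torch. The toy model, random calibration data, and parameter values are placeholders, and exact signatures may differ between AIMET versions:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from aimet_common.defs import QuantScheme
    from aimet_torch.adaround.adaround_weight import Adaround, AdaroundParameters

    # Placeholder model and unlabeled calibration data
    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 224, 224)
    data_loader = DataLoader(TensorDataset(torch.randn(32, 3, 224, 224)),
                             batch_size=8)

    # AdaRound learns, per weight, whether rounding up or down better
    # preserves each layer's output on the calibration data
    params = AdaroundParameters(data_loader=data_loader, num_batches=4,
                                default_num_iterations=10000)
    adarounded_model = Adaround.apply_adaround(
        model, dummy_input, params,
        path='./adaround', filename_prefix='model',
        default_param_bw=8,
        default_quant_scheme=QuantScheme.post_training_tf_enhanced)
    # The saved encodings must then be loaded into a QuantizationSimModel
    # so the optimized rounding is not overwritten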

Sequential MSE

Sequential MSE (SeqMSE) searches for optimal quantization encodings per operation (i.e. per layer) such that the mean squared error between the original output activations and the corresponding quantization-aware output activations is minimized.
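
A minimal sketch assuming the seq_mse utilities shipped in recent aimet_torch releases; the toy model, data, and parameter values are placeholders, and names and signatures vary between versions:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from aimet_torch.quantsim import QuantizationSimModel
    from aimet_torch.seq_mse import apply_seq_mse, SeqMseParams

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 224, 224)
    data_loader = DataLoader(TensorDataset(torch.randn(16, 3, 224, 224)),
                             batch_size=4)

    sim = QuantizationSimModel(model, dummy_input=dummy_input, default_param_bw=4)

    # Layer by layer, SeqMSE tries a set of candidate weight encodings and
    # keeps the one minimizing MSE against the float layer's output
    params = SeqMseParams(num_batches=4, num_candidates=20)
    apply_seq_mse(model, sim, data_loader, params)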

Batch norm folding

Folds batch norm (BN) layers into adjacent convolution or linear layers.
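
A minimal sketch with aimet_torch's batch-norm folding utility; the toy model is a placeholder:

    import torch
    from aimet_torch.batch_norm_fold import fold_all_batch_norms

    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3),
        torch.nn.BatchNorm2d(8),
        torch.nn.ReLU()).eval()

    # Folds each BN's scale and shift into the adjacent conv/linear weights
    # and biases in place; returns the (layer, bn) pairs that were folded
    folded_pairs = fold_all_batch_norms(model, input_shapes=(1, 3, 224, 224))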

Cross-layer equalization

Scales parameter ranges across the channels of consecutive layers, increasing the range for channels with a narrow range and reducing it for channels with a wide range, so that the same quantization parameters can be used across all channels.
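
A minimal sketch with aimet_torch's cross-layer equalization entry point; the toy model is a placeholder:

    import torch
    from aimet_torch.cross_layer_equalization import equalize_model

    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3),
        torch.nn.ReLU(),
        torch.nn.Conv2d(8, 16, 3)).eval()

    # Performs BN folding, cross-layer scaling, and high-bias absorption in
    # place, making per-channel weight ranges more uniform so a single
    # per-tensor encoding fits all channels better
    equalize_model(model, input_shapes=(1, 3, 224, 224))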

Mixed precision

Keeps quantization-sensitive layers at a higher precision (bit-width) while the rest of the model runs at a lower precision.
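
A minimal manual sketch: simulate the model at 8 bits, then raise one layer to 16 bits. The quantizer attribute names follow aimet_torch v1's QuantizationSimModel and differ in other versions; the choice of layer is a placeholder:

    import torch
    from aimet_torch.quantsim import QuantizationSimModel

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 224, 224)

    # Whole model simulated at 8-bit weights and activations
    sim = QuantizationSimModel(model, dummy_input=dummy_input,
                               default_param_bw=8, default_output_bw=8)

    # Override the first layer (assumed quantization-sensitive) to 16-bit
    wrapper = sim.model[0]
    wrapper.param_quantizers['weight'].bitwidth = 16
    wrapper.output_quantizers[0].bitwidth = 16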

Automatic quantization

Analyzes the model, determines the best sequence of AIMET post-training quantization (PTQ) techniques, and applies them automatically.
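
A minimal sketch of the AutoQuant entry point; the constructor and optimize() signature follow recent aimet_torch releases (older ones differ), and the model, data, and evaluation callback are placeholders:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from aimet_torch.auto_quant import AutoQuant

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 224, 224)
    data_loader = DataLoader(TensorDataset(torch.randn(16, 3, 224, 224)),
                             batch_size=4)

    def eval_callback(model: torch.nn.Module) -> float:
        # Placeholder: return accuracy measured on a validation set
        return 0.0

    auto_quant = AutoQuant(model, dummy_input=dummy_input,
                           data_loader=data_loader, eval_callback=eval_callback)

    # Applies PTQ techniques (e.g. BN folding, CLE, AdaRound) in sequence
    # and keeps the best result within the allowed accuracy drop
    optimized_model, accuracy, encoding_path = auto_quant.optimize(
        allowed_accuracy_drop=0.01)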

Batch norm re-estimation

Re-estimates batch norm (BN) statistics on a small amount of training data after quantization-aware training; the re-estimated statistics are then used to adjust the quantization scale parameters of the preceding convolution or linear layers, effectively folding the BN layers.
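
A minimal sketch assuming aimet_torch's bn_reestimation utilities; the toy model and data are placeholders, and the QAT step on sim.model is elided:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from aimet_torch.quantsim import QuantizationSimModel
    from aimet_torch.bn_reestimation import reestimate_bn_stats
    from aimet_torch.batch_norm_fold import fold_all_batch_norms_to_scale

    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3),
        torch.nn.BatchNorm2d(8),
        torch.nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 224, 224)
    data_loader = DataLoader(TensorDataset(torch.randn(16, 3, 224, 224)),
                             batch_size=4)

    sim = QuantizationSimModel(model, dummy_input=dummy_input)
    # ... quantization-aware training of sim.model happens here ...

    # Re-estimate BN mean/variance on a few batches, then fold the BN
    # layers into the quantization scale of the preceding conv layers
    reestimate_bn_stats(sim.model, data_loader, num_batches=4)
    fold_all_batch_norms_to_scale(sim)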

Analysis tools

Automatically identify quantization-sensitive areas and hotspots in your pre-trained model.
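
A minimal sketch of AIMET's QuantAnalyzer; the callbacks and toy model are placeholders, and exact signatures may vary between versions:

    import torch
    from aimet_common.utils import CallbackFunc
    from aimet_torch.quant_analyzer import QuantAnalyzer

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 224, 224)

    def forward_pass(model, _):
        # Placeholder calibration pass over representative data
        model(dummy_input)

    def evaluate(model, _):
        # Placeholder: return accuracy on a validation set
        return 0.0

    analyzer = QuantAnalyzer(model, dummy_input,
                             forward_pass_callback=CallbackFunc(forward_pass),
                             eval_callback=CallbackFunc(evaluate))

    # Writes per-layer sensitivity results and range histograms to
    # results_dir, highlighting the layers most hurt by quantization
    analyzer.analyze(results_dir='./quant_analyzer_results')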

Compression

Reduces a pre-trained model's multiply-accumulate (MAC) and memory costs with a minimal drop in accuracy. AIMET supports compression techniques such as Weight SVD, Spatial SVD, and Channel Pruning.
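
A minimal Spatial SVD sketch via ModelCompressor; the toy model, evaluation callback, and target compression ratio are placeholders:

    import torch
    from decimal import Decimal
    from aimet_common.defs import (CompressionScheme, CostMetric,
                                   GreedySelectionParameters)
    from aimet_torch.defs import SpatialSvdParameters
    from aimet_torch.compress import ModelCompressor

    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3),
        torch.nn.ReLU(),
        torch.nn.Conv2d(8, 16, 3)).eval()

    def eval_callback(model, iterations, use_cuda):
        # Placeholder: return accuracy measured over `iterations` batches
        return 0.0

    # Ask the greedy ratio selector for roughly half the original MAC cost
    greedy = GreedySelectionParameters(target_comp_ratio=Decimal(0.5))
    auto_params = SpatialSvdParameters.AutoModeParams(greedy)
    params = SpatialSvdParameters(mode=SpatialSvdParameters.Mode.auto,
                                  params=auto_params)

    compressed_model, stats = ModelCompressor.compress_model(
        model, eval_callback=eval_callback, eval_iterations=10,
        input_shape=(1, 3, 224, 224),
        compress_scheme=CompressionScheme.spatial_svd,
        cost_metric=CostMetric.mac, parameters=params)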