Optimization techniques

Adaptive rounding

Uses a small amount of training data to learn, for each weight, whether rounding up or down better preserves accuracy, improving on naïve rounding-to-nearest.
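
The sketch below shows one way to apply AdaRound with aimet_torch. The toy model, random calibration data, and parameter values are placeholders, and exact signatures may differ between AIMET versions:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from aimet_common.defs import QuantScheme
    from aimet_torch.adaround.adaround_weight import Adaround, AdaroundParameters

    # Placeholder model and unlabeled calibration data
    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 224, 224)
    data_loader = DataLoader(TensorDataset(torch.randn(32, 3, 224, 224)),
                             batch_size=8)

    # AdaRound learns, per weight, whether rounding up or down better
    # preserves each layer's output on the calibration data
    params = AdaroundParameters(data_loader=data_loader, num_batches=4,
                                default_num_iterations=10000)
    adarounded_model = Adaround.apply_adaround(
        model, dummy_input, params,
        path='./adaround', filename_prefix='model',
        default_param_bw=8,
        default_quant_scheme=QuantScheme.post_training_tf_enhanced)
    # The saved encodings must then be loaded into a QuantizationSimModel
    # so the optimized rounding is not overwritten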

Sequential MSE

Sequential MSE (SeqMSE) searches for optimal quantization encodings per operation (i.e. per layer) such that the mean squared error between the original output activations and the corresponding quantization-aware output activations is minimized.
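
A minimal sketch assuming the seq_mse utilities shipped in recent aimet_torch releases; the toy model, data, and parameter values are placeholders, and names and signatures vary between versions:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from aimet_torch.quantsim import QuantizationSimModel
    from aimet_torch.seq_mse import apply_seq_mse, SeqMseParams

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 224, 224)
    data_loader = DataLoader(TensorDataset(torch.randn(16, 3, 224, 224)),
                             batch_size=4)

    sim = QuantizationSimModel(model, dummy_input=dummy_input, default_param_bw=4)

    # Layer by layer, SeqMSE tries a set of candidate weight encodings and
    # keeps the one minimizing MSE against the float layer's output
    params = SeqMseParams(num_batches=4, num_candidates=20)
    apply_seq_mse(model, sim, data_loader, params)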

Batch norm folding

Folds batch norm (BN) layers into adjacent convolution or linear layers.
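
A minimal sketch with aimet_torch's batch-norm folding utility; the toy model is a placeholder:

    import torch
    from aimet_torch.batch_norm_fold import fold_all_batch_norms

    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3),
        torch.nn.BatchNorm2d(8),
        torch.nn.ReLU()).eval()

    # Folds each BN's scale and shift into the adjacent conv/linear weights
    # and biases in place; returns the (layer, bn) pairs that were folded
    folded_pairs = fold_all_batch_norms(model, input_shapes=(1, 3, 224, 224))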

Cross-layer equalization

Scales parameter ranges across the channels of consecutive layers, increasing the range for channels with a narrow range and reducing it for channels with a wide range, so that the same quantization parameters can be used across all channels.
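
A minimal sketch with aimet_torch's cross-layer equalization entry point; the toy model is a placeholder:

    import torch
    from aimet_torch.cross_layer_equalization import equalize_model

    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3),
        torch.nn.ReLU(),
        torch.nn.Conv2d(8, 16, 3)).eval()

    # Performs BN folding, cross-layer scaling, and high-bias absorption in
    # place, making per-channel weight ranges more uniform so a single
    # per-tensor encoding fits all channels better
    equalize_model(model, input_shapes=(1, 3, 224, 224))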

Mixed precision

Keeps quantization-sensitive layers at a higher precision (bit-width) while the rest of the model runs at a lower precision.
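
A minimal manual sketch: simulate the model at 8 bits, then raise one layer to 16 bits. The quantizer attribute names follow aimet_torch v1's QuantizationSimModel and differ in other versions; the choice of layer is a placeholder:

    import torch
    from aimet_torch.quantsim import QuantizationSimModel

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 224, 224)

    # Whole model simulated at 8-bit weights and activations
    sim = QuantizationSimModel(model, dummy_input=dummy_input,
                               default_param_bw=8, default_output_bw=8)

    # Override the first layer (assumed quantization-sensitive) to 16-bit
    wrapper = sim.model[0]
    wrapper.param_quantizers['weight'].bitwidth = 16
    wrapper.output_quantizers[0].bitwidth = 16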

Automatic quantization

Analyzes the model, determines the best sequence of AIMET post-training quantization (PTQ) techniques, and applies them automatically.
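
A minimal sketch of the AutoQuant entry point; the constructor and optimize() signature follow recent aimet_torch releases (older ones differ), and the model, data, and evaluation callback are placeholders:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from aimet_torch.auto_quant import AutoQuant

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 224, 224)
    data_loader = DataLoader(TensorDataset(torch.randn(16, 3, 224, 224)),
                             batch_size=4)

    def eval_callback(model: torch.nn.Module) -> float:
        # Placeholder: return accuracy measured on a validation set
        return 0.0

    auto_quant = AutoQuant(model, dummy_input=dummy_input,
                           data_loader=data_loader, eval_callback=eval_callback)

    # Applies PTQ techniques (e.g. BN folding, CLE, AdaRound) in sequence
    # and keeps the best result within the allowed accuracy drop
    optimized_model, accuracy, encoding_path = auto_quant.optimize(
        allowed_accuracy_drop=0.01)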

Batch norm re-estimation

Re-estimates batch norm (BN) statistics on a small amount of training data after quantization-aware training; the re-estimated statistics are then used to adjust the quantization scale parameters of the preceding convolution or linear layers, effectively folding the BN layers.
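
A minimal sketch assuming aimet_torch's bn_reestimation utilities; the toy model and data are placeholders, and the QAT step on sim.model is elided:

    import torch
    from torch.utils.data import DataLoader, TensorDataset
    from aimet_torch.quantsim import QuantizationSimModel
    from aimet_torch.bn_reestimation import reestimate_bn_stats
    from aimet_torch.batch_norm_fold import fold_all_batch_norms_to_scale

    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3),
        torch.nn.BatchNorm2d(8),
        torch.nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 224, 224)
    data_loader = DataLoader(TensorDataset(torch.randn(16, 3, 224, 224)),
                             batch_size=4)

    sim = QuantizationSimModel(model, dummy_input=dummy_input)
    # ... quantization-aware training of sim.model happens here ...

    # Re-estimate BN mean/variance on a few batches, then fold the BN
    # layers into the quantization scale of the preceding conv layers
    reestimate_bn_stats(sim.model, data_loader, num_batches=4)
    fold_all_batch_norms_to_scale(sim)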

Analysis tools

Automatically identify quantization-sensitive areas and hotspots in your pre-trained model.
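
A minimal sketch of AIMET's QuantAnalyzer; the callbacks and toy model are placeholders, and exact signatures may vary between versions:

    import torch
    from aimet_common.utils import CallbackFunc
    from aimet_torch.quant_analyzer import QuantAnalyzer

    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 224, 224)

    def forward_pass(model, _):
        # Placeholder calibration pass over representative data
        model(dummy_input)

    def evaluate(model, _):
        # Placeholder: return accuracy on a validation set
        return 0.0

    analyzer = QuantAnalyzer(model, dummy_input,
                             forward_pass_callback=CallbackFunc(forward_pass),
                             eval_callback=CallbackFunc(evaluate))

    # Writes per-layer sensitivity results and range histograms to
    # results_dir, highlighting the layers most hurt by quantization
    analyzer.analyze(results_dir='./quant_analyzer_results')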

Compression

Reduces a pre-trained model's multiply-accumulate (MAC) and memory costs with a minimal drop in accuracy. AIMET supports compression techniques such as Weight SVD, Spatial SVD, and Channel Pruning.
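
A minimal Spatial SVD sketch via ModelCompressor; the toy model, evaluation callback, and target compression ratio are placeholders:

    import torch
    from decimal import Decimal
    from aimet_common.defs import (CompressionScheme, CostMetric,
                                   GreedySelectionParameters)
    from aimet_torch.defs import SpatialSvdParameters
    from aimet_torch.compress import ModelCompressor

    model = torch.nn.Sequential(
        torch.nn.Conv2d(3, 8, 3),
        torch.nn.ReLU(),
        torch.nn.Conv2d(8, 16, 3)).eval()

    def eval_callback(model, iterations, use_cuda):
        # Placeholder: return accuracy measured over `iterations` batches
        return 0.0

    # Ask the greedy ratio selector for roughly half the original MAC cost
    greedy = GreedySelectionParameters(target_comp_ratio=Decimal(0.5))
    auto_params = SpatialSvdParameters.AutoModeParams(greedy)
    params = SpatialSvdParameters(mode=SpatialSvdParameters.Mode.auto,
                                  params=auto_params)

    compressed_model, stats = ModelCompressor.compress_model(
        model, eval_callback=eval_callback, eval_iterations=10,
        input_shape=(1, 3, 224, 224),
        compress_scheme=CompressionScheme.spatial_svd,
        cost_metric=CostMetric.mac, parameters=params)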