Optimization techniques¶
Adaptive rounding¶
Uses a small amount of training data to decide, for each weight, whether rounding up or down better preserves the layer's output, improving accuracy over naïve round-to-nearest.
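The idea can be seen in a minimal, self-contained sketch (an illustration of the rounding-choice objective, not AIMET's AdaRound implementation; the toy layer, calibration data, and regularizer weight are assumptions):

```python
# Learn a per-weight up/down rounding decision that minimizes the error
# of the layer's *output*, instead of rounding each weight to nearest.
import torch

torch.manual_seed(0)
x = torch.randn(256, 16)            # calibration activations (assumed)
w = torch.randn(16, 4)              # FP32 weights of a toy linear layer
scale = w.abs().max() / 127         # symmetric int8 scale

w_floor = torch.floor(w / scale)    # "round down" candidate
v = torch.zeros_like(w, requires_grad=True)  # soft rounding variable
opt = torch.optim.Adam([v], lr=0.05)

for step in range(500):
    h = torch.sigmoid(v)                     # relaxes the 0/1 up-down choice
    w_q = (w_floor + h) * scale              # soft-quantized weights
    loss = ((x @ w - x @ w_q) ** 2).mean()   # match layer output, not weights
    loss = loss + 0.01 * (1 - (2 * h - 1).abs().pow(2)).sum()  # push h to 0/1
    opt.zero_grad(); loss.backward(); opt.step()

w_adaround = (w_floor + (torch.sigmoid(v) > 0.5).float()) * scale
w_nearest = torch.round(w / scale) * scale
print("nearest output MSE: ", ((x @ w - x @ w_nearest) ** 2).mean().item())
print("adaround output MSE:", ((x @ w - x @ w_adaround) ** 2).mean().item())
```

The output MSE of the learned rounding is typically lower than nearest rounding, because the objective is the layer's output error rather than the per-weight error.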
Sequential MSE¶
Sequential MSE (SeqMSE) searches for optimal quantization encodings per operation (that is, per layer) such that the mean squared error between the original output activations and the corresponding quantization-aware output activations is minimized.
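A minimal sketch of the per-layer search, for a single linear layer (illustrative only, not AIMET's implementation; AIMET additionally proceeds layer by layer, conditioning each search on the already-quantized predecessors):

```python
# Try several candidate encoding ranges for one layer's weights and keep
# the one whose quantized output is closest to the FP32 output in MSE.
import torch

def fake_quant(w: torch.Tensor, max_val: float, bits: int = 8) -> torch.Tensor:
    """Symmetric fake-quantization of w with the given encoding max."""
    scale = max_val / (2 ** (bits - 1) - 1)
    q = torch.clamp(torch.round(w / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q * scale

torch.manual_seed(0)
x = torch.randn(512, 32)           # calibration batch (assumed)
w = torch.randn(32, 8)
fp_out = x @ w                     # reference FP32 output

best_mse, best_max = float("inf"), None
for frac in torch.linspace(0.2, 1.0, 20):        # candidate encoding maxima
    cand_max = (frac * w.abs().max()).item()
    mse = ((fp_out - x @ fake_quant(w, cand_max)) ** 2).mean().item()
    if mse < best_mse:
        best_mse, best_max = mse, cand_max
print(f"best encoding max: {best_max:.4f}  output MSE: {best_mse:.6f}")
```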
Batch norm folding¶
Folds batch norm (BN) layers into adjacent convolution or linear layers, removing them as separate operations at inference.
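The standard folding math can be shown in a short, self-contained sketch (this illustrates the arithmetic, not AIMET's `fold_all_batch_norms` API):

```python
# Fold eval-mode BN statistics into the preceding conv's weight and bias,
# so that conv(x) alone reproduces bn(conv(x)).
import torch
import torch.nn as nn

@torch.no_grad()
def fold_bn_into_conv(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> None:
    std = torch.sqrt(bn.running_var + bn.eps)
    gamma, beta = bn.weight, bn.bias
    conv.weight.data *= (gamma / std).reshape(-1, 1, 1, 1)  # per-channel scale
    bias = conv.bias.data if conv.bias is not None else torch.zeros_like(bn.running_mean)
    conv.bias = nn.Parameter((bias - bn.running_mean) * gamma / std + beta)

torch.manual_seed(0)
conv, bn = nn.Conv2d(3, 8, 3, bias=False), nn.BatchNorm2d(8)
bn.running_mean.uniform_(-0.5, 0.5)   # pretend the BN holds real statistics
bn.running_var.uniform_(0.5, 1.5)
bn.weight.data.uniform_(0.5, 1.5)     # gamma
bn.bias.data.uniform_(-0.2, 0.2)      # beta
conv.eval(); bn.eval()

x = torch.randn(1, 3, 16, 16)
ref = bn(conv(x))
fold_bn_into_conv(conv, bn)
assert torch.allclose(conv(x), ref, atol=1e-5)   # folded conv == conv + BN
```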
Cross-layer equalization¶
Scales the parameter ranges across different channels: channels with a narrow range are scaled up and channels with a wide range are scaled down, enabling the same quantization parameters to be used across all channels.
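A minimal sketch of the scaling identity for two back-to-back linear layers with a ReLU in between (the standard equalization math, not AIMET's `equalize_model` implementation): scaling output channel i of the first layer down by s_i and the matching input channel of the second layer up by s_i leaves the network function unchanged, because ReLU is positively homogeneous.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
w1 = torch.randn(16, 32) * torch.logspace(-1, 1, 16).unsqueeze(1)  # skewed channel ranges
w2 = torch.randn(8, 16)

r1 = w1.abs().amax(dim=1)          # output-channel ranges of layer 1
r2 = w2.abs().amax(dim=0)          # input-channel ranges of layer 2
s = torch.sqrt(r1 / r2)            # equalizing scale per channel

w1_eq = w1 / s.unsqueeze(1)        # both ranges become sqrt(r1 * r2)
w2_eq = w2 * s.unsqueeze(0)

x = torch.randn(4, 32)
out = F.relu(x @ w1.t()) @ w2.t()
out_eq = F.relu(x @ w1_eq.t()) @ w2_eq.t()
assert torch.allclose(out, out_eq, atol=1e-5)   # function is preserved

print("range spread before:", (r1.max() / r1.min()).item())
r1_eq = w1_eq.abs().amax(dim=1)
print("range spread after: ", (r1_eq.max() / r1_eq.min()).item())
```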
Mixed precision¶
Allows quantization-sensitive layers to run at a higher precision (bit-width) while the rest of the model is quantized at a lower one.
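A minimal sketch of the idea (illustrative only; in AIMET, bitwidths are configured on the quantizers of a quantization simulation model, and the "sensitive" set would come from sensitivity analysis):

```python
# Give layers identified as sensitive a wider bitwidth than the default.
import torch

def fake_quant(t: torch.Tensor, bits: int) -> torch.Tensor:
    scale = t.abs().max() / (2 ** (bits - 1) - 1)
    return torch.round(t / scale) * scale

torch.manual_seed(0)
weights = {"conv1": torch.randn(8, 8), "conv2": torch.randn(8, 8), "head": torch.randn(8, 8)}
sensitive = {"head"}   # assumed result of a sensitivity analysis

bitwidths = {name: (16 if name in sensitive else 8) for name in weights}
for name, w in weights.items():
    err = (w - fake_quant(w, bitwidths[name])).abs().max()
    print(f"{name}: {bitwidths[name]}-bit, max weight error {err:.6f}")
```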
Automatic quantization¶
Analyzes the model, determines the best sequence of AIMET post-training quantization (PTQ) techniques, and applies these techniques.
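A minimal sketch of the underlying idea (not AIMET's AutoQuant API): greedily apply candidate PTQ steps and keep each one only if it improves simulated quantized accuracy. The hooks `apply_bn_fold`, `apply_cle`, `apply_adaround`, and `eval_quantized_accuracy` below are hypothetical placeholders.

```python
def auto_ptq(model, candidates, evaluate):
    """Greedy search over a sequence of candidate PTQ techniques."""
    best_score = evaluate(model)
    for name, technique in candidates:
        trial = technique(model)           # returns a transformed copy
        score = evaluate(trial)
        print(f"{name}: {score:.4f} (best so far {best_score:.4f})")
        if score > best_score:             # keep only helpful techniques
            model, best_score = trial, score
    return model, best_score

# Usage sketch (hooks are assumptions, not real AIMET functions):
# model, acc = auto_ptq(
#     model,
#     candidates=[("bn_fold", apply_bn_fold), ("cle", apply_cle),
#                 ("adaround", apply_adaround)],
#     evaluate=eval_quantized_accuracy,
# )
```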
Batch norm re-estimation¶
Re-estimates BN statistics on a small amount of training data after quantization-aware training; the re-estimated statistics are then used to adjust the quantization scale parameters of preceding convolution or linear layers, effectively folding the BN layers.
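A minimal sketch of the re-estimation step using plain PyTorch (illustrative, not AIMET's `bn_reestimation` API):

```python
# After training perturbs the weights, re-run a few batches in training
# mode so the BN running statistics match the current model, before the
# BN layers are folded away.
import torch
import torch.nn as nn

@torch.no_grad()
def reestimate_bn_stats(model: nn.Module, data_loader, num_batches: int = 32) -> None:
    for bn in (m for m in model.modules() if isinstance(m, nn.BatchNorm2d)):
        bn.reset_running_stats()   # forget the stale statistics
        bn.momentum = None         # use a cumulative moving average
    model.train()
    for i, (x, _) in enumerate(data_loader):
        if i >= num_batches:
            break
        model(x)                   # forward passes update running stats
    model.eval()
```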
Analysis tools¶
Automatically identify quantization-sensitive areas and hotspots in your pre-trained model.
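A minimal sketch of per-layer sensitivity analysis, the idea behind such tools (illustrative, not AIMET's QuantAnalyzer API; `evaluate` is a hypothetical accuracy callback):

```python
# Quantize one layer's weights at a time and measure the accuracy drop;
# the layers with the largest drops are the hotspots.
import torch
import torch.nn as nn

def fake_quant_(w: torch.Tensor, bits: int = 8) -> None:
    scale = w.abs().max() / (2 ** (bits - 1) - 1)
    w.copy_(torch.round(w / scale) * scale)

@torch.no_grad()
def layer_sensitivity(model: nn.Module, evaluate) -> dict:
    baseline = evaluate(model)
    drops = {}
    for name, module in model.named_modules():
        if not isinstance(module, (nn.Conv2d, nn.Linear)):
            continue
        original = module.weight.detach().clone()
        fake_quant_(module.weight.data)      # quantize only this layer
        drops[name] = baseline - evaluate(model)
        module.weight.data.copy_(original)   # restore FP32 weights
    return drops   # largest drop == most quantization-sensitive layer
```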
Compression¶
Reduces a pre-trained model's multiply-accumulate (MAC) and memory costs with a minimal drop in accuracy. AIMET supports various compression techniques such as Weight SVD, Spatial SVD, and Channel pruning.
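A minimal sketch of weight SVD on a linear layer (illustrative, not AIMET's `ModelCompressor` API): the weight matrix is factored into two smaller layers via truncated SVD, trading a little accuracy for fewer MACs and parameters.

```python
import torch
import torch.nn as nn

def weight_svd(layer: nn.Linear, rank: int) -> nn.Sequential:
    """Replace one Linear layer with two low-rank Linear layers."""
    u, s, vh = torch.linalg.svd(layer.weight.detach(), full_matrices=False)
    first = nn.Linear(layer.in_features, rank, bias=False)
    second = nn.Linear(rank, layer.out_features, bias=layer.bias is not None)
    first.weight.data = (torch.diag(s[:rank]) @ vh[:rank]).contiguous()
    second.weight.data = u[:, :rank].contiguous()
    if layer.bias is not None:
        second.bias.data = layer.bias.detach().clone()
    return nn.Sequential(first, second)

layer = nn.Linear(256, 256)
compressed = weight_svd(layer, rank=64)
x = torch.randn(4, 256)
err = (layer(x) - compressed(x)).abs().max()
params = lambda m: sum(p.numel() for p in m.parameters())
print(f"params: {params(layer)} -> {params(compressed)}, max error {err:.4f}")
```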