AIMET AdaRound
By default, AIMET uses nearest rounding for quantization. A single weight value in a weight tensor is illustrated in the following figure. In nearest rounding, this weight value is quantized to the nearest integer value.
The Adaptive Rounding (AdaRound) feature uses a subset of the unlabeled training data to adaptively round weights. In the following figure, the weight value is quantized to the integer value far from it.
AdaRound optimizes a loss function using the unlabelled training data to decide whether to quantize a weight to the closer or further integer value. AdaRound quantization achieves accuracy closer to the FP32 model, while using low bit-width integer quantization.
When creating a QuantizationSimModel using AdaRounded, use the QuantizationSimModel provided in the API to set and freeze parameter encodings before computing the encodings. Refer the code example in the AdaRound API.
AdaRound use cases
Terminology
The following abbreviations are used in the following use case descriptions:
- BC
Bias Correction
- BNF
Batch Norm Folding
- CLE
Cross Layer Equalization
- HBF
High Bias Folding
- QAT
Quantization Aware Training
- { }
An optional step in the use case
Recommended
The following sequences are recommended:
- {BNF} –> {CLE} –> AdaRound
Applying BNF and CLE are optional steps before applying AdaRound. Some models benefit from applying CLE while some don’t.
- AdaRound –> QAT
AdaRound is a post-training quantization feature, but for some models applying BNF and CLE may not help. For these models, applying AdaRound before QAT might help. AdaRound is a better weights initialization step that speeds up QAT.
Not recommended
Applying bias correction (BC) either before or after AdaRound is not recommended.
AdaRound –> BC
BC –> AdaRound
AdaRound hyper parameters guidelines
A number of hyper parameters used during AdaRound optimization are exposed to users. The default values of some of these parameters lead to stable, good results over many models; we recommend that you not change these.
Use the following guideline for adjusting hyper parameters with AdaRound.
- Hyper Parameters to be changed often
Number of batches (approximately 500-1000 images. If batch size of data loader is 64, then 16x the number of batches leads to 1024 images)
Number of iterations(default 10000)
- Hyper Parameters to change with caution
Regularization parameter (default 0.01)
- Hyper Parameters to avoid changing
Beta range (default (20, 2))
Warm start period (default 20%)
AdaRound API
See the AdaRound API variant for your platform:
AdaRound for PyTorch
AdaRound for Keras
AdaRound for ONNX