AIMET PyTorch Quantization APIs
In order to make full use of AIMET Quantization features, there are several guidelines users are encouraged to follow when defining PyTorch models. AIMET provides APIs that can automate some of the required model definition changes and check whether AIMET Quantization features can be applied to a PyTorch model.
Users should first invoke the Model Preparer API before using any of the AIMET Quantization features.
- Model Guidelines: Guidelines for defining PyTorch models
- Architecture Checker API: Allows users to check the model for performance concerns
- Model Preparer API: Allows users to automate model definition changes
- Model Validator API: Allows users to check whether AIMET Quantization features can be applied to a PyTorch model (see the sketch after this list)
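A minimal sketch of how the Model Preparer and Model Validator APIs are typically invoked, assuming a torchvision model and the aimet_torch module layout used in recent releases (module paths may differ between AIMET versions; the model and input shape are illustrative):

```python
import torch
from torchvision.models import resnet18

from aimet_torch.model_preparer import prepare_model
from aimet_torch.model_validator.model_validator import ModelValidator

model = resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Prepare the model: rewrites constructs such as torch.nn.functional calls
# and reused modules into unique torch.nn.Module instances so that AIMET
# can later insert quantization operations.
prepared_model = prepare_model(model)

# Check whether the (prepared) model satisfies AIMET's model guidelines.
ModelValidator.validate_model(prepared_model, model_input=dummy_input)
```

If the validator reports failing checks, the corresponding model definition issues should be fixed (or handled by the Model Preparer) before applying the quantization features listed below.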
AIMET Quantization for PyTorch Models provides the following functionality.
- Quant Analyzer API: Analyzes the model and points out layers that are sensitive to quantization
- Quantization Simulation API: Provides the ability to simulate inference and training on quantized hardware (a usage sketch follows this list)
- Adaptive Rounding API: Post-training quantization technique to optimize rounding of weight tensors
- Cross-Layer Equalization API: Post-training quantization technique to equalize layer parameters
- Bias Correction API: Post-training quantization technique to correct shift in layer outputs due to quantization noise
- AutoQuant API: Unified API that integrates the post-training quantization techniques provided by AIMET
- BN Re-estimation APIs: APIs that re-estimate BN layers' statistics and fold the BN layers
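An illustrative sketch of the quantization simulation workflow, assuming the aimet_torch.quantsim module layout of recent releases (the model, calibration data, bit-widths, and export paths are placeholders, and argument defaults may vary by AIMET version):

```python
import torch
from torchvision.models import resnet18

from aimet_torch.quantsim import QuantizationSimModel

model = resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Create a quantization simulation model with 8-bit parameters and activations.
sim = QuantizationSimModel(model, dummy_input=dummy_input,
                           default_param_bw=8, default_output_bw=8)

def pass_calibration_data(sim_model, _):
    # Run a few representative batches through the model so AIMET can compute
    # quantization encodings (scale/offset) for each quantizer. Random data is
    # used here only as a stand-in for a real calibration set.
    sim_model.eval()
    with torch.no_grad():
        for _ in range(4):
            sim_model(torch.randn(1, 3, 224, 224))

sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                      forward_pass_callback_args=None)

# The simulated model can now be evaluated or fine-tuned (QAT), and then
# exported along with its encodings for the target runtime.
sim.export(path='/tmp', filename_prefix='resnet18_quantized', dummy_input=dummy_input)
```

The post-training techniques in the list above (AdaRound, CLE, Bias Correction, AutoQuant, BN re-estimation) are applied either before creating the simulation model or on the simulation model itself; refer to the individual API pages for the exact sequence.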
If a user wants to use multi-GPU training with CLE or QAT, they can refer to:
- Multi-GPU guidelines: Guidelines for using the PyTorch DataParallel API with AIMET features (a sketch of the general pattern follows)
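As a heavily hedged sketch of the pattern those guidelines describe (based on the assumption that AIMET APIs are applied to the single-GPU model first, and that the simulated model is only wrapped in DataParallel for the training loop; names and module paths are illustrative):

```python
import torch
from torchvision.models import resnet18

from aimet_torch.cross_layer_equalization import equalize_model
from aimet_torch.quantsim import QuantizationSimModel

model = resnet18().eval()
dummy_input = torch.randn(1, 3, 224, 224)

# Apply CLE on the plain single-GPU model, not on a DataParallel wrapper.
equalize_model(model, input_shapes=(1, 3, 224, 224))

# Create the quantization simulation model and compute encodings on the
# single-GPU model as well.
sim = QuantizationSimModel(model, dummy_input=dummy_input)
sim.compute_encodings(forward_pass_callback=lambda m, _: m(dummy_input),
                      forward_pass_callback_args=None)

# Only then wrap the simulated model in DataParallel for the QAT training loop.
sim.model = torch.nn.DataParallel(sim.model)
# ... run the QAT training loop on sim.model, then export from `sim` as usual.
```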