AIMET PyTorch Quantization APIs

aimet_torch

Important

The aimet_torch package is planned to be upgraded to aimet_torch.v2, which offers a more flexible, extensible, and PyTorch-friendly user interface. In a future release, the core APIs of aimet_torch will be fully replaced with their equivalents in aimet_torch.v2. For more information, refer to the aimet_torch.v2 API reference.

To make full use of AIMET Quantization features, users are encouraged to follow several guidelines when defining PyTorch models. AIMET provides APIs that automate some of the model definition changes and check whether AIMET Quantization features can be applied to a PyTorch model.

Users should first invoke the Model Preparer API before using any of the AIMET Quantization features.

Model Guidelines: Guidelines for defining PyTorch models
Architecture Checker API: Allows the user to check the model for performance concerns
Model Preparer API: Allows the user to automate model definition changes
Model Validator API: Allows the user to check whether AIMET Quantization features can be applied to a PyTorch model
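
The recommendation above to run the Model Preparer before any other quantization feature can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a definitive workflow: it assumes prepare_model is importable from aimet_torch.model_preparer and QuantizationSimModel from aimet_torch.quantsim, and the helper name prepare_then_simulate is hypothetical. The imports sit inside the function so that the snippet can be loaded on a machine without torch or aimet_torch installed.

```python
def prepare_then_simulate(model, dummy_input):
    """Prepare a PyTorch model first, then wrap it in a quantization simulation.

    `model` is assumed to be a torch.nn.Module and `dummy_input` a tensor
    (or tuple of tensors) with the shape the model expects.
    """
    # Assumed import locations; adjust them if your AIMET release differs.
    from aimet_torch.model_preparer import prepare_model
    from aimet_torch.quantsim import QuantizationSimModel

    # Step 1: let the Model Preparer automate model definition changes.
    prepared_model = prepare_model(model)

    # Step 2: only then hand the prepared model to a quantization feature.
    sim = QuantizationSimModel(prepared_model, dummy_input=dummy_input)
    return sim
```

The ordering is the point of the sketch: the quantization simulation is built from the prepared model, never from the original one.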

AIMET Quantization for PyTorch Models provides the following functionality.

Quant Analyzer API: Analyzes the model and identifies layers that are sensitive to quantization
Quantization Simulation API: Simulates inference and training on quantized hardware
Adaptive Rounding API: Post-training quantization technique to optimize rounding of weight tensors
Cross-Layer Equalization API: Post-training quantization technique to equalize layer parameters
Bias Correction API: Post-training quantization technique to correct shift in layer outputs due to quantization noise
AutoQuant API: Unified API that integrates the post-training quantization techniques provided by AIMET
BN Re-estimation APIs: APIs that re-estimate BN layers’ statistics and fold the BN layers
PEFT LoRA APIs: APIs to integrate PEFT LoRA with AIMET Quantization flow

To use multiple GPUs with CLE or QAT, refer to:

Multi-GPU guidelines: Guidelines to use PyTorch DataParallel API with AIMET features

API Reference

aimet_torch.v2

Introducing aimet_torch v2, a future version of aimet_torch with more powerful quantization features and a more PyTorch-friendly user interface!

What’s New

These are some of the powerful new features and interfaces supported in aimet_torch.v2

Backwards Compatibility

Good news! aimet_torch v2 is carefully designed to be fully backwards-compatible with all previous public APIs of aimet_torch. All you need is a drop-in replacement of import statements from aimet_torch to aimet_torch.v2, as shown below:

-from aimet_torch.v1.quantsim import QuantizationSimModel
+from aimet_torch.v2.quantsim import QuantizationSimModel

-from aimet_torch.v1.adaround.adaround_weight import Adaround, AdaroundParameters
+from aimet_torch.v2.adaround import Adaround, AdaroundParameters

-from aimet_torch.v1.seq_mse import apply_seq_mse
+from aimet_torch.v2.seq_mse import apply_seq_mse

-from aimet_torch.v1.quant_analyzer import QuantAnalyzer
+from aimet_torch.v2.quant_analyzer import QuantAnalyzer

All other APIs that did not change in, or are orthogonal to, aimet_torch.v2 remain accessible via the aimet_torch namespace as before.
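
For a codebase with many files, the import replacements in the diff above could be applied mechanically. The following is a minimal sketch using only the Python standard library. The module mapping is copied from the diff, while the names V1_TO_V2_MODULES and migrate_imports are hypothetical helpers, not part of AIMET. It only handles single-line "from ... import ..." statements and leaves every other line untouched.

```python
import re

# Module renames taken from the import diff above (old path -> new path).
V1_TO_V2_MODULES = {
    "aimet_torch.v1.quantsim": "aimet_torch.v2.quantsim",
    "aimet_torch.v1.adaround.adaround_weight": "aimet_torch.v2.adaround",
    "aimet_torch.v1.seq_mse": "aimet_torch.v2.seq_mse",
    "aimet_torch.v1.quant_analyzer": "aimet_torch.v2.quant_analyzer",
}

# Matches "from <module> import" at the start of a line and captures
# the leading part, the dotted module path, and the " import" keyword.
_FROM_IMPORT = re.compile(
    r"^([ \t]*from[ \t]+)([\w.]+)([ \t]+import\b)", re.MULTILINE
)


def migrate_imports(source: str) -> str:
    """Rewrite known v1 'from ... import' lines to their v2 equivalents."""

    def _swap(match: re.Match) -> str:
        module = match.group(2)
        # Modules that are not in the mapping are returned unchanged.
        new_module = V1_TO_V2_MODULES.get(module, module)
        return match.group(1) + new_module + match.group(3)

    return _FROM_IMPORT.sub(_swap, source)


if __name__ == "__main__":
    old_source = (
        "from aimet_torch.v1.quantsim import QuantizationSimModel\n"
        "from aimet_torch.v1.adaround.adaround_weight import Adaround, AdaroundParameters\n"
    )
    print(migrate_imports(old_source))
```

An exact-match lookup on the full module path is used instead of a blanket replacement of "v1" with "v2" because the Adaround module path also changes shape (adaround.adaround_weight becomes adaround), so a plain string substitution would produce a wrong path.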

For more detailed information about how to migrate to aimet_torch.v2, see the aimet_torch.v2 migration guide.