AIMET PyTorch Quantization APIs
aimet_torch
Important
The aimet_torch package is planned to be upgraded to aimet_torch.v2, which offers a more flexible, extensible, and PyTorch-friendly user interface. In a future release, the core APIs of aimet_torch will be fully replaced with their equivalents in aimet_torch.v2. For more information, please refer to the aimet_torch.v2 API reference.
To make full use of AIMET Quantization features, users are encouraged to follow several guidelines when defining PyTorch models. AIMET provides APIs that automate some of the model definition changes and check whether AIMET Quantization features can be applied to a PyTorch model.
- Users should first invoke the Model Preparer API before using any of the AIMET Quantization features.
Model Guidelines: Guidelines for defining PyTorch models
Architecture Checker API: Allows the user to check the model for performance concerns
Model Preparer API: Allows the user to automate model definition changes
Model Validator API: Allows the user to check whether AIMET Quantization features can be applied to a PyTorch model
- AIMET Quantization for PyTorch Models provides the following functionality:
Quant Analyzer API: Analyzes the model and identifies layers that are sensitive to quantization
Quantization Simulation API: Simulates inference and training on quantized hardware
Adaptive Rounding API: Post-training quantization technique to optimize rounding of weight tensors
Cross-Layer Equalization API: Post-training quantization technique to equalize layer parameters
Bias Correction API: Post-training quantization technique to correct shift in layer outputs due to quantization noise
AutoQuant API: Unified API that integrates the post-training quantization techniques provided by AIMET
BN Re-estimation APIs: APIs that re-estimate the statistics of BN layers and fold the BN layers
PEFT LoRA APIs: APIs to integrate PEFT LoRA with AIMET Quantization flow
- Users who want to use Multi-GPU with CLE or QAT can refer to:
Multi-GPU guidelines: Guidelines for using the PyTorch DataParallel API with AIMET features
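To make the idea behind quantization simulation more concrete, the following minimal sketch shows a quantize-dequantize ("fake quantization") step on a plain list of floats. It is not AIMET's implementation and does not use any AIMET API; the function name quantize_dequantize, the unsigned asymmetric scheme, and the min/max range selection are all assumptions chosen for illustration.

```python
def quantize_dequantize(values, bitwidth=8):
    """Round floats onto an unsigned integer grid, then map them back to floats.

    Illustrative only: assumes asymmetric (affine) quantization with the
    range taken from the min/max of the data, widened to include zero.
    """
    qmin, qmax = 0, 2 ** bitwidth - 1
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # fall back to 1.0 for all-zero input
    offset = round(-lo / scale)               # integer grid point that represents 0.0

    result = []
    for v in values:
        q = round(v / scale) + offset         # quantize
        q = max(qmin, min(qmax, q))           # clip to the representable range
        result.append((q - offset) * scale)   # dequantize
    return result


if __name__ == "__main__":
    data = [-1.0, -0.3, 0.0, 0.42, 1.0]
    for bits in (8, 4, 2):
        print(bits, quantize_dequantize(data, bitwidth=bits))
```

Running the sketch at lower bitwidths shows the outputs collapsing onto fewer distinct values, which is the kind of quantization noise a simulation is meant to expose before deployment.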
API Reference
aimet_torch.v2
Introducing aimet_torch v2, a future version of aimet_torch with more powerful quantization features and a PyTorch-friendly user interface!
What’s New
These are some of the powerful new features and interfaces supported in aimet_torch.v2:
Dispatching Custom Quantized Kernels
Backwards Compatibility
Good news! aimet_torch v2 is carefully designed to be fully backwards-compatible with all previous public APIs of aimet_torch.
All you need is a drop-in replacement of import statements from aimet_torch to aimet_torch.v2, as shown below:
-from aimet_torch.v1.quantsim import QuantizationSimModel
+from aimet_torch.v2.quantsim import QuantizationSimModel
-from aimet_torch.v1.adaround.adaround_weight import Adaround, AdaroundParameters
+from aimet_torch.v2.adaround import Adaround, AdaroundParameters
-from aimet_torch.v1.seq_mse import apply_seq_mse
+from aimet_torch.v2.seq_mse import apply_seq_mse
-from aimet_torch.v1.quant_analyzer import QuantAnalyzer
+from aimet_torch.v2.quant_analyzer import QuantAnalyzer
All other APIs that are unchanged in, or orthogonal to, aimet_torch.v2 remain accessible via the aimet_torch namespace as before.
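For a larger codebase, the import changes in the diff could be applied mechanically. The sketch below is one possible approach that uses only the Python standard library; the V1_TO_V2 table is copied from the four lines of the diff, while the helper name rewrite_imports and the regular expression are hypothetical and not part of AIMET. It only handles single-line `from ... import ...` statements and leaves every other import untouched.

```python
import re

# Module mapping copied from the import diff; extend it only with
# mappings confirmed by the migration guide.
V1_TO_V2 = {
    "aimet_torch.v1.quantsim": "aimet_torch.v2.quantsim",
    "aimet_torch.v1.adaround.adaround_weight": "aimet_torch.v2.adaround",
    "aimet_torch.v1.seq_mse": "aimet_torch.v2.seq_mse",
    "aimet_torch.v1.quant_analyzer": "aimet_torch.v2.quant_analyzer",
}

# Matches the module path in "from <module> import ..." at the start of a line.
_FROM_IMPORT = re.compile(r"^([ \t]*from[ \t]+)([\w.]+)(?=[ \t]+import\b)", re.MULTILINE)


def rewrite_imports(source: str) -> str:
    """Return source with known v1 module paths replaced by their v2 paths."""
    def swap(match):
        module = match.group(2)
        return match.group(1) + V1_TO_V2.get(module, module)

    return _FROM_IMPORT.sub(swap, source)


if __name__ == "__main__":
    old = (
        "from aimet_torch.v1.quantsim import QuantizationSimModel\n"
        "from aimet_torch.v1.adaround.adaround_weight import Adaround, AdaroundParameters\n"
        "from torch import nn\n"
    )
    print(rewrite_imports(old))
```

Because the table is an exact-match lookup, modules that are not listed in it pass through unchanged, which matches the statement that unchanged APIs stay under the aimet_torch namespace.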
For more detailed information about how to migrate to aimet_torch.v2, see the aimet_torch.v2 migration guide.