What is AIMET?¶

AIMET(AI Model Efficiency Toolkit) is a quantization tools that works on deep learning models such as PyTorch and ONNX.

AIMET enables developer to:

Simulate Quantization
Quantize models with Post-Training Quantization(PTQ) techniques
Quantization Aware Training(QAT) on PyTorch models with aimet-torch
Visualize and experiment with model accuracy with various precision on activations and weights
Create mixed-precision models
Export Quantized model to deployable ONNX model format

With AIMET, developers can rapidly iterate on model to find best quantization profile to achieve state-of-the-art accuracy and latency. Developers can compile and run quantized model exported from AIMET to QNN or run directly via ONNX-Runtime.

AIMET provides two python packages:

AIMET-ONNX: Quantize ONNX model with PTQ techniques
AIMET-Torch: Quantize PyTorch model with QAT

We recommend to start with AIMET-ONNX PTQ techniques, which has fastest turn-around time for quantization and experimentation. Defer to QAT with AIMET-Torch only if you have tried AIMET-ONNX mixed precision and advanced techniques for optimizing weights.

Supported platform¶

64-bit Intel x86-compatible processor
Python 3.10
Ubuntu 22.04
For GPU variants:
- Nvidia GPU card (Compute capability 5.2 or later)
- Nvidia driver version 455 or later (using the latest driver is recommended; both CUDA and cuDNN are supported)

Get Started¶

Visit here to quick start.