What is AIMET?
AIMET (AI Model Efficiency Toolkit) is a quantization toolkit for deep learning models in PyTorch and ONNX formats.
AIMET enables developers to:
- Simulate quantization
- Quantize models with Post-Training Quantization (PTQ) techniques
- Perform Quantization-Aware Training (QAT) on PyTorch models with aimet-torch
- Visualize and experiment with model accuracy at different activation and weight precisions
- Create mixed-precision models
- Export quantized models to a deployable ONNX format
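Simulated quantization is commonly implemented as a quantize-dequantize round trip: values are mapped onto an integer grid and back to floating point, so the model keeps running in floating point but experiences the rounding and clamping error of the target bit width. The sketch below is plain Python, not AIMET's API; the function name `quantize_dequantize` and the asymmetric min/max scheme it uses are illustrative assumptions.

```python
def quantize_dequantize(values, bitwidth=8):
    """Round-trip floats through an asymmetric uniform integer grid (fake quantization).

    Illustrative sketch only; this is not an AIMET function.
    """
    lo = min(min(values), 0.0)  # extend the range so 0.0 is exactly representable
    hi = max(max(values), 0.0)
    if hi == lo:  # all values are zero: nothing to quantize
        return list(values)

    levels = 2 ** bitwidth - 1           # largest integer code on the grid
    scale = (hi - lo) / levels           # float step between adjacent codes
    zero_point = round(-lo / scale)      # integer code that represents 0.0

    out = []
    for x in values:
        q = round(x / scale) + zero_point   # quantize to an integer code
        q = min(max(q, 0), levels)          # clamp to the representable range
        out.append((q - zero_point) * scale)  # dequantize back to float
    return out


if __name__ == "__main__":
    weights = [-1.0, -0.5, 0.0, 0.3, 1.0]
    for bits in (8, 4):
        approx = quantize_dequantize(weights, bits)
        err = max(abs(a - b) for a, b in zip(weights, approx))
        print(f"{bits}-bit max round-trip error: {err:.5f}")
```

Running it at 8 and 4 bits on the same values shows the round-trip error growing as the bit width shrinks, which is the kind of accuracy impact quantization simulation is meant to expose before deployment.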
With AIMET, developers can iterate quickly on a model to find the quantization profile that gives the best accuracy and latency. Quantized models exported from AIMET can be compiled and run with QNN, or run directly with ONNX Runtime.

AIMET provides two Python packages:
- AIMET-ONNX: quantize ONNX models with PTQ techniques
- AIMET-Torch: quantize PyTorch models with QAT
We recommend starting with the AIMET-ONNX PTQ techniques, which offer the fastest turnaround for quantization and experimentation. Move to QAT with AIMET-Torch only after you have tried AIMET-ONNX mixed precision and its advanced techniques for optimizing weights.
Supported platform
- 64-bit Intel x86-compatible processor
- Python 3.10
- Ubuntu 22.04
- For GPU variants:
  - NVIDIA GPU card (compute capability 5.2 or later)
  - NVIDIA driver version 455 or later (the latest driver is recommended; both CUDA and cuDNN are supported)
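Before installing, the CPU-side requirements listed above can be checked with a short standard-library script. This is a hypothetical helper, not part of AIMET: the function name `check_host` is an assumption, the GPU card and driver requirements are not checked, and the Ubuntu check relies on `platform.freedesktop_os_release()`, which exists only in Python 3.10 and later.

```python
import platform
import sys

# Requirements copied from the list above; GPU requirements are not checked.
REQUIRED_PYTHON = (3, 10)
X86_64_NAMES = {"x86_64", "amd64"}  # platform.machine() values for 64-bit x86 (lowercased)
REQUIRED_OS = ("ubuntu", "22.04")


def check_host(python_version=None, machine=None, system=None, os_release=None):
    """Return a list of mismatches; an empty list means the host looks supported.

    Hypothetical helper: arguments default to values read from the running host.
    """
    if python_version is None:
        python_version = sys.version_info[:2]
    if machine is None:
        machine = platform.machine()
    if system is None:
        system = platform.system()

    problems = []
    major, minor = tuple(python_version[:2])
    if (major, minor) != REQUIRED_PYTHON:
        problems.append(f"Python 3.10 required, found {major}.{minor}")
    if machine.lower() not in X86_64_NAMES:
        problems.append(f"64-bit x86 processor required, found {machine!r}")
    if system != "Linux":
        problems.append(f"Ubuntu 22.04 required, found {system!r}")
        return problems  # os-release data only exists on Linux

    if os_release is None:
        try:
            # Parses /etc/os-release; the function was added in Python 3.10.
            os_release = platform.freedesktop_os_release()
        except (OSError, AttributeError):
            os_release = {}
    found = (os_release.get("ID", "?"), os_release.get("VERSION_ID", "?"))
    if found != REQUIRED_OS:
        problems.append(f"Ubuntu 22.04 required, found {found[0]} {found[1]}")
    return problems


if __name__ == "__main__":
    for problem in check_host():
        print("WARNING:", problem)
```

The arguments exist so the checks can be exercised with fixed values; called with no arguments, the function inspects the current machine.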
Get Started
See the quick start guide to begin.