What is AIMET?

AIMET (AI Model Efficiency Toolkit) is a quantization toolkit for deep learning models built with PyTorch or stored in the ONNX format.

AIMET enables developers to:

  1. Simulate quantization

  2. Quantize models with Post-Training Quantization (PTQ) techniques

  3. Run Quantization-Aware Training (QAT) on PyTorch models with aimet-torch

  4. Visualize and experiment with model accuracy at various precisions for activations and weights

  5. Create mixed-precision models

  6. Export quantized models to the deployable ONNX format
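Simulating quantization generally means keeping the model in floating point while inserting quantize-dequantize steps that reproduce the rounding and clamping error of integer arithmetic. The sketch below illustrates that general technique in pure Python with asymmetric (affine) quantization at 8 and 4 bits. It is not AIMET's API or implementation; the function names `quantize_dequantize` and `max_abs_error` and the sample weights are hypothetical, chosen only for illustration.

```python
def quantize_dequantize(values, bitwidth=8):
    """Simulate asymmetric uniform quantization of a list of floats.

    Each value is mapped onto an integer grid of 2**bitwidth levels and then
    back to float, so the result is still floating point but carries the
    rounding and clamping error that integer hardware would introduce.
    """
    # Extend the range so that 0.0 is always exactly representable.
    lo, hi = min(min(values), 0.0), max(max(values), 0.0)
    num_steps = 2 ** bitwidth - 1
    # Step size between adjacent grid points; fall back to 1.0 for all-zero input.
    scale = (hi - lo) / num_steps or 1.0
    offset = round(-lo / scale)  # integer zero point
    out = []
    for v in values:
        q = round(v / scale) + offset       # quantize to the integer grid
        q = max(0, min(num_steps, q))       # clamp to [0, 2**bitwidth - 1]
        out.append((q - offset) * scale)    # dequantize back to float
    return out


def max_abs_error(a, b):
    """Largest element-wise absolute difference between two equal-length lists."""
    return max(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    weights = [-0.51, -0.2, 0.0, 0.13, 0.37, 0.88]  # hypothetical example weights
    for bits in (8, 4):
        err = max_abs_error(weights, quantize_dequantize(weights, bits))
        print(f"{bits}-bit max abs error: {err:.5f}")
```

Lowering the bitwidth widens the quantization step (the `scale` above), so the worst-case error for in-range values grows roughly as half a step. Measuring that trade-off per layer is the idea behind experimenting with activation and weight precisions and building mixed-precision models.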

With AIMET, developers can iterate rapidly on a model to find the quantization profile that best balances accuracy and latency. Quantized models exported from AIMET can be compiled and run with QNN, or run directly with ONNX Runtime.

[Figure: AIMET overview]

AIMET provides two Python packages:

  1. AIMET-ONNX: Quantize ONNX models with PTQ techniques

  2. AIMET-Torch: Quantize PyTorch models with QAT

We recommend starting with AIMET-ONNX PTQ techniques, which offer the fastest turnaround for quantization and experimentation. Move to QAT with AIMET-Torch only after you have tried AIMET-ONNX mixed precision and its advanced weight-optimization techniques.

Supported platform

  • 64-bit Intel x86-compatible processor

  • Python 3.10

  • Ubuntu 22.04

  • For GPU variants:
    • NVIDIA GPU (compute capability 5.2 or later)

    • NVIDIA driver version 455 or later (the latest driver is recommended; both CUDA and cuDNN are supported)
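Before installing, it can help to confirm that a host matches the requirements listed above. The following is a minimal sketch using only the standard library for the CPU, Python, and OS checks. The GPU check assumes PyTorch happens to be installed and uses it only to read the compute capability; the NVIDIA driver version is not checked. The function names `check_platform` and `check_gpu` are hypothetical and not part of AIMET.

```python
import platform
import sys


def check_platform():
    """Return a list of mismatches against the supported platform; empty means OK."""
    problems = []

    # 64-bit Intel x86-compatible processor
    machine = platform.machine().lower()
    if machine not in ("x86_64", "amd64"):
        problems.append(f"unsupported CPU architecture: {machine or 'unknown'}")

    # Python 3.10
    if sys.version_info[:2] != (3, 10):
        found = f"{sys.version_info.major}.{sys.version_info.minor}"
        problems.append(f"Python {found} found, 3.10 expected")

    # Ubuntu 22.04, read from /etc/os-release (API added in Python 3.10;
    # raises OSError on hosts without that file)
    try:
        release = platform.freedesktop_os_release()
    except (AttributeError, OSError):
        release = {}
    if release.get("ID") != "ubuntu" or release.get("VERSION_ID") != "22.04":
        problems.append("OS is not Ubuntu 22.04")

    return problems


def check_gpu(min_capability=(5, 2)):
    """Optional GPU check; assumes PyTorch is available to query the device."""
    try:
        import torch  # assumption: used only to inspect the GPU
    except ImportError:
        return ["PyTorch not installed; GPU check skipped"]
    if not torch.cuda.is_available():
        return ["no CUDA-capable GPU visible to PyTorch"]
    major, minor = torch.cuda.get_device_capability(0)
    if (major, minor) < tuple(min_capability):
        need = f"{min_capability[0]}.{min_capability[1]}"
        return [f"GPU compute capability {major}.{minor} is below {need}"]
    return []


if __name__ == "__main__":
    for issue in check_platform() + check_gpu():
        print("WARNING:", issue)
```

The GPU check is separate so that CPU-only hosts can run `check_platform()` alone; skip `check_gpu()` unless you intend to use a GPU variant.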

Get Started

See the Quick Start guide to begin.