Quick Start¶
This page describes how to quickly install the latest version of AIMET for the ONNX and PyTorch frameworks.
For all the framework variants and compute platforms, see Installation.
Tested platform¶
aimet-torch and aimet-onnx have been validated on the following platform:
- 64-bit Intel x86-compatible processor
- Python 3.10
- Ubuntu 22.04
- For GPU variants:
  - Nvidia GPU card (Compute capability 5.2 or later)
  - Nvidia driver version 455 or later (using the latest driver is recommended; both CUDA and cuDNN are supported)
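For the GPU variants, you can confirm that PyTorch sees a suitable device with a quick check like the following (a minimal sketch using standard torch.cuda calls; it is not part of the official installation steps):
import torch

if torch.cuda.is_available():
    # get_device_capability returns the (major, minor) compute capability
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU: {torch.cuda.get_device_name(0)}, compute capability {major}.{minor}")
else:
    print("No CUDA device visible; the CPU variant will be used")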
Installing AIMET¶
Install AIMET from PyPI.
For ONNX:
pip install aimet-onnx
For PyTorch:
pip install aimet-torch
Verifying the installation¶
For ONNX:
import aimet_onnx
print(aimet_onnx.__version__)
For PyTorch:
import aimet_torch
print(aimet_torch.__version__)
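As an alternative check, the installed package versions can also be read from pip metadata with the standard library (the names passed here are the PyPI package names, which differ from the import names):
from importlib.metadata import version

# Queries pip metadata directly, without importing the packages
print(version("aimet-onnx"))
print(version("aimet-torch"))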
Quantize a small model quickly with AIMET¶
Create a QuantizationSimModel, perform calibration, and then evaluate it.
Step 1: Handle imports and other setup.
For ONNX:
import os
import onnx
import torch
from torchvision.models import mobilenet_v2

# Load a pretrained FP32 model and export it to ONNX
model = mobilenet_v2(weights='DEFAULT').eval()
dummy_input = torch.randn((10, 3, 224, 224))
file_path = os.path.join('/tmp', 'mobilenet_v2.onnx')
torch.onnx.export(model, dummy_input, file_path)
onnx_model = onnx.load_model(file_path)
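Optionally, you can sanity-check the exported file with the standard onnx checker (an extra step, not part of the original flow):
import onnx

# Raises an exception if the exported graph is malformed
onnx.checker.check_model(file_path)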
For PyTorch:
import torch
from torchvision.models import mobilenet_v2

# Load a pretrained FP32 model and move it to GPU if available
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = mobilenet_v2(weights='DEFAULT').eval().to(device)
dummy_input = torch.randn((10, 3, 224, 224), device=device)
Step 2: Create a QuantizationSimModel and ensure the model contains quantization operations.
For ONNX:
from aimet_onnx.quantsim import QuantizationSimModel
from aimet_onnx import int8, int16
from aimet_onnx.utils import make_dummy_input

# Simulate quantization with 8-bit parameters and 16-bit activations
sim = QuantizationSimModel(onnx_model,
                           param_type=int8,
                           activation_type=int16)
For PyTorch:
from aimet_common.defs import QuantScheme
from aimet_common.quantsim_config.utils import get_path_for_per_channel_config
from aimet_torch.quantsim import QuantizationSimModel

# Simulate quantization with 8-bit parameters, 16-bit outputs, per-channel
# weight quantization, and range learning initialized from TF-style min/max
sim = QuantizationSimModel(model,
                           dummy_input,
                           quant_scheme=QuantScheme.training_range_learning_with_tf_init,
                           config_file=get_path_for_per_channel_config(),
                           default_param_bw=8,
                           default_output_bw=16)
print(sim)
Step 3: Calibrate the model. This example uses random values as input; in real-world cases, perform calibration with a representative dataset (see the sketch after these snippets).
For ONNX:
# make_dummy_input generates random inputs matching the model's input shapes
calibration_data = make_dummy_input(onnx_model)
sim.compute_encodings(inputs=[calibration_data])
For PyTorch:
def forward_pass(model):
    with torch.no_grad():
        model(torch.randn((10, 3, 224, 224), device=device))

sim.compute_encodings(forward_pass)
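As a sketch of real-world calibration for the PyTorch variant, assuming a DataLoader named calib_loader built from a representative dataset (calib_loader is a hypothetical name, not defined in this guide), the callback would iterate over real batches instead of random tensors:
def forward_pass(model):
    with torch.no_grad():
        # A few hundred representative samples are typically enough for calibration
        for images, _ in calib_loader:  # calib_loader is assumed, not provided here
            model(images.to(device))

sim.compute_encodings(forward_pass)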
Step 4: Evaluate the model. Run inference directly on the QuantSim model to check the quantized model's accuracy.
For ONNX:
# Run inference through the quantization-simulation ONNX Runtime session
input_name = tuple(calibration_data.keys())[0]
output = sim.session.run(None, {input_name: dummy_input.numpy()})
print(output)
For PyTorch:
output = sim.model(dummy_input)
print(output)
Sample output of the ONNX QuantSim model is shown below:
[array([[-0.4599525 , 0.35107604, 0.43178225, ..., -0.45040053,
0.1450607 , 0.23799022],
[-0.4132449 , 0.20722957, 0.60808927, ..., -0.5315115 ,
-0.01675645, 0.22884297],
[-0.4677236 , 0.3576329 , 0.5317543 , ..., -0.50366503,
-0.01392324, -0.0897725 ],
...,
[-0.4503196 , 0.3851556 , 0.56810045, ..., -0.6998855 ,
0.03513189, 0.36678016],
[-0.27045077, 0.28065038, 0.46723792, ..., -0.24665177,
-0.11899511, 0.03658897],
[-0.43477735, 0.35536635, 0.62274104, ..., -0.5091695 ,
-0.11446196, 0.10984787]], dtype=float32)]
Sample output of the PyTorch QuantSim model:
DequantizedTensor([[-1.7466, 0.8405, 1.8606, ..., -0.9714, 0.8366, 2.2363],
[-1.6091, 1.0449, 1.7788, ..., -0.9904, 1.0861, 2.2431],
[-1.5307, 0.8442, 1.5157, ..., -0.7793, 0.6327, 2.3861],
...,
[-1.3610, 1.4499, 2.2068, ..., -0.8188, 1.1155, 2.5962],
[-1.1619, 1.2217, 2.1050, ..., -0.5301, 0.9150, 2.1458],
[-1.6340, 0.9826, 2.2459, ..., -1.0769, 0.9054, 2.2315]],
device='cuda:0', grad_fn=<AliasBackward0>)
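Printing raw logits only confirms that inference runs. To actually measure the quantized model's accuracy, here is a minimal top-1 evaluation sketch for the PyTorch variant, assuming a labeled validation DataLoader named val_loader (a hypothetical name, not provided by this guide):
import torch

correct = total = 0
with torch.no_grad():
    for images, labels in val_loader:  # val_loader is assumed, not defined above
        logits = sim.model(images.to(device))
        preds = logits.argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()
print(f"Quantized top-1 accuracy: {correct / total:.2%}")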