Quick Start¶
This page describes how to quickly install the latest version of AIMET for the ONNX and PyTorch frameworks.
For all the framework variants and compute platforms, see Installation.
Tested platform¶
aimet-torch and aimet-onnx have been validated on the following platform:
- 64-bit Intel x86-compatible processor
- Python 3.10
- Ubuntu 22.04
- For GPU variants:
  - Nvidia GPU card (Compute capability 5.2 or later)
  - Nvidia driver version 455 or later (using the latest driver is recommended; both CUDA and cuDNN are supported)
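For the GPU variants, you can confirm that PyTorch sees a suitable device with a quick check like the following (a minimal sketch using standard torch.cuda calls; it is not part of the official installation steps):
import torch

if torch.cuda.is_available():
    # get_device_capability returns the (major, minor) compute capability
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU: {torch.cuda.get_device_name(0)}, compute capability {major}.{minor}")
else:
    print("No CUDA device visible; the CPU variant will be used")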
Installing AIMET¶
Install AIMET from PyPI.
For ONNX:
pip install aimet-onnx
For PyTorch:
pip install aimet-torch
Verifying the installation¶
For ONNX:
import aimet_onnx
print(aimet_onnx.__version__)
For PyTorch:
import aimet_torch
print(aimet_torch.__version__)
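As an alternative check, the installed package versions can also be read from pip metadata with the standard library (the names passed here are the PyPI package names, which differ from the import names):
from importlib.metadata import version

# Queries pip metadata directly, without importing the packages
print(version("aimet-onnx"))
print(version("aimet-torch"))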
Quantize a small model quickly with AIMET¶
Create a QuantizationSimModel, perform calibration, and then evaluate it.
Step 1: Handle imports and other setup.
For ONNX:
import os
import onnx
import torch
from torchvision.models import mobilenet_v2

# Load a pretrained FP32 model and export it to ONNX
model = mobilenet_v2(weights='DEFAULT').eval()
dummy_input = torch.randn((10, 3, 224, 224))
file_path = os.path.join('/tmp', 'mobilenet_v2.onnx')
torch.onnx.export(model, dummy_input, file_path)
onnx_model = onnx.load_model(file_path)
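Optionally, you can sanity-check the exported file with the standard onnx checker (an extra step, not part of the original flow):
import onnx

# Raises an exception if the exported graph is malformed
onnx.checker.check_model(file_path)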
For PyTorch:
import torch
from torchvision.models import mobilenet_v2

# Load a pretrained FP32 model and move it to GPU if available
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = mobilenet_v2(weights='DEFAULT').eval().to(device)
dummy_input = torch.randn((10, 3, 224, 224), device=device)
Step 2: Create a QuantizationSimModel and ensure the model contains quantization operations.
For ONNX:
from aimet_onnx.quantsim import QuantizationSimModel
from aimet_onnx import int8, int16
from aimet_onnx.utils import make_dummy_input

# Simulate quantization with 8-bit parameters and 16-bit activations
sim = QuantizationSimModel(onnx_model,
                           param_type=int8,
                           activation_type=int16)
For PyTorch:
from aimet_common.defs import QuantScheme
from aimet_common.quantsim_config.utils import get_path_for_per_channel_config
from aimet_torch.quantsim import QuantizationSimModel

# Simulate quantization with 8-bit parameters, 16-bit outputs, per-channel
# weight quantization, and range learning initialized from TF-style min/max
sim = QuantizationSimModel(model,
                           dummy_input,
                           quant_scheme=QuantScheme.training_range_learning_with_tf_init,
                           config_file=get_path_for_per_channel_config(),
                           default_param_bw=8,
                           default_output_bw=16)
print(sim)
Step 3: Calibrate the model. This example uses random values as input; in real-world cases, perform calibration with a representative dataset (see the sketch after these snippets).
For ONNX:
# make_dummy_input generates random inputs matching the model's input shapes
calibration_data = make_dummy_input(onnx_model)
sim.compute_encodings(inputs=[calibration_data])
For PyTorch:
def forward_pass(model):
    with torch.no_grad():
        model(torch.randn((10, 3, 224, 224), device=device))

sim.compute_encodings(forward_pass)
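As a sketch of real-world calibration for the PyTorch variant, assuming a DataLoader named calib_loader built from a representative dataset (calib_loader is a hypothetical name, not defined in this guide), the callback would iterate over real batches instead of random tensors:
def forward_pass(model):
    with torch.no_grad():
        # A few hundred representative samples are typically enough for calibration
        for images, _ in calib_loader:  # calib_loader is assumed, not provided here
            model(images.to(device))

sim.compute_encodings(forward_pass)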
Step 4: Evaluate the model. Run inference directly on the QuantSim model to check the quantized model's accuracy.
For ONNX:
# Run inference through the quantization-simulation ONNX Runtime session
input_name = tuple(calibration_data.keys())[0]
output = sim.session.run(None, {input_name: dummy_input.numpy()})
print(output)
For PyTorch:
output = sim.model(dummy_input)
print(output)
Sample output of the ONNX QuantSim model is shown below:
[array([[-0.4599525 , 0.35107604, 0.43178225, ..., -0.45040053,
0.1450607 , 0.23799022],
[-0.4132449 , 0.20722957, 0.60808927, ..., -0.5315115 ,
-0.01675645, 0.22884297],
[-0.4677236 , 0.3576329 , 0.5317543 , ..., -0.50366503,
-0.01392324, -0.0897725 ],
...,
[-0.4503196 , 0.3851556 , 0.56810045, ..., -0.6998855 ,
0.03513189, 0.36678016],
[-0.27045077, 0.28065038, 0.46723792, ..., -0.24665177,
-0.11899511, 0.03658897],
[-0.43477735, 0.35536635, 0.62274104, ..., -0.5091695 ,
-0.11446196, 0.10984787]], dtype=float32)]
Sample output of the PyTorch QuantSim model:
DequantizedTensor([[-1.7466, 0.8405, 1.8606, ..., -0.9714, 0.8366, 2.2363],
[-1.6091, 1.0449, 1.7788, ..., -0.9904, 1.0861, 2.2431],
[-1.5307, 0.8442, 1.5157, ..., -0.7793, 0.6327, 2.3861],
...,
[-1.3610, 1.4499, 2.2068, ..., -0.8188, 1.1155, 2.5962],
[-1.1619, 1.2217, 2.1050, ..., -0.5301, 0.9150, 2.1458],
[-1.6340, 0.9826, 2.2459, ..., -1.0769, 0.9054, 2.2315]],
device='cuda:0', grad_fn=<AliasBackward0>)
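Printing raw logits only confirms that inference runs. To actually measure the quantized model's accuracy, here is a minimal top-1 evaluation sketch for the PyTorch variant, assuming a labeled validation DataLoader named val_loader (a hypothetical name, not provided by this guide):
import torch

correct = total = 0
with torch.no_grad():
    for images, labels in val_loader:  # val_loader is assumed, not defined above
        logits = sim.model(images.to(device))
        preds = logits.argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.numel()
print(f"Quantized top-1 accuracy: {correct / total:.2%}")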