This notebook contains a working example of AIMET adaptive rounding (AdaRound).
AIMET quantization features typically use the “nearest rounding” technique, in which each weight value is quantized to the nearest integer value.
AdaRound instead optimizes a loss function using unlabeled training data to decide whether to quantize a specific weight to the closer or the farther integer value. With AdaRound, quantized accuracy is closer to that of the FP32 model than with nearest rounding.
Instantiate the example evaluation and training pipeline
Load the FP32 model and evaluate the model to find the baseline FP32 accuracy
Create a quantization simulation model (with fake quantization ops) and evaluate the quantized simulation model
Apply AdaRound and evaluate the simulation model to get a post-finetuned quantized accuracy score
Note
This notebook does not show state-of-the-art results. For example, it uses a relatively quantization-friendly model (ResNet-18). Also, optimization parameters such as the number of fine-tuning epochs are chosen to speed up execution of the notebook.
This example does image classification on the ImageNet dataset. If you already have a version of the dataset, use that. Otherwise, download the dataset, for example from https://image-net.org/challenges/LSVRC/2012/index .
Note
To speed up the execution of this notebook, you can use a reduced subset of the ImageNet dataset. For example, the entire ILSVRC2012 dataset has 1000 classes, 1000 training samples per class, and 50 validation samples per class. However, for the purpose of running this notebook, you can reduce the dataset to, say, two samples per class.
Edit the cell below to specify the directory where the downloaded ImageNet dataset is saved.
[ ]:
DATASET_DIR = '/path/to/dataset/' # Replace this path with a real directory
1. Instantiate the example training and validation pipeline¶
Use the following training and validation loop for the image classification task.
Things to note:
AIMET does not put limitations on how the evaluation pipeline is written. AIMET creates an onnxruntime.InferenceSession for the quantized model, which can be run like a regular InferenceSession; sim.session can be used in place of any other InferenceSession when doing inference or evaluation.
2. Convert an FP32 PyTorch model to ONNX, simplify & then evaluate baseline FP32 accuracy¶
2.1 Export a pretrained resnet18 model to onnx
You can load any pretrained PyTorch model instead.
[ ]:
import torch
import onnx
from torchvision.models import resnet18

input_shape = (1, 3, 224, 224)    # Shape for each ImageNet sample is (3 channels) x (224 height) x (224 width)
dummy_input = torch.randn(input_shape)
filename = "./resnet18.onnx"

# Load a pretrained ResNet-18 model in torch
pt_model = resnet18(pretrained=True)

# Export the torch model to onnx
torch.onnx.export(pt_model.eval(),
                  dummy_input,
                  filename,
                  input_names=['input'],
                  output_names=['output'],
                  dynamic_axes={
                      'input':  {0: 'batch_size'},
                      'output': {0: 'batch_size'},
                  })

model = onnx.load_model(filename)
2.2 (Optional) Simplify the onnx model
It is recommended to simplify the model before using AIMET as it can improve quantized accuracy and runtime performance.
[ ]:
from onnxsim import simplify
model, _ = simplify(model)
2.3 Decide whether to place the model on a CPU or CUDA device
This example uses CUDA if it is available. You can change this logic and force a device placement if needed.
[ ]:
import onnxruntime as ort

# Use CUDA if available; pin cudnn_conv_algo_search to DEFAULT so that
# accuracies/outputs do not change between inference runs
if 'CUDAExecutionProvider' in ort.get_available_providers():
    providers = [('CUDAExecutionProvider', {'cudnn_conv_algo_search': 'DEFAULT'}), 'CPUExecutionProvider']
else:
    providers = ['CPUExecutionProvider']
2.4 Create an InferenceSession and determine the model’s FP32 accuracy
3. Create a quantization simulation model and determine quantized accuracy¶
3.1 Fold BatchNormalization layers
Before calculating the simulated quantized accuracy using QuantizationSimModel, fold the BatchNormalization (BN) layers into adjacent Convolutional layers. The BN layers that cannot be folded are left as they are.
On quantized runtimes, BN layers are typically folded to improve inference performance, which can degrade accuracy. Folding the BN layers before simulation reproduces this on-target drop in accuracy.
Use the following code to call AIMET to fold the BN layers in-place on the given model:
[ ]:
from aimet_onnx.batch_norm_fold import fold_all_batch_norms_to_weight
fold_all_batch_norms_to_weight(model)
3.2 Create a QuantizationSimModel
In this step, AIMET inserts fake quantization ops in the model graph and configures them.
Key parameters:
Setting activation_type to int8 quantizes all activations in the model to 8-bit integer precision
Setting param_type to int8 quantizes all parameters in the model to 8-bit integer precision
[ ]:
import copy
import aimet_onnx
from aimet_common.defs import QuantScheme
from aimet_onnx.quantsim import QuantizationSimModel

sim = QuantizationSimModel(model=copy.deepcopy(model),
                           quant_scheme=QuantScheme.min_max,
                           param_type=aimet_onnx.int8,
                           activation_type=aimet_onnx.int8,
                           providers=providers)
AIMET has added quantizer nodes to the model graph, but before the sim model can be used for inference or training, scale and offset quantization parameters must be calculated for each quantizer node by passing unlabeled data samples through the model to collect range statistics. This process is sometimes referred to as calibration. AIMET refers to it as “computing encodings”.
3.3 Pass unlabeled data samples through the model
The following code is one way to get unlabeled samples for calibration. It uses the existing PyTorch training or validation data loader and converts samples to an onnxruntime-compatible format.
Only a very small percentage of the data samples is needed. For example, the training dataset for ImageNet has 1M samples; 500 or 1000 suffice to compute encodings.
The samples should be reasonably well distributed. While it is not necessary to cover all classes, avoid extreme scenarios, such as using only dark or only light samples; using only pictures captured at night, for example, could skew the results.
3.4 Evaluate the quantized model
You can pass sim.session to the eval function to evaluate the quantsim model.
[ ]:
# Evaluate the pre-adaround model
accuracy = evaluate(sim.session)
print(f"Pre-adaround sim accuracy {accuracy}")
4. Apply AdaRound¶
4.1 Apply AdaRound to the sim model
Key parameters:
inputs: a collection (e.g., List[Dict[str, np.ndarray]]) of InferenceSession inputs for the model. AdaRound uses these data samples to learn the rounding vectors.
iterations: the number of optimization iterations to run for each layer. The default value is 10000, and we strongly recommend using at least this number. This example uses 32 to speed up execution.
[ ]:
# Apply adaround to the model weights
aimet_onnx.apply_adaround(sim, onnx_data, iterations=32)
4.2 Recompute activation encodings
Because AdaRounded weights may change the distribution of activations in the model, it is recommended to recompute the activation encodings after applying AdaRound.
[ ]:
# Recompute activation encodings (weight encodings are frozen)
sim.compute_encodings(onnx_data)
4.3 Evaluate the optimized sim
[ ]:
# Evaluate the ada-rounded model
accuracy = evaluate(sim.session)
print(f"Post-adaround sim accuracy: {accuracy}")
There might be little gain in accuracy after this limited application of AdaRound. Experiment with the hyper-parameters to get better results.