Quantization-Aware Training with BatchNorm Re-estimation

This notebook shows a working code example of how to use AIMET to perform QAT (Quantization-aware training) with batchnorm re-estimation. Batchnorm re-estimation is a technique for countering potential instability of batchnrom statistics (i.e. running mean and variance) during QAT. More specifically, batchnorm re-estimation recalculates the batchnorm statistics based on the model after QAT. By doing so, we aim to make our model learn batchnorm statistics from from stable outputs after QAT, rather than from likely noisy outputs during QAT.

Overall flow

This notebook covers the following steps: 1. Create a quantization simulation model with fake quantization ops inserted. 2. Finetune and evaluate the quantization simulation model 3. Re-estimate batchnorm statistics and compare the eval score before and after re-estimation. 4. Fold the re-estimated batchnorm layers and export the quantization simulation model

What this notebook is not

In this notebook, we will focus how to apply batchnorm re-estimation after QAT, rather than covering all the details about QAT itself. For more information about QAT, please refer to QAT notebook or QAT range learning notebook.

Dataset

This notebook relies on the ImageNet dataset for the task of image classification. If you already have a version of the dataset readily available, please use that. Else, please download the dataset from appropriate location (e.g. https://image-net.org/challenges/LSVRC/2012/index.php#).

Note1: The ImageNet dataset typically has the following characteristics and the dataloader provided in this example notebook rely on these - Subfolders ‘train’ for the training samples and ‘val’ for the validation samples. Please see the pytorch dataset description for more details. - A subdirectory per class, and a file per each image sample

Note2: To speed up the execution of this notebook, you may use a reduced subset of the ImageNet dataset. E.g. the entire ILSVRC2012 dataset has 1000 classes, 1000 training samples per class and 50 validation samples per class. But for the purpose of running this notebook, you could perhaps reduce the dataset to say 2 samples per class. This exercise is left upto the reader and is not necessary.

Edit the cell below and specify the directory where the downloaded ImageNet dataset is saved.

[ ]:

DATASET_DIR = '/path/to/dataset/'         # Please replace this with a real directory

1. Example evaluation and training pipeline

The following is an example training and validation loop for this image classification task.

Does AIMET have any limitations on how the training, validation pipeline is written? Not really. We will see later that AIMET will modify the user’s model to create a QuantizationSim model which is still a PyTorch model. This QuantizationSim model can be used in place of the original model when doing inference or training.
Does AIMET put any limitation on the interface of the evaluate() or train() methods? Not really. You should be able to use your existing evaluate and train routines as-is.

[ ]:

import os
import torch
from Examples.common import image_net_config
from Examples.torch.utils.image_net_evaluator import ImageNetEvaluator
from Examples.torch.utils.image_net_trainer import ImageNetTrainer
from Examples.torch.utils.image_net_data_loader import ImageNetDataLoader

class ImageNetDataPipeline:

    @staticmethod
    def get_val_dataloader() -> torch.utils.data.DataLoader:
        """
        Instantiates a validation dataloader for ImageNet dataset and returns it
        """
        data_loader = ImageNetDataLoader(DATASET_DIR,
                                         image_size=image_net_config.dataset['image_size'],
                                         batch_size=image_net_config.evaluation['batch_size'],
                                         is_training=False,
                                         num_workers=image_net_config.evaluation['num_workers']).data_loader
        return data_loader

    @staticmethod
    def evaluate(model: torch.nn.Module, use_cuda: bool) -> float:
        """
        Given a torch model, evaluates its Top-1 accuracy on the dataset
        :param model: the model to evaluate
        :param use_cuda: whether or not the GPU should be used.
        """
        evaluator = ImageNetEvaluator(DATASET_DIR, image_size=image_net_config.dataset['image_size'],
                                      batch_size=image_net_config.evaluation['batch_size'],
                                      num_workers=image_net_config.evaluation['num_workers'])

        return evaluator.evaluate(model, iterations=None, use_cuda=use_cuda)

    @staticmethod
    def finetune(model: torch.nn.Module, epochs, learning_rate, learning_rate_schedule, use_cuda):
        """
        Given a torch model, finetunes the model to improve its accuracy
        :param model: the model to finetune
        :param epochs: The number of epochs used during the finetuning step.
        :param learning_rate: The learning rate used during the finetuning step.
        :param learning_rate_schedule: The learning rate schedule used during the finetuning step.
        :param use_cuda: whether or not the GPU should be used.
        """
        trainer = ImageNetTrainer(DATASET_DIR, image_size=image_net_config.dataset['image_size'],
                                  batch_size=image_net_config.train['batch_size'],
                                  num_workers=image_net_config.train['num_workers'])

        trainer.train(model, max_epochs=epochs, learning_rate=learning_rate,
                      learning_rate_schedule=learning_rate_schedule, use_cuda=use_cuda)

2. Load FP32 model

For this example notebook, we are going to load a pretrained resnet18 model from torchvision. Similarly, you can load any pretrained PyTorch model instead.

[ ]:

from torchvision.models import resnet18
from aimet_torch.model_preparer import prepare_model

use_cuda = torch.cuda.is_available()
if use_cuda:
    device = torch.device("cuda")
else:
    device = torch.device("cpu")

model = resnet18(pretrained=True).to(device)
model = prepare_model(model)

3. Create a quantization simulation model and Perform QAT

Create Quantization Sim Model

Now we use AIMET to create a QuantizationSimModel. This basically means that AIMET will insert fake quantization ops in the model graph and will configure them. A few of the parameters are explained here - quant_scheme: We set this to “QuantScheme.post_training_tf_enhanced” - Supported options are ‘tf_enhanced’ or ‘tf’ or using Quant Scheme Enum QuantScheme.post_training_tf or QuantScheme.post_training_tf_enhanced - default_output_bw: Setting this to 8, essentially means that we are asking AIMET to perform all activation quantizations in the model using integer 8-bit precision - default_param_bw: Setting this to 8, essentially means that we are asking AIMET to perform all parameter quantizations in the model using integer 8-bit precision

There are other parameters that are set to default values in this example. Please check the AIMET API documentation of QuantizationSimModel to see reference documentation for all the parameters.

NOTE: Note that, unlike in other QAT example scripts, we didn’t fold batchnorm layers before QAT. This is because we aim to finetune our model with batchnorm layers present and re-estimate the batchnorm statatistics for better accuracy. The batchnorm layers will be folded after re-estimation.

[ ]:

from aimet_common.defs import QuantScheme
from aimet_torch.v1.quantsim import QuantizationSimModel

dummy_input = torch.rand(1, 3, 224, 224, device=device)    # Shape for each ImageNet sample is (3 channels) x (224 height) x (224 width)

sim = QuantizationSimModel(model=model,
                           quant_scheme=QuantScheme.training_range_learning_with_tf_init,
                           dummy_input=dummy_input,
                           default_output_bw=8,
                           default_param_bw=8)

def pass_calibration_data(sim_model, use_cuda):
    data_loader = ImageNetDataPipeline.get_val_dataloader()
    batch_size = data_loader.batch_size

    if use_cuda:
        device = torch.device('cuda')
    else:
        device = torch.device('cpu')

    samples = 1000
    batch_cntr = 0

    for input_data, target_data in data_loader:
        inputs_batch = input_data.to(device)
        sim_model(inputs_batch)

        batch_cntr += 1
        if (batch_cntr * batch_size) > samples:
            break

sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                      forward_pass_callback_args=use_cuda)

Perform QAT

To perform quantization aware training (QAT), we simply train the model for a few more epochs (typically 15-20). As with any training job, hyper-parameters need to be searched for optimal results. Good starting points are to use a learning rate on the same order as the ending learning rate when training the original model, and to drop the learning rate by a factor of 10 every 5 epochs or so.

For the purpose of this example notebook, we are going to train only for 1 epoch. But feel free to change these parameters as you see fit.

[ ]:

ImageNetDataPipeline.finetune(sim.model, epochs=1, learning_rate=5e-7, learning_rate_schedule=[5, 10], use_cuda=use_cuda)

After we are done with QAT, we can run quantization simulation inference against the validation dataset at the end to observe any improvements in accuracy.

[ ]:

finetuned_accuracy = ImageNetDataPipeline.evaluate(sim.model, use_cuda)
print(finetuned_accuracy)

4. Perform BatchNorm Reestimation

Re-estimate BatchNorm Statistics

AIMET provides a helper function, reestimate_bn_stats, for re-estimating batchnorm statistics. Here is the full list of parameters for this function: * model: Model to re-estimate the BatchNorm statistics. * dataloader Train dataloader. * num_batches (optional): The number of batches to be used for reestimation. (Default: 100) * forward_fn (optional): Optional adapter function that performs forward pass given a model and a input batch yielded from the data loader. If not specified, it is expected that inputs yielded from dataloader can be passed directly to the model.

[ ]:

from aimet_torch.bn_reestimation import reestimate_bn_stats

train_loader = ImageNetDataLoader(images_dir=DATASET_DIR,
                                  image_size=image_net_config.dataset['image_size'],
                                  batch_size=image_net_config.train['batch_size'],
                                  is_training=True,
                                  num_workers=image_net_config.train['num_workers']).data_loader
def forward_fn(model, inputs):
    input_data, target_data = inputs
    model(input_data)

reestimate_bn_stats(sim.model, train_loader, forward_fn=forward_fn)

finetuned_accuracy_bn_reestimated = ImageNetDataPipeline.evaluate(sim.model, use_cuda)
print(finetuned_accuracy_bn_reestimated)

Fold BatchNorm Layers

So far, we have improved our quantization simulation model through QAT and batchnorm re-estimation. The next step would be to actually take this model to target. But first, we should fold the batchnorm layers for our model to run on target devices more efficiently.

[ ]:

from aimet_torch.batch_norm_fold import fold_all_batch_norms_to_scale

fold_all_batch_norms_to_scale(sim)

5. Export Model

As the final step, we will export the model to run it on actual target devices. AIMET QuantizationSimModel provides an export API for this purpose.

[ ]:

os.makedirs('./output/', exist_ok=True)
dummy_input = dummy_input.cpu()
sim.export(path='./output/', filename_prefix='resnet18_after_qat', dummy_input=dummy_input)

Summary

Hope this notebook was useful for you to understand how to use batchnorm re-estimation feature of AIMET.

Few additional resources - Refer to the AIMET API docs to know more details of the APIs and optional parameters. - Refer to the other example notebooks to understand how to use AIMET post-training quantization techniques and QAT methods.