Quantization-Aware Training with BatchNorm Re-estimation

This notebook shows a working code example of how to use AIMET to perform quantization-aware training (QAT) with batchnorm re-estimation. Batchnorm re-estimation is a technique for countering potential instability of batchnorm statistics (i.e. running mean and variance) during QAT. More specifically, batchnorm re-estimation recalculates the batchnorm statistics based on the model after QAT. By doing so, we aim to make our model learn batchnorm statistics from the stable outputs after QAT, rather than from the likely noisy outputs during QAT.

Overall flow

This notebook covers the following steps:

1. Instantiate the example evaluation and training pipeline
2. Define constants and prepare datasets
3. Create the model in Keras
4. Train and evaluate the model
5. Quantize the model with QuantSim
6. Finetune and evaluate the quantization simulation model
7. Re-estimate batchnorm statistics and compare the eval score before and after re-estimation
8. Fold the re-estimated batchnorm layers and export the quantization simulation model


Dataset

This notebook relies on the ImageNet dataset for the task of image classification. If you already have a version of the dataset readily available, please use that. Otherwise, please download the dataset from an appropriate location (e.g. https://image-net.org/challenges/LSVRC/2012/index.php#).

Note: To speed up the execution of this notebook, you may use a reduced subset of the ImageNet dataset. E.g. the entire ILSVRC2012 dataset has 1000 classes, 1000 training samples per class and 50 validation samples per class. But for the purpose of running this notebook, you could reduce the dataset to, say, 2 samples per class. This exercise is left up to the reader and is not necessary.
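For instance, here is a minimal sketch of how such a reduced copy could be built, assuming the dataset is stored as one subdirectory per class; the directory layout and the samples_per_class count are illustrative assumptions, not requirements:

[ ]:
import os
import shutil

def make_reduced_copy(src_dir, dst_dir, samples_per_class=2):
    """Copy the first few images of every class subdirectory into dst_dir."""
    for class_name in sorted(os.listdir(src_dir)):
        class_src = os.path.join(src_dir, class_name)
        if not os.path.isdir(class_src):
            continue
        class_dst = os.path.join(dst_dir, class_name)
        os.makedirs(class_dst, exist_ok=True)
        for file_name in sorted(os.listdir(class_src))[:samples_per_class]:
            shutil.copy(os.path.join(class_src, file_name), class_dst)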

Edit the cell below and specify the directory where the downloaded ImageNet dataset is saved.

[ ]:
DATASET_DIR = '/path/to/dir/'       # Please replace this with a real directory
[ ]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import tensorflow as tf

1. Instantiate the example evaluation and training pipeline

The following is an example training and validation loop for this image classification task.

  • Does AIMET have any limitations on how the training and validation pipeline is written? Not really. We will see later that AIMET will modify the user’s model to create a QuantizationSim model, which is still a TensorFlow model. This QuantizationSim model can be used in place of the original model when doing inference or training.

  • Does AIMET put any limitation on the interface of evaluate() or train() methods? Not really. You should be able to use your existing evaluate and train routines as-is.

[ ]:
from typing import Optional
from Examples.common import image_net_config
from Examples.tensorflow.utils.keras.image_net_dataset import ImageNetDataset
from Examples.tensorflow.utils.keras.image_net_evaluator import ImageNetEvaluator


class ImageNetDataPipeline:
    """
    Provides APIs for model evaluation and finetuning using ImageNet Dataset.
    """

    @staticmethod
    def get_val_dataset(batch_size: Optional[int] = None) -> ImageNetDataset:
        """
        Instantiates a validation dataloader for the ImageNet dataset and returns it
        :param batch_size: Batch size to use; defaults to the evaluation batch size in image_net_config
        :return: An ImageNetDataset (its .dataset attribute exposes the underlying tf.data.Dataset)
        """
        if batch_size is None:
            batch_size = image_net_config.evaluation['batch_size']

        data_loader = ImageNetDataset(DATASET_DIR,
                                      image_size=image_net_config.dataset['image_size'],
                                      batch_size=batch_size)

        return data_loader

    @staticmethod
    def evaluate(model, iterations=None) -> float:
        """
        Given a Keras model, evaluates its Top-1 accuracy on the validation dataset
        :param model: The Keras model to be evaluated.
        :param iterations: The number of iterations to run. If None, all the data will be used
        :return: The top-1 accuracy of the model on the validation dataset.
        """
        evaluator = ImageNetEvaluator(DATASET_DIR,
                                      image_size=image_net_config.dataset["image_size"],
                                      batch_size=image_net_config.evaluation["batch_size"])

        return evaluator.evaluate(model=model, iterations=iterations)

2. Define constants and prepare datasets

In this section the constants and helper functions needed to run this example are defined.

  • EVAL_DATASET_SIZE: Number of batches to use for evaluation. Set to 4 so this example executes faster.

  • TRAIN_DATASET_SIZE: Number of batches to use for training. Set to 4 so this example executes faster.

  • RE_ESTIMATION_DATASET_SIZE: Number of batches to use for batchnorm re-estimation. Set to 4 so this example executes faster.

  • BATCH_SIZE: The batch size used by the dataloaders. Set to 16 as an example.

[ ]:
EVAL_DATASET_SIZE = 4
TRAIN_DATASET_SIZE = 4
RE_ESTIMATION_DATASET_SIZE = 4
BATCH_SIZE = 16

# The dataset is already batched, so take() below counts batches, not individual samples
dataset = ImageNetDataPipeline.get_val_dataset(BATCH_SIZE).dataset
eval_dataset = dataset.take(EVAL_DATASET_SIZE)
train_dataset = dataset.take(TRAIN_DATASET_SIZE)

# Drop the labels to obtain unlabeled samples for batchnorm re-estimation
unlabeled_dataset = dataset.map(lambda images, labels: images)
re_estimation_dataset = unlabeled_dataset.take(RE_ESTIMATION_DATASET_SIZE)

3. Create the model in Keras

Currently, only Keras models built using the Sequential or Functional APIs are compatible with QuantSim; models making use of subclassed layers are incompatible. Therefore, we use the Functional API to create the model used in this example.

[ ]:
tf.keras.backend.clear_session()
inputs = tf.keras.Input(shape=(224, 224, 3), name="inputs")
conv = tf.keras.layers.Conv2D(16, (3, 3), name='conv1')(inputs)
bn = tf.keras.layers.BatchNormalization(fused=True)(conv)
relu = tf.keras.layers.ReLU()(bn)
pool = tf.keras.layers.MaxPooling2D()(relu)
conv2 = tf.keras.layers.Conv2D(8, (3, 3), name='conv2')(pool)
flatten = tf.keras.layers.Flatten()(conv2)
dense = tf.keras.layers.Dense(1000)(flatten)
functional_model = tf.keras.Model(inputs=inputs, outputs=dense)

4. Train and evaluate the model

Before we can quantize the model and apply QAT, the FP32 model must be trained so that we can get a baseline accuracy.

[ ]:
loss_fn = tf.keras.losses.CategoricalCrossentropy()

functional_model.compile(optimizer='adam',
                         loss=loss_fn,
                         metrics=['accuracy'])

functional_model.fit(train_dataset, epochs=5)

# Evaluate the baseline model on the validation data using `evaluate`
print("Evaluate FP32 model (pre-quantization) on validation data")
ImageNetDataPipeline.evaluate(model=functional_model)

5. Create a QuantizationSim Model

Now we use AIMET to create a QuantizationSimModel. This basically means that AIMET will insert fake quantization ops in the model graph and will configure them. A few of the parameters are explained here:

  • quant_scheme: We set this to QuantScheme.training_range_learning_with_tf_init. Other supported options include QuantScheme.post_training_tf and QuantScheme.post_training_tf_enhanced (also available as the strings ‘tf’ and ‘tf_enhanced’).

  • default_output_bw: Setting this to 8 means that AIMET will perform all activation quantizations in the model using integer 8-bit precision.

  • default_param_bw: Setting this to 8 means that AIMET will perform all parameter quantizations in the model using integer 8-bit precision.

There are other parameters that are set to default values in this example. Please check the AIMET API documentation of QuantizationSimModel to see reference documentation for all the parameters.

[ ]:
import json
from aimet_common.defs import QuantScheme
from aimet_tensorflow.keras.quantsim import QuantizationSimModel

default_config_per_channel = {
        "defaults":
            {
                "ops":
                    {
                        "is_output_quantized": "True"
                    },
                "params":
                    {
                        "is_quantized": "True",
                        "is_symmetric": "True"
                    },
                "strict_symmetric": "False",
                "unsigned_symmetric": "True",
                "per_channel_quantization": "True"
            },

        "params":
            {
                "bias":
                    {
                        "is_quantized": "False"
                    }
            },

        "op_type":
            {
                "Squeeze":
                    {
                        "is_output_quantized": "False"
                    },
                "Pad":
                    {
                        "is_output_quantized": "False"
                    },
                "Mean":
                    {
                        "is_output_quantized": "False"
                    }
            },

        "supergroups":
            [
                {
                    "op_list": ["Conv", "Relu"]
                },
                {
                    "op_list": ["Conv", "Clip"]
                },
                {
                    "op_list": ["Conv", "BatchNormalization", "Relu"]
                },
                {
                    "op_list": ["Add", "Relu"]
                },
                {
                    "op_list": ["Gemm", "Relu"]
                }
            ],

        "model_input":
            {
                "is_input_quantized": "True"
            },

        "model_output":
            {}
    }

with open("/tmp/default_config_per_channel.json", "w") as f:
    json.dump(default_config_per_channel, f)


qsim = QuantizationSimModel(functional_model,
                            quant_scheme=QuantScheme.training_range_learning_with_tf_init,
                            config_file="/tmp/default_config_per_channel.json")

Prepare the evaluation callback function

The eval_callback() function takes the model to evaluate and the number of samples to use as arguments. If the num_samples argument is None, the whole evaluation dataset is used to evaluate the model. Since a Keras model must be compiled before evaluation, the callback compiles it internally.

[ ]:
from typing import Optional


def eval_callback(model: tf.keras.Model,
                  num_samples: Optional[int] = None) -> float:
    if num_samples is None:
        num_samples = EVAL_DATASET_SIZE

    sampled_dataset = eval_dataset.take(num_samples)

    # Model should be compiled before evaluation
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss=tf.keras.losses.CategoricalCrossentropy(),
                  metrics=tf.keras.metrics.CategoricalAccuracy())
    _, acc = model.evaluate(sampled_dataset)

    return acc

Compute Encodings

Although AIMET has added ‘quantizer’ nodes to the model graph, the model is not ready to be used yet. Before we can use the sim model for inference or training, we need to find appropriate scale/offset quantization parameters for each ‘quantizer’ node. For activation quantization nodes, we need to pass unlabeled data samples through the model to collect range statistics, which will then let AIMET calculate appropriate scale/offset quantization parameters. This process is sometimes referred to as calibration. AIMET simply refers to it as ‘computing encodings’.

So we create a routine to pass unlabeled data samples through the model. This should be fairly simple: use the existing train or validation data loader to extract some samples and pass them to the model. We don’t need to compute any loss metric, so we can simply ignore the model output. A few pointers regarding the data samples:

  • In practice, we need only a very small percentage of the overall data samples for computing encodings.

  • It is beneficial if the samples used for computing encodings are well distributed. It is not necessary that all classes be covered, since we are only looking at the range of values at every layer activation. However, we definitely want to avoid extreme scenarios, such as using all-positive or all-negative samples.

The following cell reuses the eval_callback defined above to pass samples through the model and compute encodings. This routine can be written in many different ways; this is just one example.

[ ]:
qsim.compute_encodings(eval_callback, forward_pass_callback_args=None)
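The same encodings could also be computed with a dedicated calibration routine that ignores the model output entirely, as described above. A minimal alternative sketch, reusing the unlabeled re_estimation_dataset prepared earlier; the function name pass_calibration_data is illustrative:

[ ]:
def pass_calibration_data(model: tf.keras.Model, _) -> None:
    """Pass unlabeled samples through the model purely to collect range statistics."""
    for images in re_estimation_dataset:
        # The output is intentionally discarded; only the activation ranges matter here
        model(images, training=False)

# This routine could be used in place of eval_callback:
# qsim.compute_encodings(pass_calibration_data, forward_pass_callback_args=None)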

Next, we can evaluate the performance of the quantized model.

[ ]:
print("Evaluate quantized model on test data")
ImageNetDataPipeline.evaluate(model=qsim.model)

6. Perform QAT

To perform quantization aware training (QAT), we simply train the model for a few more epochs (typically 15-20). As with any training job, hyper-parameters need to be searched for optimal results. Good starting points are to use a learning rate on the same order as the ending learning rate when training the original model, and to drop the learning rate by a factor of 10 every 5 epochs or so. For the purpose of this example notebook, we are going to train only for 1 epoch. But feel free to change these parameters as you see fit.

[ ]:
quantized_callback = tf.keras.callbacks.TensorBoard(log_dir="./log/quantized")
# train_dataset is already batched, so batch_size must not be passed to fit()
history = qsim.model.fit(
    train_dataset, epochs=1, validation_data=eval_dataset,
    callbacks=[quantized_callback]
)
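For longer QAT runs, the learning-rate schedule suggested above (drop by a factor of 10 every 5 epochs) can be expressed as a Keras callback. A minimal sketch; the decay interval and factor are the rough starting points mentioned earlier, not tuned values:

[ ]:
def step_decay(epoch, lr):
    # Drop the learning rate by a factor of 10 every 5 epochs
    return lr * 0.1 if epoch > 0 and epoch % 5 == 0 else lr

lr_callback = tf.keras.callbacks.LearningRateScheduler(step_decay)
# e.g. qsim.model.fit(train_dataset, epochs=20, validation_data=eval_dataset,
#                     callbacks=[quantized_callback, lr_callback])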

Finally, let’s evaluate the validation accuracy of our model after QAT.

[ ]:
print("Evaluate quantized model (post QAT) on test data")
ImageNetDataPipeline.evaluate(model=qsim.model)

7. Re-estimate BatchNorm Statistics

AIMET provides a helper function, reestimate_bn_stats, for re-estimating batchnorm statistics. Here is the full list of parameters for this function:

  • model: Model whose BatchNorm statistics are to be re-estimated.

  • dataloader: Train dataloader.

  • num_batches (optional): The number of batches to be used for re-estimation. (Default: 100)

  • forward_fn (optional): Optional adapter function that performs a forward pass given a model and an input batch yielded from the data loader. If not specified, it is expected that inputs yielded from the dataloader can be passed directly to the model.

[ ]:
from aimet_tensorflow.keras.bn_reestimation import reestimate_bn_stats

reestimate_bn_stats(qsim.model, re_estimation_dataset, num_batches=1)
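If the batches yielded by your dataloader cannot be passed to the model directly (for example, (images, labels) tuples), the optional forward_fn parameter described above can adapt them. A minimal sketch of such an adapter; the unpacking logic is an illustrative assumption about the dataloader's output:

[ ]:
def forward_fn(model: tf.keras.Model, batch) -> None:
    # Unpack an (images, labels) batch and forward only the images
    images, _ = batch
    model(images)

# e.g. reestimate_bn_stats(qsim.model, dataset, num_batches=1, forward_fn=forward_fn)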

Fold BatchNorm Layers

So far, we have improved our quantization simulation model through QAT and batchnorm re-estimation. The next step would be to actually take this model to target. But first, we should fold the batchnorm layers for our model to run on target devices more efficiently.

[ ]:
from aimet_tensorflow.keras.batch_norm_fold import fold_all_batch_norms_to_scale
fold_all_batch_norms_to_scale(qsim)

8. Export Model

As the final step, we will export the model to run it on actual target devices. AIMET QuantizationSimModel provides an export API for this purpose.

[ ]:
import os
os.makedirs('./output/', exist_ok=True)
qsim.export(path='./output/', filename_prefix='model_after_bn_re_estimation_qat_range_learning')

Summary

We hope this notebook was useful for understanding how to use the batchnorm re-estimation feature of AIMET.

A few additional resources:

  • Refer to the AIMET API docs for more details on the APIs and optional parameters.

  • Refer to the other example notebooks to understand how to use AIMET post-training quantization techniques and QAT methods.