AutoQuant
This notebook shows a working code example of how to use the AIMET AutoQuant feature.
AIMET offers a suite of neural network post-training quantization techniques. Applying these techniques in a specific sequence often results in better accuracy and performance. Without the AutoQuant feature, the user has to try out various combinations of AIMET quantization features manually, a process that is error-prone and time-consuming.
The AutoQuant feature analyzes the model, determines the sequence of AIMET quantization techniques, and applies them. In addition, the user can specify, through the AutoQuant API, the amount of accuracy drop that can be tolerated. As soon as this threshold accuracy is reached, AutoQuant stops applying additional quantization techniques. In summary, AutoQuant saves time and automates the quantization of neural networks.
Overall flow
This notebook covers the following:
1. Instantiate the example evaluation and training pipeline
2. Load a pretrained FP32 model
3. Determine the baseline FP32 accuracy
4. Define constants and helper functions
5. Apply AutoQuant
What this notebook is not
This notebook is not designed to show state-of-the-art AutoQuant results. For example, it uses ResNet50, a relatively quantization-friendly model. Also, some optimization parameters are deliberately chosen so that the notebook executes more quickly.
Dataset
This notebook relies on the ImageNet dataset for the task of image classification. If you already have a version of the dataset readily available, please use that. Otherwise, download the dataset from an appropriate location (e.g. https://image-net.org/challenges/LSVRC/2012/index.php#).
Note: To speed up the execution of this notebook, you may use a reduced subset of the ImageNet dataset. For example, the entire ILSVRC2012 dataset has 1000 classes, roughly 1000 training samples per class, and 50 validation samples per class. For the purpose of running this notebook, you could reduce the dataset to, say, 2 samples per class. This exercise is left up to the reader and is not necessary.
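One possible way to build such a reduced subset is sketched below. It is only an illustration under stated assumptions: the helper name make_subset is hypothetical, and it assumes a layout with one sub-directory per class (src_dir/<class_name>/<image files>), which may not match how your copy of ImageNet is organized.

```python
import os
import shutil


def make_subset(src_dir: str, dst_dir: str, samples_per_class: int = 2) -> int:
    """Copy the first `samples_per_class` files of every class folder.

    Assumption (not from the notebook): one sub-directory per class,
    i.e. src_dir/<class_name>/<image files>.
    Returns the total number of files copied.
    """
    copied = 0
    for class_name in sorted(os.listdir(src_dir)):
        class_src = os.path.join(src_dir, class_name)
        if not os.path.isdir(class_src):
            continue  # skip stray files at the top level
        class_dst = os.path.join(dst_dir, class_name)
        os.makedirs(class_dst, exist_ok=True)
        # Sort so that the selection is deterministic across runs.
        files = sorted(
            f for f in os.listdir(class_src)
            if os.path.isfile(os.path.join(class_src, f))
        )
        for file_name in files[:samples_per_class]:
            shutil.copy2(os.path.join(class_src, file_name),
                         os.path.join(class_dst, file_name))
            copied += 1
    return copied
```

After running it, DATASET_DIR could point at the destination directory instead of the full dataset.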
Edit the cell below and specify the directory where the downloaded ImageNet dataset is saved.
[ ]:
DATASET_DIR = '/path/to/dir/' # Please replace this with a real directory
[ ]:
import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
import tensorflow as tf
from aimet_tensorflow.keras.auto_quant import AutoQuant
1. Example evaluation and training pipeline
The following is an example training and validation loop for this image classification task.
Does AIMET have any limitations on how the training and validation pipeline is written? Not really. We will see later that AIMET modifies the user’s model to create a QuantizationSim model, which is still a TensorFlow model. This QuantizationSim model can be used in place of the original model when doing inference or training.
Does AIMET put any limitation on the interface of evaluate() or train() methods? Not really. You should be able to use your existing evaluate and train routines as-is.
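To give a rough intuition for what quantization simulation means in general, the sketch below rounds floating-point values onto an 8-bit grid and maps them back to floats (quantize, then dequantize). This is a simplified, pure-Python illustration and not AIMET's implementation; the function name fake_quantize and the min/max range selection are assumptions made for this example.

```python
from typing import List


def fake_quantize(values: List[float], num_bits: int = 8) -> List[float]:
    """Quantize values to an unsigned integer grid, then map back to floats."""
    # The representable range is stretched to include 0.0 so that zero
    # is reproduced exactly after the round trip.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    levels = 2 ** num_bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    zero_point = round(-lo / scale)
    out = []
    for v in values:
        q = round(v / scale) + zero_point   # float -> integer grid
        q = max(0, min(levels, q))          # clamp to the valid integer range
        out.append((q - zero_point) * scale)  # integer grid -> float
    return out
```

The output is still a list of floats, but each value now sits on one of at most 256 grid points. A simulated-quantization model behaves similarly: it keeps running in floating point while exposing the rounding error that a fixed-point target would introduce.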
[ ]:
from typing import Optional
from Examples.common import image_net_config
from Examples.tensorflow.utils.keras.image_net_dataset import ImageNetDataset
from Examples.tensorflow.utils.keras.image_net_evaluator import ImageNetEvaluator

class ImageNetDataPipeline:
    """
    Provides APIs for model evaluation and finetuning using ImageNet Dataset.
    """

    @staticmethod
    def get_val_dataset(batch_size: Optional[int] = None) -> ImageNetDataset:
        """
        Instantiates a validation dataloader for ImageNet dataset and returns it
        :param batch_size: Batch size to use. If None, the evaluation batch size from image_net_config is used
        :return: An ImageNetDataset whose `dataset` attribute is the TensorFlow dataset
        """
        if batch_size is None:
            batch_size = image_net_config.evaluation['batch_size']

        data_loader = ImageNetDataset(DATASET_DIR,
                                      image_size=image_net_config.dataset['image_size'],
                                      batch_size=batch_size)
        return data_loader

    @staticmethod
    def evaluate(model, iterations=None) -> float:
        """
        Given a Keras model, evaluates its Top-1 accuracy on the validation dataset
        :param model: The Keras model to be evaluated.
        :param iterations: The number of iterations to run. If None, all the data will be used
        :return: The Top-1 accuracy on the validation dataset.
        """
        evaluator = ImageNetEvaluator(DATASET_DIR,
                                      image_size=image_net_config.dataset["image_size"],
                                      batch_size=image_net_config.evaluation["batch_size"])
        return evaluator.evaluate(model=model, iterations=iterations)
2. Load a pretrained FP32 model
For this example notebook, we are going to load a pretrained ResNet50 model from Keras. Similarly, you can load any pretrained Keras model instead.
[ ]:
from tensorflow.keras.applications.resnet import ResNet50
model = ResNet50(weights='imagenet')
3. Determine the baseline FP32 accuracy
Let’s determine the FP32 (32-bit floating point) accuracy of this model using the evaluate() routine.
[ ]:
ImageNetDataPipeline.evaluate(model=model)
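For reference, Top-1 accuracy is the fraction of samples whose highest-scoring class matches the ground-truth label. The sketch below computes it in plain Python on per-class scores; it illustrates the metric only and is not the code used by ImageNetEvaluator. The function name top1_accuracy is hypothetical.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> float:
    """Fraction of samples whose highest-scoring class equals the label."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if len(labels) == 0:
        return 0.0
    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)
```

For example, with four samples of which three are predicted correctly, the function returns 0.75.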
4. Define Constants and Helper functions
In this section the constants and helper functions needed to run this example are defined.
EVAL_DATASET_SIZE A typical value is 5000. To execute this example faster, this value has been set to 50.
CALIBRATION_DATASET_SIZE A typical value is 2000. To execute this example faster, this value has been set to 20.
BATCH_SIZE The user sets the batch size. As an example, it is set to 10.
The cell after the constants builds eval_dataset from the validation pipeline and derives unlabeled_dataset from it by dropping the labels.
[ ]:
EVAL_DATASET_SIZE = 50
CALIBRATION_DATASET_SIZE = 20
BATCH_SIZE = 10
[ ]:
eval_dataset = ImageNetDataPipeline.get_val_dataset(BATCH_SIZE).dataset
unlabeled_dataset = eval_dataset.map(lambda images, labels: images)
Prepare the evaluation callback function
The eval_callback() function takes the model to evaluate and the number of samples to use as arguments. If the num_samples argument is None, EVAL_DATASET_SIZE is used instead.
[ ]:
from typing import Optional

def eval_callback(model: tf.keras.Model,
                  num_samples: Optional[int] = None) -> float:
    if num_samples is None:
        num_samples = EVAL_DATASET_SIZE

    sampled_dataset = eval_dataset.take(num_samples)

    # Model should be compiled before evaluation
    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss=tf.keras.losses.CategoricalCrossentropy(),
                  metrics=tf.keras.metrics.CategoricalAccuracy())
    _, acc = model.evaluate(sampled_dataset)
    return acc
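One detail worth keeping in mind: tf.data.Dataset.take(n) returns the first n elements of a dataset, and on a batched dataset each element is a whole batch. The pure-Python analogy below illustrates the difference between counting samples and counting batches. It assumes that eval_dataset is batched with BATCH_SIZE; the helper names batched and samples_seen are hypothetical.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(samples: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Group samples into lists of `batch_size` (the last one may be shorter)."""
    batch: List[int] = []
    for sample in samples:
        batch.append(sample)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def samples_seen(num_elements: int, batch_size: int, total: int = 1000) -> int:
    """Count the samples covered by taking `num_elements` elements
    from a batched stream of `total` samples."""
    taken = islice(batched(range(total), batch_size), num_elements)
    return sum(len(b) for b in taken)
```

Under that assumption, taking 50 elements from a stream batched by 10 covers 500 samples, while taking 50 // 10 elements covers 50. If num_samples is meant as a sample count, dividing it by BATCH_SIZE before calling take() is one way to keep the two consistent.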
5. Apply AutoQuant
As a first step, the AutoQuant object is created.
The allowed_accuracy_drop parameter tells AutoQuant how much accuracy drop the user is willing to tolerate. AutoQuant applies a series of quantization features. Once the accuracy is within the allowed drop, AutoQuant stops applying any subsequent quantization feature. Please refer to the AutoQuant User Guide and API documentation for complete details.
[ ]:
auto_quant = AutoQuant(allowed_accuracy_drop=0.01,
                       unlabeled_dataset=unlabeled_dataset,
                       eval_callback=eval_callback)
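The early-stopping behavior described above can be pictured with the conceptual sketch below. It is not AIMET's implementation: the function auto_quant_sketch, the dictionary standing in for a model, and the way techniques are represented are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Model = dict  # hypothetical stand-in for a real model object


def auto_quant_sketch(
    model: Model,
    techniques: List[Tuple[str, Callable[[Model], Model]]],
    evaluate: Callable[[Model], float],
    fp32_accuracy: float,
    allowed_accuracy_drop: float,
) -> Tuple[Model, float, List[str]]:
    """Apply techniques in order; stop once accuracy is within the allowed drop."""
    target = fp32_accuracy - allowed_accuracy_drop
    best_model, best_acc = model, evaluate(model)
    applied: List[str] = []
    for name, technique in techniques:
        if best_acc >= target:
            break  # good enough: skip the remaining techniques
        candidate = technique(best_model)
        acc = evaluate(candidate)
        applied.append(name)
        if acc > best_acc:
            # Keep a technique's result only if it improves accuracy.
            best_model, best_acc = candidate, acc
    return best_model, best_acc, applied
```

In this sketch the accuracies are fractions, so an FP32 accuracy of 0.76 with an allowed drop of 0.01 gives a target of 0.75. The loop returns the best model seen so far even if the target is never reached.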
Optionally set AdaRound Parameters
The AutoQuant feature internally uses default parameters to execute the AdaRound step. If and only if necessary, the default AdaRound Parameters should be modified using the API shown below.
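Note: The default value of the num_iterations parameter is 10000. Reducing it (for example, to 2000) makes this example execute faster. The cell below passes only the dataset and num_batches, so the default number of iterations applies unless you override it.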
[ ]:
from aimet_tensorflow.adaround.adaround_weight import AdaroundParameters
ADAROUND_DATASET_SIZE = 2000
adaround_dataset = unlabeled_dataset.take(ADAROUND_DATASET_SIZE)
adaround_params = AdaroundParameters(adaround_dataset,
                                     num_batches=ADAROUND_DATASET_SIZE // BATCH_SIZE)
auto_quant.set_adaround_params(adaround_params)
Run AutoQuant
This step applies the AutoQuant feature. The best possible quantized model, the associated eval_score and the path to the AdaRound encoding files are returned.
[ ]:
model, accuracy, encoding_path = auto_quant.apply(model)
Summary
We hope this notebook was useful for understanding how to use the AIMET AutoQuant feature.
A few additional resources:
- Refer to the AIMET API docs for more details on the APIs and parameters
- Refer to the other example notebooks to understand how to use the AIMET CLE and AdaRound features in a standalone fashion