Adaptive Rounding (AdaRound)
This notebook contains a working example of AIMET adaptive rounding (AdaRound).
AIMET quantization features typically use the “nearest rounding” technique for achieving quantization. When using the nearest rounding technique, the weight value is quantized to the nearest integer value.
AdaRound optimizes a loss function using unlabeled training data to decide whether to quantize a specific weight to the closer integer value or the farther one. Using AdaRound, quantized accuracy is closer to the FP32 model than with nearest rounding.
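To make the distinction concrete, here is a toy sketch (not AIMET code) for a single linear output y = w·x. It compares nearest rounding with an exhaustive search over round-down/round-up choices that minimizes the output error on a few unlabeled inputs. AdaRound does not enumerate combinations. Instead, it optimizes a continuous relaxation of the up/down choice layer by layer. The weights, scale and inputs below are made up for illustration.

```python
import itertools
import math


def quantize_nearest(weights, scale):
    # Nearest rounding: snap each weight to the closest point on the grid.
    return [round(w / scale) * scale for w in weights]


def output_mse(w_q, weights, inputs):
    # Mean squared error between FP32 and quantized outputs of y = w . x.
    total = 0.0
    for x in inputs:
        y = sum(w * xi for w, xi in zip(weights, x))
        y_q = sum(w * xi for w, xi in zip(w_q, x))
        total += (y - y_q) ** 2
    return total / len(inputs)


def best_up_down_rounding(weights, scale, inputs):
    # Exhaustive search over every floor/ceil combination (2**n candidates).
    # AdaRound does not enumerate; it learns the choices with gradient descent.
    floors = [math.floor(w / scale) for w in weights]
    best, best_err = None, float("inf")
    for choice in itertools.product((0, 1), repeat=len(weights)):
        w_q = [(f + c) * scale for f, c in zip(floors, choice)]
        err = output_mse(w_q, weights, inputs)
        if err < best_err:
            best, best_err = w_q, err
    return best, best_err


# Two equal weights that always see identical inputs (made-up example).
weights, scale = [0.14, 0.14], 0.1
inputs = [[1.0, 1.0], [2.0, 2.0]]
nearest_err = output_mse(quantize_nearest(weights, scale), weights, inputs)
w_best, best_err = best_up_down_rounding(weights, scale, inputs)
print(f"nearest: {nearest_err:.4f}  best up/down: {best_err:.4f}")
```

Nearest rounding sends both weights down to 0.1. Rounding one up and one down lets the two rounding errors partly cancel in the output, which gives a lower error. That cancellation is the effect AdaRound exploits.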
Overall flow
The example follows these high-level steps:
Instantiate the example evaluation and training pipeline
Load the FP32 model and evaluate the model to find the baseline FP32 accuracy
Create a quantization simulation model (with fake quantization ops) and evaluate the quantized simulation model
Apply AdaRound and evaluate the simulation model to get a post-AdaRound quantized accuracy score
Note
This notebook does not show state-of-the-art results. For example, it uses a relatively quantization-friendly model (ResNet50). Also, some optimization parameters, such as the number of AdaRound iterations, are chosen to improve execution speed in the notebook.
Dataset
This example does image classification on the ImageNet dataset. If you already have a version of the dataset, use that. Otherwise, download the dataset, for example from https://image-net.org/challenges/LSVRC/2012/index .
Note
To speed up the execution of this notebook, you can use a reduced subset of the ImageNet dataset. For example, the entire ILSVRC2012 dataset has 1000 classes, up to 1,300 training samples per class (about 1.28M in total), and 50 validation samples per class. For the purpose of running this notebook, you can reduce the dataset to, say, two samples per class.
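One way to build such a subset is a small copy script. This is a sketch under an assumed layout: one sub-directory per class (for example `train/n01440764/*.JPEG`). The official ILSVRC2012 validation images ship in a single flat directory and must be sorted into class folders first. `make_subset` is a hypothetical helper, not part of AIMET or the example utilities.

```python
import shutil
from pathlib import Path


def make_subset(src_dir, dst_dir, per_class=2):
    """Copy the first `per_class` files of each class sub-directory of src_dir into dst_dir."""
    src, dst = Path(src_dir), Path(dst_dir)
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        # Sort for a deterministic choice of files within each class.
        files = sorted(f for f in class_dir.iterdir() if f.is_file())[:per_class]
        out_dir = dst / class_dir.name
        out_dir.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy2(f, out_dir / f.name)
```

For example, `make_subset("/path/to/imagenet/train", "/path/to/imagenet_small/train", per_class=2)` produces a 2,000-image training subset. Point `DATASET_DIR` at the reduced copy.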
Edit the cell below to specify the directory where the downloaded ImageNet dataset is saved.
[ ]:
DATASET_DIR = '/path/to/dataset/' # Replace this path with a real directory
1. Instantiate the example training and validation pipeline
Use the following training and validation loop for the image classification task.
Things to note:
AIMET does not put limitations on how the training and validation pipeline is written. AIMET modifies the user's model to create a QuantizationSim model, which is still a Keras model. The QuantizationSim model can be used in place of the original model when doing inference or training.
AIMET does not put limitations on the interface of the evaluate() or train() methods. You should be able to use your existing evaluate and train routines as-is.
[ ]:
import tensorflow as tf
from Examples.common import image_net_config
from Examples.tensorflow.utils.keras.image_net_dataset import ImageNetDataset
from Examples.tensorflow.utils.keras.image_net_evaluator import ImageNetEvaluator


class ImageNetDataPipeline:
    """
    Provides APIs for model evaluation using the ImageNet dataset.
    """

    @staticmethod
    def get_val_dataset() -> ImageNetDataset:
        """
        Instantiates a validation dataloader for the ImageNet dataset and returns it
        :return: An ImageNetDataset wrapper exposing .dataset (a tf.data.Dataset) and .batch_size
        """
        data_loader = ImageNetDataset(DATASET_DIR,
                                      image_size=image_net_config.dataset['image_size'],
                                      batch_size=image_net_config.evaluation['batch_size'])
        return data_loader

    @staticmethod
    def evaluate(model, iterations=None) -> float:
        """
        Given a Keras model, evaluates its Top-1 accuracy on the validation dataset
        :param model: The Keras model to be evaluated.
        :param iterations: The number of iterations to run. If None, all the data will be used
        :return: The Top-1 accuracy of the model on the evaluated data.
        """
        evaluator = ImageNetEvaluator(DATASET_DIR,
                                      image_size=image_net_config.dataset["image_size"],
                                      batch_size=image_net_config.evaluation["batch_size"])
        return evaluator.evaluate(model=model, iterations=iterations)
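For reference, Top-1 accuracy is the fraction of samples whose highest-scoring class matches the label. The sketch below shows that standard definition in plain Python. It assumes integer class labels, and it is not how ImageNetEvaluator is implemented internally, which this notebook does not show.

```python
def top1_accuracy(logits, labels):
    """Fraction of samples whose highest-scoring class equals the integer label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    correct = 0
    for scores, label in zip(logits, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == label)
    return correct / len(labels)


acc = top1_accuracy([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]], [1, 0, 0])
```

Here the first two predictions are correct and the third is not, so `acc` is 2/3.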
2. Load the model and evaluate to get a baseline FP32 accuracy score
2.1 Load a pretrained ResNet50 model from tf.keras.applications.
You can load any pretrained Keras model instead.
[ ]:
from tensorflow.keras.applications.resnet50 import ResNet50

model = ResNet50(include_top=True,
                 weights="imagenet",
                 input_tensor=None,
                 input_shape=None,
                 pooling=None,
                 classes=1000)
2.2 Compute the floating point 32-bit (FP32) accuracy of this model using the evaluate() routine.
[ ]:
ImageNetDataPipeline.evaluate(model=model, iterations=10)
3. Create a quantization simulation model and determine quantized accuracy
Fold Batch Normalization layers
Before calculating the simulated quantized accuracy using QuantizationSimModel, fold the BatchNormalization (BN) layers into adjacent Convolutional layers. The BN layers that cannot be folded are left as they are.
BN folding improves inference performance on quantized runtimes but can degrade accuracy on these platforms. This step simulates this on-target drop in accuracy.
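Folding works because inference-mode BN is an affine function of the layer output, so it can be absorbed into the preceding layer's weights and bias. For each output channel, W' = W·γ/√(σ² + ε) and b' = (b − μ)·γ/√(σ² + ε) + β. The sketch below checks this on a tiny dense layer standing in for a convolution, which folds the same way per output channel. It uses plain Python and is not AIMET's fold_all_batch_norms. The value ε = 1e-3 matches the Keras BatchNormalization default, and all numbers are made up.

```python
import math


def fold_bn_into_dense(weights, bias, gamma, beta, mean, var, eps=1e-3):
    """Return (W', b') such that dense(W', b', x) == batch_norm(dense(W, b, x)).

    weights[o][i] connects input i to output channel o; BN parameters are per output channel.
    """
    folded_w, folded_b = [], []
    for o, row in enumerate(weights):
        factor = gamma[o] / math.sqrt(var[o] + eps)
        folded_w.append([w * factor for w in row])
        folded_b.append((bias[o] - mean[o]) * factor + beta[o])
    return folded_w, folded_b


def dense(weights, bias, x):
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-3):
    # Inference-mode BN using the stored moving mean and variance.
    return [g * (v - m) / math.sqrt(s + eps) + bt
            for v, g, bt, m, s in zip(y, gamma, beta, mean, var)]


W, b = [[0.5, -1.0], [2.0, 0.25]], [0.1, -0.2]
gamma, beta, mean, var = [1.5, 0.8], [0.0, 0.3], [0.2, -0.1], [0.9, 1.2]
x = [1.0, 2.0]
reference = batch_norm(dense(W, b, x), gamma, beta, mean, var)
W_f, b_f = fold_bn_into_dense(W, b, gamma, beta, mean, var)
folded = dense(W_f, b_f, x)
```

In floating point, `folded` matches `reference`. The on-target accuracy drop comes later, when the folded weights, whose per-channel ranges may now differ more, are quantized.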
The following code calls AIMET to fold the BN layers of a given model. NOTE: During folding, a new model is returned. Please use the returned model for the rest of the pipeline.
3.1 Use the following code to call AIMET to fold the BN layers on the model.
Note
Folding returns a new model. Use the returned model for the rest of the pipeline.
[ ]:
from aimet_tensorflow.keras.batch_norm_fold import fold_all_batch_norms
_, model = fold_all_batch_norms(model)
Create the Quantization Sim Model
3.2 Use AIMET to create a QuantizationSimModel.
In this step, AIMET inserts fake quantization ops in the model graph and configures them.
Key parameters:
Setting default_output_bw to 8 performs all activation quantizations in the model using integer 8-bit precision
Setting default_param_bw to 8 performs all parameter quantizations in the model using integer 8-bit precision
See QuantizationSimModel in the AIMET API documentation for a full explanation of the parameters.
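A fake quantization op quantizes a float to an integer grid and immediately dequantizes it, so the model keeps running in floating point while seeing quantization error. Below is a minimal sketch of generic asymmetric 8-bit quantize-dequantize, given a scale and a zero point. AIMET's internal offset sign convention and tie-breaking may differ. `fake_quantize` is an illustrative name, not an AIMET function.

```python
def fake_quantize(x, scale, zero_point, bitwidth=8):
    """Quantize x to an unsigned integer grid, then dequantize back to float."""
    q_min, q_max = 0, 2 ** bitwidth - 1
    q = round(x / scale) + zero_point
    q = max(q_min, min(q_max, q))  # clamp to the representable integer range
    return (q - zero_point) * scale


# scale=0.02, zero_point=128 covers roughly [-2.56, 2.54].
in_range = fake_quantize(0.123, 0.02, 128)   # rounded to the nearest grid point
clipped = fake_quantize(10.0, 0.02, 128)     # saturates at the top of the range
```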
[ ]:
from aimet_tensorflow.keras.quantsim import QuantizationSimModel
from aimet_common.defs import QuantScheme

sim = QuantizationSimModel(model=model,
                           quant_scheme=QuantScheme.post_training_tf,
                           rounding_mode="nearest",
                           default_output_bw=8,
                           default_param_bw=8)
AIMET has added quantizer nodes to the model graph, but before the sim model can be used for inference or training, scale and offset quantization parameters must be calculated for each quantizer node by passing unlabeled data samples through the model to collect range statistics. This process is sometimes referred to as calibration. AIMET refers to it as “computing encodings”.
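To illustrate what "computing encodings" can mean for a min/max-based scheme (QuantScheme.post_training_tf is generally described as using the observed range), the sketch below tracks a running min and max over calibration batches and derives an 8-bit scale and zero point. It is a generic asymmetric scheme. `MinMaxObserver` is a hypothetical helper, and AIMET's exact range handling may differ.

```python
class MinMaxObserver:
    """Hypothetical helper: track the running min/max of values seen during calibration."""

    def __init__(self):
        self.min_val = float("inf")
        self.max_val = float("-inf")

    def observe(self, batch):
        self.min_val = min(self.min_val, min(batch))
        self.max_val = max(self.max_val, max(batch))

    def compute_encoding(self, bitwidth=8):
        # Extend the range to include 0 so that zero is exactly representable.
        lo, hi = min(self.min_val, 0.0), max(self.max_val, 0.0)
        scale = (hi - lo) / (2 ** bitwidth - 1)
        if scale == 0.0:
            scale = 1.0  # degenerate all-zero input; any positive scale works
        zero_point = round(-lo / scale)
        return scale, zero_point


observer = MinMaxObserver()
for calib_batch in ([-1.0, 0.5], [2.0, 0.0], [-0.5, 1.5]):
    observer.observe(calib_batch)
enc_scale, enc_zero_point = observer.compute_encoding()
```

With an observed range of [-1.0, 2.0], the scale is 3/255 and the zero point is 85.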
3.3 Create a routine to pass unlabeled data samples through the model.
The following code is one way to write a routine that passes unlabeled samples through the model to compute encodings. It uses the existing validation data loader to extract samples and pass them to the model. Because there is no need to compute loss metrics, it ignores the model output.
[ ]:
from tensorflow.keras.utils import Progbar
from tensorflow.keras.applications.resnet import preprocess_input


def pass_calibration_data(sim_model, samples):
    """Run about `samples` validation images through sim_model; the outputs are discarded."""
    tf_dataset = ImageNetDataPipeline.get_val_dataset()
    dataset = tf_dataset.dataset
    batch_size = tf_dataset.batch_size

    progbar = Progbar(samples)
    batch_cntr = 0
    for inputs, _ in dataset:
        sim_model(preprocess_input(inputs))

        batch_cntr += 1
        # Report progress, capped at the requested number of samples
        progbar.update(min(batch_cntr * batch_size, samples))
        # Stop once at least `samples` inputs have been processed
        if batch_cntr * batch_size >= samples:
            break
A few notes regarding the data samples:
Only a small fraction of the data is needed. For example, the ImageNet training set has about 1.28M samples; 500 or 1000 samples suffice to compute encodings.
The samples should be reasonably well distributed. While it's not necessary to cover all classes, avoid extreme scenarios such as using only dark or only light samples. For example, using only pictures captured at night could skew the results.
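One simple way to get a well-distributed calibration set is to draw samples round-robin across classes until the budget is reached. This is a sketch with a hypothetical `balanced_subset` helper operating on (path, label) pairs. It is not part of AIMET, and the file names in the usage example are made up.

```python
from collections import defaultdict


def balanced_subset(samples, total):
    """Pick up to `total` (path, label) pairs, taking one per class in turn."""
    by_class = defaultdict(list)
    for path, label in samples:
        by_class[label].append((path, label))
    queues = [by_class[label] for label in sorted(by_class)]

    picked, depth = [], 0
    while len(picked) < total:
        added = False
        for queue in queues:
            if depth < len(queue) and len(picked) < total:
                picked.append(queue[depth])
                added = True
        if not added:
            break  # every class is exhausted
        depth += 1
    return picked


pool = ([(f"a{i}.jpg", "a") for i in range(5)]
        + [("b0.jpg", "b")]
        + [(f"c{i}.jpg", "c") for i in range(5)])
subset = balanced_subset(pool, total=6)
```

Every class is represented before any class contributes a second sample, so a small budget still spans the label space.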
3.4 Call AIMET to pass data through the model and compute the quantization encodings.
Encodings here refer to scale and offset quantization parameters.
[ ]:
sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                      forward_pass_callback_args=1000)
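In the call above, compute_encodings invokes the callback with the sim's model as the first argument and forward_pass_callback_args (here 1000) as the second. The toy below mimics only that contract so the argument flow and the batch arithmetic are easy to check. `ToySim`, `CountingModel` and `toy_calibration` are hypothetical stand-ins, not AIMET classes.

```python
import math


class ToySim:
    """Hypothetical stand-in for QuantizationSimModel; it only mimics the callback contract."""

    def __init__(self, model):
        self.model = model

    def compute_encodings(self, forward_pass_callback, forward_pass_callback_args):
        # The sim's model is passed first; the user's args are passed through unchanged.
        forward_pass_callback(self.model, forward_pass_callback_args)


class CountingModel:
    """Toy model that only records how many batches and samples it has seen."""

    def __init__(self):
        self.calls = 0
        self.samples_seen = 0

    def __call__(self, batch):
        self.calls += 1
        self.samples_seen += len(batch)


def toy_calibration(sim_model, samples, batch_size=32):
    # Feed dummy batches until at least `samples` inputs have been seen.
    seen = 0
    while seen < samples:
        sim_model([0.0] * batch_size)
        seen += batch_size


toy_model = CountingModel()
ToySim(toy_model).compute_encodings(toy_calibration, 1000)
expected_batches = math.ceil(1000 / 32)
```

With a batch size of 32, a request for 1000 samples consumes 32 whole batches, or 1024 samples.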
3.5 Determine the simulated quantized accuracy. Evaluate the simulation model to calculate its accuracy.
[ ]:
ImageNetDataPipeline.evaluate(model=sim.model, iterations=10)
4. Apply AdaRound
4.1 Use the code below to apply AdaRound to the BN-folded model.
Some key parameters:
data_set: AdaRound needs a dataset to learn the rounding vectors. Either training or validation data can be passed in. Here it is a tf.data.Dataset mapped to yield preprocessed images only, without labels.
num_batches: the number of batches from data_set that AdaRound uses. A typical total for AdaRound is about 2000 samples (num_batches multiplied by the batch size). To speed up execution, this example uses a single batch.
default_num_iterations: the number of optimization iterations applied to each layer. The default value is 10000, and we strongly recommend using at least this number. This example uses 32 to speed up execution.
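The "rounding vectors" are learned through a continuous relaxation. In the AdaRound paper (Nagel et al., 2020), each weight gets a variable V. A rectified sigmoid h(V) = clip(σ(V)·(ζ − γ) + γ, 0, 1), with ζ = 1.1 and γ = −0.1, decides how far the weight moves from floor(w/s) toward floor(w/s) + 1. A regularizer pushes h(V) to exactly 0 or 1. The plain-Python sketch below shows only these pieces and is not AIMET's implementation. The signed 8-bit grid [-128, 127] and all names are assumptions for illustration.

```python
import math

ZETA, GAMMA = 1.1, -0.1  # stretch parameters reported in the AdaRound paper


def sigmoid(v):
    # Numerically stable logistic function.
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def rectified_sigmoid(v):
    # h(V): a stretched sigmoid clipped to [0, 1], so it can reach exactly 0 or 1.
    return min(1.0, max(0.0, sigmoid(v) * (ZETA - GAMMA) + GAMMA))


def soft_quantize(w, scale, v, q_min=-128, q_max=127):
    # Relaxed weight used during optimization: floor(w/scale) + h(v) moves
    # continuously between the round-down and round-up grid points.
    q = math.floor(w / scale) + rectified_sigmoid(v)
    return scale * min(q_max, max(q_min, q))


def hard_quantize(w, scale, v, q_min=-128, q_max=127):
    # Final weight: round up iff h(v) >= 0.5 (equivalently, sigmoid(v) >= 0.5, i.e. v >= 0).
    q = math.floor(w / scale) + (1 if rectified_sigmoid(v) >= 0.5 else 0)
    return scale * min(q_max, max(q_min, q))


def rounding_regularizer(v, beta):
    # Zero when h(v) is exactly 0 or 1. Annealing beta from large to small lets h(v)
    # move freely early on and then pushes it toward a definite up/down decision.
    return 1.0 - abs(2.0 * rectified_sigmoid(v) - 1.0) ** beta
```

For example, with w = 0.14 and scale = 0.1, `hard_quantize` returns 0.2 for a positive V and 0.1 for a negative V. The optimization decides the sign of V per weight by minimizing the layer's output reconstruction error plus the regularizer.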
[ ]:
import os
from tensorflow.keras.applications.resnet import preprocess_input
from tensorflow.keras.preprocessing import image_dataset_from_directory
from aimet_tensorflow.keras.adaround_weight import Adaround, AdaroundParameters

# Note: with shuffle=False, the first batches come from only the first few class
# directories; consider shuffle=True (with a fixed seed) for better-distributed samples.
ada_round_data = image_dataset_from_directory(directory=DATASET_DIR,
                                              labels="inferred",
                                              label_mode="categorical",
                                              batch_size=image_net_config.evaluation["batch_size"],
                                              shuffle=False,
                                              # image_size is (height, width)
                                              image_size=(image_net_config.dataset["image_height"],
                                                          image_net_config.dataset["image_width"]))
# AdaRound only needs the images, so drop the labels
ada_round_data = ada_round_data.map(lambda x, y: preprocess_input(x))

params = AdaroundParameters(data_set=ada_round_data, num_batches=1, default_num_iterations=32)

os.makedirs("./output/", exist_ok=True)
ada_model = Adaround.apply_adaround(model, params, path="output", filename_prefix="adaround",
                                    default_param_bw=8, default_quant_scheme=QuantScheme.post_training_tf)
4.2 Quantize the AdaRounded model.
Note
Two important points about the following code:
Parameter bitwidth precision: The QuantizationSimModel must be created with the same parameter bitwidth precision that was used in apply_adaround().
Freezing the parameter encodings: After creating the QuantizationSimModel, you must call set_and_freeze_param_encodings() before calling compute_encodings(). During AdaRound, the parameters are rounded based on internally created initial encodings. Freezing these encodings prevents compute_encodings() from altering the parameter encodings, which would negate the accuracy gained from AdaRound.
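A toy illustration of why freezing matters: AdaRounded weights sit on the grid defined by their encoding, so requantizing them with that same scale leaves them unchanged. A freshly computed scale defines a different grid and re-rounds them to nearest, discarding the learned up/down choices. The weights, the symmetric scale formula and both helpers below are hypothetical and do not reproduce AIMET's internals.

```python
def requantize(weights, scale):
    # Nearest rounding of each weight onto the grid defined by `scale`.
    return [round(w / scale) * scale for w in weights]


def symmetric_scale(weights, bitwidth=8):
    # Recompute a symmetric scale from the weight range, as a fresh calibration might.
    return max(abs(w) for w in weights) / (2 ** (bitwidth - 1) - 1)


# Hypothetical AdaRound output: every value lies on the grid with scale 0.1.
adarounded = [0.2, 0.1, -0.3]
frozen_scale = 0.1

kept = requantize(adarounded, frozen_scale)                     # same grid: values unchanged
changed = requantize(adarounded, symmetric_scale(adarounded))   # new grid: values move
```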
[ ]:
sim = QuantizationSimModel(model=ada_model,
                           quant_scheme=QuantScheme.post_training_tf,
                           rounding_mode="nearest",
                           default_output_bw=8,
                           default_param_bw=8)

# Load and freeze the AdaRound parameter encodings before computing the remaining encodings
sim.set_and_freeze_param_encodings(encoding_path=os.path.join("output", "adaround.encodings"))

sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                      forward_pass_callback_args=1000)
4.3 Compute the accuracy of the AdaRounded model.
Evaluate the simulation model as before to determine simulated quantized accuracy.
[ ]:
ImageNetDataPipeline.evaluate(model=sim.model, iterations=10)
There might be little gain in accuracy after this limited application of AdaRound. Experiment with the hyperparameters (for example, num_batches and default_num_iterations) to get better results.
Next steps
Export the model and encodings.
Export the model with the updated weights but without the fake quant ops.
Export the encodings (scale and offset quantization parameters). AIMET QuantizationSimModel provides an export API for this purpose.
The following code performs these exports.
[ ]:
sim.export(path="./output", filename_prefix="resnet50_after_adaround")
For more information
See the AIMET API docs for details about the AIMET APIs and optional parameters.
See the other example notebooks to learn how to use other AIMET post-training quantization techniques.