Adaptive rounding¶
Context¶
Adaptive rounding (AdaRound) is a rounding mechanism for model weights designed to adapt to the data to improve the accuracy of the quantized model.
By default, AIMET uses nearest rounding for quantization, in which weight values are quantized to the nearest integer value. However, AdaRound uses training data to choose how to round quantized weights. This rounding technique improves the quantized model’s accuracy in many cases.
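For intuition, here is a minimal sketch (not AIMET code) contrasting nearest rounding with the per-weight up/down rounding decision that AdaRound learns; the round_mask tensor is a hypothetical stand-in for the learned decision.

import torch

# Quantize weights to an integer grid with scale s (hypothetical values).
w = torch.tensor([0.499, 1.251, -0.749])
s = 0.5

# Nearest rounding: every value snaps to the closest grid point.
w_nearest = torch.round(w / s) * s

# AdaRound-style rounding: start from floor(w / s) and decide, per weight,
# whether to round down (0) or up (1) so that layer output error is minimized.
round_mask = torch.tensor([1.0, 0.0, 1.0])  # hypothetical learned decisions
w_adaround = (torch.floor(w / s) + round_mask) * s

print(w_nearest)   # tensor([ 0.5000,  1.5000, -0.5000])
print(w_adaround)  # tensor([ 0.5000,  1.0000, -0.5000])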
The following figure illustrates how AdaRound might change the rounding of a quantized value.
See the Optimization User Guide for a discussion of the recommended sequence of all quantization techniques.
Complementary techniques¶
On its own, AdaRound can yield a significant improvement in quantized accuracy. If you want to combine other techniques with AdaRound, it is recommended to apply AdaRound:
After batch norm folding (BNF) and cross layer equalization (CLE): Applying these techniques first can improve the accuracy gained using AdaRound.
Before quantization aware training (QAT): AdaRound serves as a well-disciplined weight initialization for QAT (see the sketch after this list).
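As a rough sketch of that ordering for the PyTorch variant (assuming the aimet_torch entry points fold_all_batch_norms and equalize_model; adapt the imports to your AIMET version):

from aimet_torch.batch_norm_fold import fold_all_batch_norms
from aimet_torch.cross_layer_equalization import equalize_model

# 1. Fold batch norms and equalize cross-layer scales before AdaRound.
fold_all_batch_norms(model, input_shapes=(1, 3, 224, 224))
equalize_model(model, input_shapes=(1, 3, 224, 224))

# 2. Apply AdaRound (Step 1 below), build QuantSim, and compute encodings (Step 2).
# 3. Optionally fine-tune sim.model with QAT, using the AdaRounded weights as initialization.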
Hyper parameters¶
A number of hyper parameters used during AdaRound optimization are exposed in the API. The default values of some of these parameters tend to lead to stable results and we recommend that you not change them.
Use the following guidelines for adjusting hyper parameters with AdaRound.
- Hyper Parameters to be changed at will
Number of batches. AdaRound should see 500-1000 images. The data loader batch size times the number of batches gives the number of images. For example, if the data loader batch size is 64, set the number of batches to 16 to yield 1024 images (see the snippet after this list).
Number of iterations. Default is 10,000.
- Hyper Parameters to be changed with caution
Regularization parameter. Default is 0.01.
- Hyper Parameters to avoid changing
Beta range. Leave the value at the default of (20, 2).
Warm start period. Leave at the default value, 20%.
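For example, a quick way to derive the number of batches from your data loader's batch size (a sketch, not AIMET code):

import math

batch_size = 64                    # your data loader's batch size
target_images = 1000               # AdaRound should see roughly 500-1000 images
num_batches = math.ceil(target_images / batch_size)  # 16 batches -> 1024 images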
You can learn more about the AdaRound parameters in the API section below.
Workflow¶
Prerequisites¶
To use AdaRound, you must:
Load a trained model
Create a training or validation dataloader for the model
Setup¶
import torch
from torchvision.models import mobilenet_v2
from torch.utils.data import DataLoader
from datasets import load_dataset
from evaluate import evaluator
# General setup that can be changed as needed
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = mobilenet_v2(pretrained=True).eval().to(device)
num_batches = 32
data = load_dataset('imagenet-1k', streaming=True, split="train")
data_loader = DataLoader(data, batch_size=num_batches, num_workers=4)
dummy_input = torch.randn(1, 3, 224, 224).to(device)

def forward_pass(model: torch.nn.Module):
    with torch.no_grad():
        for images, _ in data_loader:
            model(images)
path = './'
filename = 'mobilenet'
Load the model for AdaRound. In this code example, we will use MobileNetV2.
from aimet_common.defs import QuantScheme
from aimet_common.quantsim_config.utils import get_path_for_per_channel_config
from aimet_tensorflow.keras.adaround_weight import Adaround, AdaroundParameters
from aimet_tensorflow.keras.quantsim import QuantizationSimModel
from tensorflow.keras import applications, losses, metrics, preprocessing
from tensorflow.keras.applications import mobilenet_v2
model = applications.MobileNetV2()
print(model.summary())
Model: "mobilenetv2_1.00_224"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) [(None, 224, 224, 3 0 []
)]
Conv1 (Conv2D) (None, 112, 112, 32 864 ['input_1[0][0]']
)
bn_Conv1 (BatchNormalization) (None, 112, 112, 32 128 ['Conv1[0][0]']
)
Conv1_relu (ReLU) (None, 112, 112, 32 0 ['bn_Conv1[0][0]']
)
expanded_conv_depthwise (Depth (None, 112, 112, 32 288 ['Conv1_relu[0][0]']
wiseConv2D) )
...
For AdaRound optimization, an unlabeled dataset is required. In this example, we will use the ImageNet validation data.
BATCH_SIZE = 32
imagenet_dataset = preprocessing.image_dataset_from_directory(
    directory='<your_imagenet_validation_data_path>',
    labels='inferred',
    label_mode='categorical',
    image_size=(224, 224),
    batch_size=BATCH_SIZE,
    shuffle=True,
)

imagenet_dataset = imagenet_dataset.map(
    lambda x, y: (mobilenet_v2.preprocess_input(x), y)
)
NUM_CALIBRATION_SAMPLES = 2048
calibration_dataset = imagenet_dataset.take(NUM_CALIBRATION_SAMPLES // BATCH_SIZE)
unlabeled_dataset = calibration_dataset.map(lambda x, _: x)
Load the model for AdaRound. In this code example, we will convert PyTorch MobileNetV2 to ONNX and use it in the subsequent code.
import math
import os
import numpy as np
import onnx
import onnxsim
import torch
from aimet_common.defs import QuantScheme
from aimet_onnx.adaround.adaround_weight import Adaround, AdaroundParameters
from aimet_onnx.defs import DataLoader
from aimet_onnx.quantsim import QuantizationSimModel
from datasets import load_dataset
from torchvision import transforms
from torchvision.models import MobileNet_V2_Weights, mobilenet_v2
pt_model = mobilenet_v2(weights=MobileNet_V2_Weights.DEFAULT)
input_shape = (1, 3, 224, 224)
dummy_input = torch.randn(input_shape)
# Modify file_path as needed; we use a temporary directory here
file_path = os.path.join('/tmp', 'mobilenet_v2.onnx')

torch.onnx.export(
    pt_model,
    (dummy_input,),
    file_path,
    input_names=['input'],
    output_names=['output'],
    dynamic_axes={
        'input': {0: 'batch_size'},
        'output': {0: 'batch_size'},
    },
)

# Load exported ONNX model
model = onnx.load_model(file_path)
try:
    model, _ = onnxsim.simplify(model)
except Exception:
    print('ONNX Simplifier failed. Proceeding with unsimplified model')
For AdaRound optimization, an unlabeled dataset is required. In this example, we will use the ImageNet validation data.
dataset = load_dataset(
    'ILSVRC/imagenet-1k',
    split='validation',
)
# Minimal DataLoader implementation that yields numpy batches from the dataset
class CustomDataLoader(DataLoader):
    def __init__(
        self,
        data: np.ndarray,
        batch_size: int,
        iterations: int,
        unlabeled: bool = True,
    ):
        super().__init__(data, batch_size, iterations)
        self._current_iteration = 0
        self._unlabeled = unlabeled

    def __iter__(self):
        self._current_iteration = 0
        return self

    def __next__(self):
        if self._current_iteration < self.iterations:
            start = self._current_iteration * self.batch_size
            end = start + self.batch_size
            self._current_iteration += 1

            batch_data = self._data[start:end]
            if self._unlabeled:
                return np.stack(batch_data['image'])
            else:
                return np.stack(batch_data['image']), np.stack(batch_data['label'])
        else:
            raise StopIteration
preprocess = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)

# Note: this function shadows the torchvision `transforms` import above;
# `preprocess` is already bound, so the pipeline still works.
def transforms(examples):
    examples['image'] = [
        preprocess(image.convert('RGB')) for image in examples['image']
    ]
    return examples

dataset.set_transform(transforms)

BATCH_SIZE = 32
NUM_SAMPLES = 256

unlabeled_data_loader = CustomDataLoader(
    dataset, BATCH_SIZE, math.ceil(NUM_SAMPLES / BATCH_SIZE)
)
Step 1¶
Apply AdaRound to the model.
from aimet_torch.quantsim import QuantizationSimModel
from aimet_torch.adaround.adaround_weight import Adaround, AdaroundParameters
params = AdaroundParameters(data_loader=data_loader, num_batches=num_batches)
# Returns model with AdaRound-ed weights and their corresponding encodings
adarounded_model = Adaround.apply_adaround(model, dummy_input, params, path=path, filename_prefix=filename)
def pass_calibration_data(model, _):
    for inputs, _ in calibration_dataset:
        model(inputs)
PARAM_BITWIDTH = 4
ACTIVATION_BITWIDTH = 8
QUANT_SCHEME = QuantScheme.post_training_tf
params = AdaroundParameters(
    data_set=unlabeled_dataset,
    num_batches=NUM_CALIBRATION_SAMPLES // BATCH_SIZE,
    default_num_iterations=1,
)

ada_rounded_model = Adaround.apply_adaround(
    model,
    params,
    path='/tmp',
    filename_prefix='mobilenet_v2',
    default_param_bw=PARAM_BITWIDTH,
    default_quant_scheme=QUANT_SCHEME,
    config_file=get_path_for_per_channel_config(),
)
def pass_calibration_data(session, _):
    input_name = session.get_inputs()[0].name
    for inputs in unlabeled_data_loader:
        session.run(None, {input_name: inputs})
PARAM_BITWIDTH = 4
ACTIVATION_BITWIDTH = 8
params = AdaroundParameters(
    data_loader=unlabeled_data_loader,
    num_batches=math.ceil(NUM_SAMPLES / BATCH_SIZE),
    default_num_iterations=5,
    forward_fn=pass_calibration_data,
    forward_pass_callback_args=None,
)

ada_rounded_model = Adaround.apply_adaround(
    model,
    params,
    path='/tmp',
    filename_prefix='mobilenet_v2',
    default_param_bw=PARAM_BITWIDTH,
)
Step 2¶
Simulate quantization through AIMET's QuantSim.
sim = QuantizationSimModel(adarounded_model, dummy_input)
# AdaRound optimizes the rounding of weight quantizers only. These values are preserved through load_encodings()
sim.load_encodings(encodings=path + filename + '.encodings', allow_overwrite=False)
# The activation quantizers remain uninitialized and derived through compute_encodings()
sim.compute_encodings(forward_pass)
sim = QuantizationSimModel(
    ada_rounded_model,
    quant_scheme=QUANT_SCHEME,
    default_param_bw=PARAM_BITWIDTH,
    default_output_bw=ACTIVATION_BITWIDTH,
    config_file=get_path_for_per_channel_config(),
)
# AdaRound optimizes the rounding of weight quantizers only. These values are preserved through set_and_freeze_param_encodings()
sim.set_and_freeze_param_encodings(encoding_path='/tmp/mobilenet_v2.encodings')
# The activation quantizers remain uninitialized and derived through compute_encodings()
sim.compute_encodings(pass_calibration_data, None)
sim = QuantizationSimModel(
    ada_rounded_model,
    quant_scheme=QuantScheme.post_training_tf,
    default_param_bw=PARAM_BITWIDTH,
    default_activation_bw=ACTIVATION_BITWIDTH,
)
# AdaRound optimizes the rounding of weight quantizers only. These values are preserved through set_and_freeze_param_encodings()
sim.set_and_freeze_param_encodings(encoding_path='/tmp/mobilenet_v2.encodings')
# The activation quantizers remain uninitialized and derived through compute_encodings()
sim.compute_encodings(pass_calibration_data, None)
Step 3¶
Evaluate the model.
evaluator = evaluator("image-classification")
accuracy = evaluator.compute(model_or_pipeline=sim.model, data=data, metric="accuracy")
eval_dataset = imagenet_dataset.skip(NUM_CALIBRATION_SAMPLES // BATCH_SIZE)
sim.model.compile(
    loss=[losses.CategoricalCrossentropy()],
    metrics=[metrics.CategoricalAccuracy()],
)
result = sim.model.evaluate(eval_dataset)
print(result)
eval_data_loader = CustomDataLoader(
    dataset, BATCH_SIZE, math.ceil(NUM_SAMPLES / BATCH_SIZE), unlabeled=False
)

correct_predictions = 0
total_samples = 0
for inputs, labels in eval_data_loader:
    input_name = sim.session.get_inputs()[0].name
    pred_probs, *_ = sim.session.run(None, {input_name: inputs})
    pred_labels = np.argmax(pred_probs, axis=1)
    correct_predictions += np.sum(pred_labels == labels)
    total_samples += labels.shape[0]

accuracy = correct_predictions / total_samples
Step 4¶
If AdaRound resulted in satisfactory accuracy, export the model.
sim.export(path=path, filename_prefix="quantized_" + filename, dummy_input=dummy_input.cpu())
sim.export(path='/tmp', filename_prefix='quantized_mobilenet_v2')
sim.export(path='/tmp', filename_prefix='quantized_mobilenet_v2')
If the model is still not accurate enough, the next step is typically to try quantization-aware training.
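As a rough sketch of that next step for the PyTorch variant (train_loader, the learning rate, and the number of epochs are placeholders you would choose), QAT amounts to fine-tuning sim.model with a standard training loop, since the QuantSim model is trainable:

optimizer = torch.optim.SGD(sim.model.parameters(), lr=1e-5, momentum=0.9)
loss_fn = torch.nn.CrossEntropyLoss()

sim.model.train()
for epoch in range(2):                   # a few epochs are usually enough for QAT
    for images, labels in train_loader:  # train_loader is assumed to exist
        optimizer.zero_grad()
        loss = loss_fn(sim.model(images.to(device)), labels.to(device))
        loss.backward()
        optimizer.step()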
API¶
Top-level API
- aimet_torch.adaround.adaround_weight.Adaround.apply_adaround(model, dummy_input, params, path, filename_prefix, default_param_bw=4, param_bw_override_list=None, ignore_quant_ops_list=None, default_quant_scheme=QuantScheme.post_training_tf_enhanced, default_config_file=None)¶
Returns model with optimized weight rounding of every module (Conv and Linear) and also saves the corresponding quantization encodings to a separate JSON-formatted file that can then be imported by QuantSim for inference or QAT
- Parameters:
  - model (Module) – Model to Adaround
  - dummy_input (Union[Tensor, Tuple]) – Dummy input to the model. Used to parse the model graph. If the model has more than one input, pass a tuple. The user is expected to place the tensors on the appropriate device.
  - params (AdaroundParameters) – Parameters for Adaround
  - path (str) – Path where to store parameter encodings
  - filename_prefix (str) – Prefix to use for the filename of the encodings file
  - default_param_bw (int) – Default bitwidth (4-31) to use for quantizing layer parameters
  - param_bw_override_list (Optional[List[Tuple[Module, int]]]) – List of tuples. Each tuple is a module and the corresponding parameter bitwidth to be used for that module.
  - ignore_quant_ops_list (Optional[List[Module]]) – Ops listed here are skipped during the quantization needed for AdaRounding. Do not specify Conv and Linear modules in this list; doing so will affect accuracy.
  - default_quant_scheme (QuantScheme) – Quantization scheme. Supported options are QuantScheme.post_training_tf or QuantScheme.post_training_tf_enhanced
  - default_config_file (Optional[str]) – Default configuration file for model quantizers
- Return type:
  Module
- Returns:
  Model with Adarounded weights; the corresponding parameter encodings JSON file is saved at the provided path
Adaround parameters
- class aimet_torch.adaround.adaround_weight.AdaroundParameters(data_loader, num_batches, default_num_iterations=None, default_reg_param=0.01, default_beta_range=(20, 2), default_warm_start=0.2, forward_fn=None)[source]¶
Configuration parameters for Adaround
- Parameters:
  - data_loader (DataLoader) – Data loader
  - num_batches (int) – Number of batches to be used for Adaround. A commonly recommended value for this parameter is the smaller of (1) len(data_loader) and (2) ceil(2000/batch_size)
  - default_num_iterations (Optional[int]) – Number of iterations to adaround each layer. The default value is 10K for models with 8-bit or higher weights, and 15K for models with weights below 8 bits.
  - default_reg_param (float) – Regularization parameter, trading off between rounding loss and reconstruction loss. Default 0.01
  - default_beta_range (Tuple) – Start and stop beta parameter for annealing of rounding loss (start_beta, end_beta). Default (20, 2)
  - default_warm_start (float) – Warm-up period, during which rounding loss has zero effect. Default 20% (0.2)
  - forward_fn (Optional[Callable[[Module, Any], Any]]) – Optional adapter function that performs a forward pass given a model and inputs yielded from the data loader. The function expects the model as its first argument and the model inputs as its second argument.
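For example, if your data loader yields dict-style batches instead of (image, label) tuples, a hypothetical forward_fn adapter for the PyTorch workflow above might look like this (the 'image' key is an assumption about your loader's batch format):

def dict_batch_forward(model, batch):
    # Hypothetical adapter: unpack a dict-style batch before calling the model.
    return model(batch['image'])

params = AdaroundParameters(
    data_loader=data_loader,
    num_batches=num_batches,
    forward_fn=dict_batch_forward,
)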
Top-level API
- aimet_tensorflow.keras.adaround_weight.Adaround.apply_adaround(model, params, path, filename_prefix, default_param_bw=4, default_quant_scheme=QuantScheme.post_training_tf_enhanced, config_file=None)¶
Returns model with optimized weight rounding of every op (Conv and Linear) and also saves the corresponding quantization encodings to a separate JSON-formatted file that can then be imported by QuantSim for inference or QAT
- Parameters:
  - model (Model) – Model to adaround
  - params (AdaroundParameters) – Parameters for adaround
  - path (str) – Path where to store parameter encodings
  - filename_prefix (str) – Prefix to use for the filename of the encodings file
  - default_param_bw (int) – Default bitwidth (4-31) to use for quantizing layer parameters. Default 4
  - default_quant_scheme (QuantScheme) – Quantization scheme. Supported options are QuantScheme.post_training_tf or QuantScheme.post_training_tf_enhanced. Default QuantScheme.post_training_tf_enhanced
  - config_file (Optional[str]) – Configuration file for model quantizers
- Return type:
  Model
- Returns:
  Model with Adarounded weights
Adaround Parameters
- class aimet_tensorflow.keras.adaround_weight.AdaroundParameters(data_set, num_batches, default_num_iterations=10000, default_reg_param=0.01, default_beta_range=(20, 2), default_warm_start=0.2)[source]¶
Configuration parameters for Adaround
- Parameters:
  - data_set (DatasetV2) – TF data set
  - num_batches (int) – Number of batches
  - default_num_iterations (int) – Number of iterations to adaround each layer. Default 10000
  - default_reg_param (float) – Regularization parameter, trading off between rounding loss and reconstruction loss. Default 0.01
  - default_beta_range (Tuple) – Start and stop beta parameter for annealing of rounding loss (start_beta, end_beta). Default (20, 2)
  - default_warm_start (float) – Warm-up period, during which rounding loss has zero effect. Default 20% (0.2)
Note
It is recommended to use onnx-simplifier before adarounding the model.
Top-level API
- aimet_onnx.adaround.adaround_weight.Adaround.apply_adaround(model, params, path, filename_prefix, default_param_bw=4, param_bw_override_list=None, ignore_quant_ops_list=None, default_quant_scheme=QuantScheme.post_training_tf_enhanced, default_config_file=None, use_cuda=True, device=0, user_onnx_libs=None)¶
Returns model with optimized weight rounding of every module (Conv and Linear) and also saves the corresponding quantization encodings to a separate JSON-formatted file that can then be imported by QuantSim for inference or QAT
- Parameters:
  - model (ModelProto) – Model to Adaround
  - params (AdaroundParameters) – Parameters for Adaround
  - path (str) – Path where to store parameter encodings
  - filename_prefix (str) – Prefix to use for the filename of the encodings file
  - default_param_bw (int) – Default bitwidth (4-31) to use for quantizing layer parameters
  - param_bw_override_list (Optional[List[Tuple[str, int]]]) – List of tuples. Each tuple is a parameter name and the corresponding parameter bitwidth to be used for that parameter.
  - ignore_quant_ops_list (Optional[List[str]]) – Ops listed here are skipped during the quantization needed for AdaRounding. Do not specify Conv and Linear modules in this list; doing so will affect accuracy.
  - default_quant_scheme (QuantScheme) – Quantization scheme. Supported options are QuantScheme.post_training_tf or QuantScheme.post_training_tf_enhanced
  - default_config_file (Optional[str]) – Default configuration file for model quantizers
  - use_cuda (bool) – Whether to use CUDA
  - device (int) – CUDA device ID
  - user_onnx_libs (Optional[List[str]]) – List of paths to all compiled ONNX custom ops libraries
- Return type:
  ModelProto
- Returns:
  Model with Adarounded weights; the corresponding parameter encodings JSON file is saved at the provided path
Adaround Parameters
- class aimet_onnx.adaround.adaround_weight.AdaroundParameters(data_loader, num_batches, default_num_iterations=None, default_reg_param=0.01, default_beta_range=(20, 2), default_warm_start=0.2, forward_fn=None, forward_pass_callback_args=None)[source]¶
Configuration parameters for Adaround
- Parameters:
  - data_loader – Data loader
  - num_batches (int) – Number of batches to be used for Adaround. A commonly recommended value for this parameter is the smaller of (1) len(data_loader) and (2) ceil(2000/batch_size)
  - default_num_iterations (Optional[int]) – Number of iterations to adaround each layer. The default value is 10K for models with 8-bit or higher weights, and 15K for models with weights below 8 bits.
  - default_reg_param (float) – Regularization parameter, trading off between rounding loss and reconstruction loss. Default 0.01
  - default_beta_range (Tuple) – Start and stop beta parameter for annealing of rounding loss (start_beta, end_beta). Default (20, 2)
  - default_warm_start (float) – Warm-up period, during which rounding loss has zero effect. Default 20% (0.2)
  - forward_fn (Optional[Callable]) – Function to compute encodings for sim
  - forward_pass_callback_args – These argument(s) are passed to the forward_pass_callback as-is. It is up to the user to determine the type of this parameter; for example, it could be an integer representing the number of data samples to use, or a tuple of parameters or an object representing something more complex. If set to None, forward_pass_callback is invoked with no parameters.