Quantization analyzer

Context

The Quantization analyzer (QuantAnalyzer) performs several analyses to identify sensitive areas and hotspots in your model. These analyses are performed automatically. To use QuantAnalyzer, you pass in callbacks to perform forward passes and evaluations, and optionally a dataloader for MSE loss analysis.

For each analysis, QuantAnalyzer outputs JSON and/or HTML files containing data and plots for visualization.

Detailed analysis descriptions

QuantAnalyzer performs the following analyses:

1. Sensitivity analysis to weight and activation quantization

QuantAnalyzer compares the accuracies of the original FP32 model, an activation-only quantized model, and a weight-only quantized model. This helps determine which AIMET quantization techniques will be most beneficial for the model.

For example, in situations where the model is more sensitive to activation quantization, Post-training quantization (PTQ) techniques like Adaptive Rounding (Adaround) or Cross-layer equalization (CLE) might not be very helpful.

Quantized accuracy metrics for your model are printed as part of AIMET logging.

2. Per-layer quantizer enablement analysis

Sometimes the accuracy drop incurred from quantization can be attributed to only a subset of layers within the model. QuantAnalyzer finds such layers by enabling and disabling individual quantizers to observe how the quantized model accuracy metric changes.

The following two types of quantizer enablement analyses are performed:

1. Disable all quantizers across the model and, for each layer, enable only that layer's output quantizer and evaluate using the provided callback. This yields an accuracy value for each layer in the model when only that layer's quantizer is enabled, exposing the effects of individual layer quantization and pinpointing culprit layers and hotspots.

2. Enable all quantizers across the model and, for each layer, disable only that layer's output quantizer and evaluate using the provided callback. Again, this yields an accuracy value for each layer, this time with only that layer's quantizer disabled. (A rough sketch of these sweeps appears below.)
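
In pseudocode, the enablement sweep (the first analysis) looks roughly like the following; set_all_quantizers and set_quantizer_for_layer are hypothetical helpers standing in for AIMET internals:

def per_layer_enablement_sweep(sim, layers, eval_callback):
    """Sketch only: quantize one layer at a time and record accuracy."""
    eval_scores = {}
    for layer in layers:
        set_all_quantizers(sim, enabled=False)             # hypothetical helper
        set_quantizer_for_layer(sim, layer, enabled=True)  # hypothetical helper
        eval_scores[layer] = eval_callback(sim.model)      # accuracy with only this layer quantized
    return eval_scores

The disablement sweep (the second analysis) inverts the enabled flags.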

As a result of these analyses, AIMET outputs per_layer_quant_enabled.html and per_layer_quant_disabled.html respectively, containing plots mapping layers on the x-axis to quantized model accuracy metrics on the y-axis.

JSON files per_layer_quant_enabled.json and per_layer_quant_disabled.json are also produced, containing the data shown in the .html plots.

3. Per-layer encodings min-max range analysis

As part of quantization, encoding parameters for each quantizer must be obtained. These parameters include scale, offset, min, and max, and are used to map floating point values to quantized integer values.

QuantAnalyzer tracks the min and max encoding parameters computed by each quantizer in the model as a result of forward passes through the model with representative data (from which the scale and offset values can be directly obtained).
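
For illustration, the standard uniform affine quantization math derives scale and offset from an observed min/max range roughly as follows (a minimal numpy sketch, not AIMET's internal code):

import numpy as np

def compute_encoding(x_min: float, x_max: float, bitwidth: int = 8):
    # Include zero so that 0.0 is exactly representable.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    num_steps = 2 ** bitwidth - 1
    scale = (x_max - x_min) / num_steps   # step size ("delta")
    offset = round(x_min / scale)         # grid index of x_min (a negative zero point)
    return scale, offset

def quantize_dequantize(x, scale, offset, bitwidth=8):
    # Map values onto the integer grid and back, as a fake-quantization step.
    q = np.clip(np.round(x / scale) - offset, 0, 2 ** bitwidth - 1)
    return (q + offset) * scale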

As a result of this analysis, AIMET outputs HTML plots and JSON files (in the min_max_ranges folder) containing the encoding min/max values for each activation quantizer and each parameter quantizer.

If Per-channel quantization (PCQ) is enabled, encoding min and max values for all the channels of each weight parameter are shown.

4. Per-layer statistics histogram

Under the TF-enhanced quantization scheme, encoding min/max values for each quantizer are obtained by collecting a histogram of the tensor values seen at that quantizer and discarding outliers.
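
TF-enhanced actually searches for the range that minimizes quantization noise over this histogram; purely to illustrate the idea of discarding outliers, a simplistic percentile clip would look like this (hypothetical helper, not AIMET's algorithm):

import numpy as np

def clipped_range(tensor_values, coverage=0.9999):
    # Keep the central `coverage` fraction of the observed distribution,
    # discarding extreme outliers at both tails.
    tail = (1.0 - coverage) / 2.0
    return np.quantile(tensor_values, tail), np.quantile(tensor_values, 1.0 - tail)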

When this quantization scheme is selected, QuantAnalyzer outputs plots for each quantizer in the model, displaying the histogram of tensor values seen at that quantizer.

These plots are available in the activations_pdf and weights_pdf folders, with a separate .html plot for each quantizer.

5. Per-layer mean square error (MSE) loss

QuantAnalyzer can monitor each layer’s output in the original FP32 model as well as the corresponding layer output in the quantized model and calculate the MSE loss between the two.

This helps identify which layers may contribute more to quantization noise.

To enable this optional analysis, you pass in a dataloader that QuantAnalyzer reads from. Approximately 256 samples are sufficient for the analysis.

A per_layer_mse_loss.html file is generated containing a plot that maps layer quantizers on the x-axis to MSE loss on the y-axis. A corresponding per_layer_mse_loss.json file is generated containing data corresponding to the .html file.
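
Conceptually, the per-layer comparison can be sketched in PyTorch with forward hooks (a hypothetical helper, not AIMET's implementation; assumes layer_name exists in both models and the dataloader yields input tensors):

import torch

def layer_mse(fp32_model, quant_model, layer_name, data_loader, num_batches=4):
    captured = {}

    def make_hook(key):
        def hook(_module, _inputs, output):
            captured[key] = output.detach()
        return hook

    h_fp32 = dict(fp32_model.named_modules())[layer_name].register_forward_hook(make_hook("fp32"))
    h_quant = dict(quant_model.named_modules())[layer_name].register_forward_hook(make_hook("quant"))

    losses = []
    with torch.no_grad():
        for i, batch in enumerate(data_loader):
            if i >= num_batches:
                break
            fp32_model(batch)   # pass the same batch through both models
            quant_model(batch)
            losses.append(torch.nn.functional.mse_loss(captured["quant"], captured["fp32"]).item())

    h_fp32.remove()
    h_quant.remove()
    return sum(losses) / len(losses)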

Prerequisites

To call the QuantAnalyzer API, you must provide the following:

  • An FP32 pre-trained model for analysis

  • A dummy input for the model; it can contain random values but must match the shape of the model's expected input

  • A user-defined function for passing 500-1000 representative data samples through the model for quantization calibration

  • A user-defined function for passing labeled data through the model for evaluation, returning an accuracy metric

  • (Optional, for running MSE loss analysis) A dataloader providing unlabeled data to be passed through the model

Note

Quantized runtimes typically fold batch normalization (BN) layers where possible. So that you don't have to call a separate API to do so, QuantAnalyzer automatically performs batch norm folding before running its analyses.
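
For reference, the standard BN folding math rescales the preceding layer's weights and bias per output channel (a minimal numpy sketch of the textbook transformation, not AIMET's internal code):

import numpy as np

def fold_bn_into_conv(weight, bias, gamma, beta, running_mean, running_var, eps=1e-5):
    # BN computes y = gamma * (x - mean) / sqrt(var + eps) + beta. Applied after
    # a conv, this is equivalent to a conv with rescaled weights and a shifted
    # bias, so the BN layer can be removed.
    scale = gamma / np.sqrt(running_var + eps)            # one factor per output channel
    folded_weight = weight * scale.reshape(-1, 1, 1, 1)   # conv weight shape: (out_ch, in_ch, kh, kw)
    folded_bias = (bias - running_mean) * scale + beta
    return folded_weight, folded_bias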

Workflow

Code example

Step 1 Prepare callback for calibration

Required imports

from typing import Any
import torch
from torchvision import models
from aimet_common.defs import QuantScheme
from aimet_torch.model_preparer import prepare_model
from aimet_torch.v1.quant_analyzer import QuantAnalyzer, CallbackFunc

Prepare forward pass callback

# NOTE: In actual use cases, implement this callback to suit your own pipeline.
def forward_pass_callback(model: torch.nn.Module, _: Any = None) -> None:
    """
    NOTE: This is intended to be the user-defined model calibration function.
    AIMET requires the above signature, so if your calibration function does not
    match it, create a simple wrapper around this callback function.

    A callback function for model calibration that simply runs forward passes on the model to
    compute encodings (delta/offset). It should use representative data, a subset of the entire
    train/validation dataset (~1000 images/samples).

    :param model: PyTorch model.
    :param _: Argument(s) of this callback function. Up to the user to determine the type of this parameter.
    E.g. could be simply an integer representing the number of data samples to use. Or could be a tuple of
    parameters or an object representing something more complex.
    """
    # User action required
    # User should create data loader/iterable using representative dataset and simply run
    # forward passes on the model.
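
One possible implementation, assuming a hypothetical calibration_data_loader that yields batches of representative input tensors:

def forward_pass_callback(model: torch.nn.Module, num_batches: int = 16) -> None:
    model.eval()
    with torch.no_grad():
        for batch_index, images in enumerate(calibration_data_loader):  # hypothetical loader
            if batch_index >= num_batches:
                break
            model(images.cuda())  # outputs are discarded; only encodings are computed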

Required imports

from typing import Any

import numpy as np
import tensorflow as tf

from aimet_common.defs import QuantScheme
from aimet_common.utils import CallbackFunc
from aimet_tensorflow.keras.model_preparer import prepare_model
from aimet_tensorflow.keras.quant_analyzer import QuantAnalyzer

Prepare toy dataset to run example code

NUM_SAMPLES = 256
NUM_CLASSES = 1000
INPUT_SHAPES = (224, 224, 3)

images = np.random.rand(NUM_SAMPLES, *INPUT_SHAPES)
labels = np.eye(NUM_CLASSES)[np.random.choice(NUM_CLASSES, NUM_SAMPLES)]

image_dataset = tf.data.Dataset.from_tensor_slices(images)
label_dataset = tf.data.Dataset.from_tensor_slices(labels)

eval_dataset = tf.data.Dataset.zip((image_dataset, label_dataset)).batch(32)
unlabeled_dataset = eval_dataset.map(lambda image, label: image)

Prepare forward pass callback

# NOTE: In actual use cases, implement this callback to suit your own pipeline.
def forward_pass_callback(model: tf.keras.Model, _: Any = None) -> None:
    """
    NOTE: This is intended to be the user-defined model calibration function.
    AIMET requires the above signature, so if your calibration function does not
    match it, create a simple wrapper around this callback function.

    A callback function for model calibration that simply runs forward passes on the model to
    compute encodings (delta/offset). It should use representative data, a subset of the entire
    train/validation dataset (~1000 images/samples).

    :param model: tf.keras.Model object.
    :param _: Argument(s) of this callback function. Up to the user to determine the type of this parameter.
    E.g. could be simply an integer representing the number of data samples to use. Or could be a tuple of
    parameters or an object representing something more complex.
    """
    # User action required
    # User should create data loader/iterable using representative dataset and simply run
    # forward passes on the model.
    _ = model.predict(unlabeled_dataset)

Required imports

from typing import Any
import numpy as np
from onnxruntime import InferenceSession
from onnxsim import simplify

from aimet_common.defs import QuantScheme
from aimet_common.utils import CallbackFunc

from aimet_onnx.quant_analyzer import QuantAnalyzer

Prepare forward pass callback

# NOTE: In actual use cases, implement this callback to suit your own pipeline.
def forward_pass_callback(session: InferenceSession, _: Any = None) -> None:
    """
    NOTE: This is intended to be the user-defined model calibration function.
    AIMET requires the above signature, so if your calibration function does not
    match it, create a simple wrapper around this callback function.

    A callback function for model calibration that simply runs forward passes on the model to
    compute encodings (delta/offset). It should use representative data, a subset of the entire
    train/validation dataset (~1000 images/samples).

    :param session: OnnxRuntime Inference Session.
    :param _: Argument(s) of this callback function. Up to the user to determine the type of this parameter.
    E.g. could be simply an integer representing the number of data samples to use. Or could be a tuple of
    parameters or an object representing something more complex.
    """
    # User action required
    # User should create data loader/iterable using representative dataset and simply run
    # forward passes on the model.
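
One possible onnxruntime implementation, assuming a hypothetical calibration_batches list of float32 numpy arrays and a model input named 'input':

def forward_pass_callback(session: InferenceSession, _: Any = None) -> None:
    for batch in calibration_batches:        # hypothetical representative data
        session.run(None, {'input': batch})  # outputs are discarded; only encodings are computed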

Step 2 Prepare callback for quantized model evaluation

Prepare eval callback

# NOTE: In actual use cases, implement this callback to suit your own pipeline.
def eval_callback(model: torch.nn.Module, _: Any = None) -> float:
    """
    NOTE: This is intended to be the user-defined model evaluation function.
    AIMET requires the above signature, so if your evaluation function does not
    match it, create a simple wrapper around this callback function.

    A callback function for model evaluation that determines model performance. It is
    expected to return a scalar value representing the model performance evaluated against the
    entire test/evaluation dataset.

    :param model: PyTorch model.
    :param _: Argument(s) of this callback function. Up to the user to determine the type of this parameter.
    E.g. could be simply an integer representing the number of data samples to use. Or could be a tuple of
    parameters or an object representing something more complex.
    :return: Scalar value representing the model performance.
    """
    # User action required
    # User should create data loader/iterable using entire test/evaluation dataset, perform forward passes on
    # the model and return single scalar value representing the model performance.
    return .8
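
For instance, a top-1 accuracy implementation over a hypothetical labeled eval_data_loader might look like:

def eval_callback(model: torch.nn.Module, _: Any = None) -> float:
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for images, labels in eval_data_loader:  # hypothetical labeled loader
            predictions = model(images.cuda()).argmax(dim=1).cpu()
            correct += (predictions == labels).sum().item()
            total += labels.numel()
    return correct / total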

Prepare eval callback

# NOTE: In actual use cases, implement this callback to suit your own pipeline.
def eval_callback(model: tf.keras.Model, _: Any = None) -> float:
    """
    NOTE: This is intended to be the user-defined model evaluation function.
    AIMET requires the above signature, so if your evaluation function does not
    match it, create a simple wrapper around this callback function.

    A callback function for model evaluation that determines model performance. It is
    expected to return a scalar value representing the model performance evaluated against the
    entire test/evaluation dataset.

    :param model: tf.keras.Model object.
    :param _: Argument(s) of this callback function. Up to the user to determine the type of this parameter.
    E.g. could be simply an integer representing the number of data samples to use. Or could be a tuple of
    parameters or an object representing something more complex.
    :return: Scalar value representing the model performance.
    """
    # User action required
    # User should create data loader/iterable using entire test/evaluation dataset, perform forward passes on
    # the model and return single scalar value representing the model performance.

    model.compile(optimizer=tf.keras.optimizers.Adam(),
                  loss=tf.keras.losses.CategoricalCrossentropy(),
                  metrics=[tf.keras.metrics.CategoricalAccuracy()])

    _, acc = model.evaluate(eval_dataset)
    return acc

Prepare eval callback

# NOTE: In actual use cases, implement this callback to suit your own pipeline.
def eval_callback(session: InferenceSession, _: Any = None) -> float:
    """
    NOTE: This is intended to be the user-defined model evaluation function.
    AIMET requires the above signature, so if your evaluation function does not
    match it, create a simple wrapper around this callback function.

    A callback function for model evaluation that determines model performance. It is
    expected to return a scalar value representing the model performance evaluated against the
    entire test/evaluation dataset.

    :param session: OnnxRuntime Inference Session.
    :param _: Argument(s) of this callback function. Up to the user to determine the type of this parameter.
    E.g. could be simply an integer representing the number of data samples to use. Or could be a tuple of
    parameters or an object representing something more complex.
    :return: Scalar value representing the model performance.
    """
    # User action required
    # User should create data loader/iterable using entire test/evaluation dataset, perform forward passes on
    # the model and return single scalar value representing the model performance.
    return .8
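
For instance, a top-1 accuracy implementation over hypothetical labeled numpy batches (eval_batches) might look like:

def eval_callback(session: InferenceSession, _: Any = None) -> float:
    correct = total = 0
    for images, labels in eval_batches:  # hypothetical labeled numpy batches
        (logits,) = session.run(None, {'input': images})  # assumes a single model output
        correct += int((logits.argmax(axis=1) == labels).sum())
        total += len(labels)
    return correct / total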

Step 3 Prepare model and callback functions

Prepare model and callback functions

    model = models.resnet18(pretrained=True).cuda().eval()
    input_shape = (1, 3, 224, 224)
    dummy_input = torch.randn(*input_shape).cuda()
    prepared_model = prepare_model(model)

    # User action required
    # User should pass actual argument(s) of the callback functions.
    forward_pass_callback_fn = CallbackFunc(forward_pass_callback, func_callback_args=None)
    eval_callback_fn = CallbackFunc(eval_callback, func_callback_args=None)

Prepare model and callback functions

    model = tf.keras.applications.ResNet50()
    prepared_model = prepare_model(model)

    # User action required
    # User should pass actual argument(s) of the callback functions.
    forward_pass_callback_fn = CallbackFunc(forward_pass_callback, func_callback_args=None)
    eval_callback_fn = CallbackFunc(eval_callback, func_callback_args=None)

Prepare model, callback functions and dataloader

    # User action required
    # Load your ONNX model here; Model() is a placeholder.
    onnx_model = Model()
    # Simplify the model
    onnx_model, _ = simplify(onnx_model)

    input_shape = (1, 3, 224, 224)
    dummy_data = np.random.randn(*input_shape).astype(np.float32)
    dummy_input = {'input': dummy_data}

    # User action required
    # User should pass actual argument(s) of the callback functions.
    forward_pass_callback_fn = CallbackFunc(forward_pass_callback, func_callback_args=None)
    eval_callback_fn = CallbackFunc(eval_callback, func_callback_args=None)

    # User action required
    # User should use an unlabeled dataloader; if the dataloader yields labels as well, discard them.
    unlabeled_data_loader = _get_unlabeled_data_loader()  # hypothetical user-defined helper

Step 4 Create QuantAnalyzer and run analysis

Create QuantAnalyzer object

    quant_analyzer = QuantAnalyzer(model=prepared_model,
                                   dummy_input=dummy_input,
                                   forward_pass_callback=forward_pass_callback_fn,
                                   eval_callback=eval_callback_fn)

    # User action required
    # User should use an unlabeled dataloader; if the dataloader yields labels as well, discard them.
    unlabeled_data_loader = _get_unlabeled_data_loader()  # hypothetical user-defined helper
    # Approximately 256 images/samples are recommended for MSE loss analysis. So, if the dataloader
    # has a batch_size of 64, then 4 batches yield 256 images/samples.
    quant_analyzer.enable_per_layer_mse_loss(unlabeled_dataset_iterable=unlabeled_data_loader, num_batches=4)

Run QuantAnalyzer

    quant_analyzer.analyze(quant_scheme=QuantScheme.post_training_tf_enhanced,
                           default_param_bw=8,
                           default_output_bw=8,
                           config_file=None,
                           results_dir="./quant_analyzer_results/")

Create QuantAnalyzer object

    quant_analyzer = QuantAnalyzer(model=prepared_model,
                                   forward_pass_callback=forward_pass_callback_fn,
                                   eval_callback=eval_callback_fn)

    # Approximately 256 images/samples are recommended for MSE loss analysis. So, if the dataset
    # has a batch_size of 64, then 4 batches yield 256 images/samples.
    quant_analyzer.enable_per_layer_mse_loss(unlabeled_dataset=unlabeled_dataset, num_batches=4)

Run QuantAnalyzer

    quant_analyzer.analyze(quant_scheme=QuantScheme.post_training_tf_enhanced,
                           default_param_bw=8,
                           default_output_bw=8,
                           config_file=None,
                           results_dir="./quant_analyzer_results/")

Create QuantAnalyzer object

    quant_analyzer = QuantAnalyzer(model=onnx_model,
                                   dummy_input=dummy_input,
                                   forward_pass_callback=forward_pass_callback_fn,
                                   eval_callback=eval_callback_fn)
    # Approximately 256 images/samples are recommended for MSE loss analysis. So, if the dataloader
    # has a batch_size of 64, then 4 batches yield 256 images/samples.
    quant_analyzer.enable_per_layer_mse_loss(unlabeled_dataset_iterable=unlabeled_data_loader, num_batches=4)

Run QuantAnalyzer

    quant_analyzer.analyze(quant_scheme=QuantScheme.post_training_tf_enhanced,
                           default_param_bw=8,
                           default_activation_bw=8,
                           config_file=None,
                           results_dir="./quant_analyzer_results/")

API

Top level APIs

class aimet_common.utils.CallbackFunc(func, func_callback_args=None)[source]

Class encapsulating a callback function and its arguments

Parameters:
  • func (Callable) – Callable Function

  • func_callback_args – Arguments passed to the callable function

class aimet_torch.quant_analyzer.QuantAnalyzer(model, dummy_input, forward_pass_callback, eval_callback, modules_to_ignore=None)[source]

QuantAnalyzer tool provides

  1. model sensitivity to weight and activation quantization

  2. per layer sensitivity analysis

  3. per layer encoding (min - max range)

  4. per layer statistics histogram (PDF) analysis, and

  5. per layer MSE analysis

Parameters:
  • model (Module) – FP32 model to analyze for quantization.

  • dummy_input (Union[Tensor, Tuple]) – Dummy input to model.

  • forward_pass_callback (CallbackFunc) – A callback function for model calibration that simply runs forward passes on the model to compute encoding (delta/offset). This callback function should use representative data and should be subset of entire train/validation dataset (~1000 images/samples).

  • eval_callback (CallbackFunc) – A callback function for model evaluation that determines model performance. This callback function is expected to return scalar value representing the model performance evaluated against entire test/evaluation dataset.

  • modules_to_ignore (Optional[List[Module]]) – Excludes certain modules from being analyzed.

QuantAnalyzer.analyze(quant_scheme=QuantScheme.post_training_tf_enhanced, default_param_bw=8, default_output_bw=8, config_file=None, results_dir='./tmp/')
Analyze the model for quantization and point out sensitive parts/hotspots by performing:
  1. model sensitivity analysis to quantization,

  2. per layer sensitivity analysis by enabling and disabling quant wrappers,

  3. export of per layer encoding min-max ranges,

  4. export of per layer statistics histograms (PDF) when the quant scheme is TF-Enhanced,

  5. per layer MSE analysis

Parameters:
  • quant_scheme (QuantScheme) – Quantization scheme. Supported values are QuantScheme.post_training_tf or QuantScheme.post_training_tf_enhanced.

  • default_param_bw (int) – Default bitwidth (4-31) to use for quantizing layer parameters.

  • default_output_bw (int) – Default bitwidth (4-31) to use for quantizing layer inputs and outputs.

  • config_file (Optional[str]) – Path to configuration file for model quantizers.

  • results_dir (str) – Directory to save the results.

Alternatively, you can run a specific utility

Instead of running all the utilities that QuantAnalyzer offers, you can run only those of interest. To do so, create a QuantizationSimModel object, then call the desired QuantAnalyzer utility, passing that object to it.
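
For example, to run only the sensitivity check (a sketch; QuantizationSimModel construction details depend on your setup):

    from aimet_torch.quantsim import QuantizationSimModel

    sim = QuantizationSimModel(prepared_model, dummy_input=dummy_input)
    sim.compute_encodings(forward_pass_callback, forward_pass_callback_args=None)

    fp32_acc, weight_quantized_acc, act_quantized_acc = \
        quant_analyzer.check_model_sensitivity_to_quantization(sim)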

QuantAnalyzer.check_model_sensitivity_to_quantization(sim)

Perform the sensitivity analysis to weight and activation quantization individually.

Parameters:

sim (_QuantizationSimModelInterface) – Quantsim model.

Return type:

Tuple[float, float, float]

Returns:

FP32 eval score, weight-quantized eval score, act-quantized eval score.

QuantAnalyzer.perform_per_layer_analysis_by_enabling_quant_wrappers(sim, results_dir)

NOTE: Option 1

  1. All quant wrappers' parameter and activation quantizers are disabled.

  2. For every quant wrapper, in order of occurrence:
    • Enable the quant wrapper's parameter and activation quantizers as per the JSON config file and set them to the specified bit-width.

    • Measure and record the eval score on a subset of the dataset.

    • Disable the quantizers enabled in the first sub-step.

  3. Returns a dictionary containing quant wrapper names and corresponding eval scores.

Parameters:
  • sim (_QuantizationSimModelInterface) – Quantsim model.

  • results_dir (str) – Directory to save the results.

Return type:

Dict

Returns:

layer wise eval score dictionary. dict[layer_name] = eval_score

QuantAnalyzer.perform_per_layer_analysis_by_disabling_quant_wrappers(sim, results_dir)

NOTE: Option 2

  1. All quant wrappers' parameter and activation quantizers are enabled as per the JSON config file and set to the specified bit-width.

  2. For every quant wrapper, in order of occurrence:
    • Disable the quant wrapper's parameter and activation quantizers.

    • Measure and record the eval score on a subset of the dataset.

    • Re-enable the quantizers disabled in the first sub-step.

  3. Returns a dictionary containing quant wrapper names and corresponding eval scores.

Parameters:
  • sim (_QuantizationSimModelInterface) – Quantsim model.

  • results_dir (str) – Directory to save the results.

Return type:

Dict

Returns:

layer wise eval score dictionary. dict[layer_name] = eval_score

QuantAnalyzer.export_per_layer_encoding_min_max_range(sim, results_dir)

Export encoding min and max ranges for all weights and activations. After invoking this API, results_dir should contain html files in the following format:

-results_dir
  -activations.html
  -weights.html

If per-channel quantization (PCQ) is enabled:

-results_dir
  -activations.html
  -{wrapped_module_name}_{param_name}.html

Parameters:
  • sim (_QuantizationSimModelInterface) – Quantsim model.

  • results_dir (str) – Directory to save the results.

Return type:

Tuple[Dict, Dict]

Returns:

layer wise min-max range for weights and activations.

QuantAnalyzer.export_per_layer_stats_histogram(sim, results_dir)

NOTE: Invoke only when the quantization scheme is TF-Enhanced.

Export histograms that represent the PDF of statistics collected by each quantizer of every quant wrapper. After invoking this API, results_dir should contain html files in the following format:

-results_dir
  -activations_pdf
    name_{input/output}_{index}.html
  -weights_pdf
    -name
      param_name_{channel_index}.html

Parameters:
  • sim (_QuantizationSimModelInterface) – Quantsim model.

  • results_dir (str) – Directory to save the results.

QuantAnalyzer.export_per_layer_mse_loss(sim, results_dir)

NOTE: The same model input data must be passed through both the FP32 and quantsim models to tap the output activations of each layer.

Export the MSE loss between FP32 and quantized output activations for each layer.

Parameters:
  • sim (_QuantizationSimModelInterface) – Quantsim model.

  • results_dir (str) – Directory to save the results.

Return type:

Dict

Returns:

layer wise MSE loss. dict[layer_name] = MSE loss.

Top level APIs

class aimet_tensorflow.keras.quant_analyzer.QuantAnalyzer(model, forward_pass_callback, eval_callback)[source]

QuantAnalyzer tool provides

  1. model sensitivity to weight and activation quantization

  2. per layer sensitivity analysis

  3. per layer encoding (min - max range)

  4. per layer statistics histogram (PDF) analysis, and

  5. per layer MSE analysis

Parameters:
  • model (Model) – FP32 model to analyze for quantization.

  • forward_pass_callback (CallbackFunc) – A callback function for model calibration that simply runs forward passes on the model to compute encoding (delta/offset). This callback function should use representative data and should be subset of entire train/validation dataset (~1000 images/samples).

  • eval_callback (CallbackFunc) – A callback function for model evaluation that determines model performance. This callback function is expected to return scalar value representing the model performance evaluated against entire test/evaluation dataset.

analyze(quant_scheme=QuantScheme.post_training_tf_enhanced, rounding_mode='nearest', default_param_bw=8, default_output_bw=8, config_file=None, results_dir='./tmp/')[source]
Analyze the model for quantization and point out sensitive parts/hotspots by performing:
  1. model sensitivity analysis to quantization,

  2. per layer sensitivity analysis by enabling and disabling quant wrappers,

  3. export of per layer encoding min-max ranges,

  4. export of per layer statistics histograms (PDF) when the quant scheme is TF-Enhanced,

  5. per layer MSE analysis

Parameters:
  • quant_scheme (QuantScheme) – Quantization scheme. Supported values are QuantScheme.post_training_tf or QuantScheme.post_training_tf_enhanced.

  • rounding_mode (str) – The rounding scheme to use. One of 'nearest' or 'stochastic'; defaults to 'nearest'.

  • default_param_bw (int) – Default bitwidth (4-31) to use for quantizing layer parameters.

  • default_output_bw (int) – Default bitwidth (4-31) to use for quantizing layer inputs and outputs.

  • config_file (Optional[str]) – Path to configuration file for model quantizers.

  • results_dir (str) – Directory to save the results.

check_model_sensitivity_to_quantization(sim, default_param_bw, default_output_bw)[source]

Perform the sensitivity analysis to weight and activation quantization individually.

Parameters:
  • sim (QuantizationSimModel) – Quantsim model.

  • default_param_bw (int) – Default bitwidth (4-31) to use for quantizing layer parameters.

  • default_output_bw (int) – Default bitwidth (4-31) to use for quantizing layer inputs and outputs.

Returns:

FP32 eval score, weight-quantized eval score, act-quantized eval score.

enable_per_layer_mse_loss(unlabeled_dataset, num_batches)[source]

Enable per layer MSE loss analysis.

Parameters:
  • unlabeled_dataset (DatasetV2) – tf.data.Dataset provided as input to the model and used to calculate mse loss

  • num_batches (int) – Maximum number of batches to be used for MSE loss calculation

Return type:

None

export_per_layer_encoding_min_max_range(sim, results_dir)[source]

Export encoding min and max ranges for all weights and activations. After invoking this API, results_dir should contain html files in the following format:

-results_dir
  -activations.html
  -weights.html

If per-channel quantization (PCQ) is enabled:

-results_dir
  -activations.html
  -{wrapped_module_name}_{param_name}.html

Parameters:
  • sim (QuantizationSimModel) – Quantsim model.

  • results_dir (str) – Directory to save the results.

Return type:

Tuple[Dict, Dict]

Returns:

layer wise min-max range for weights and activations.

export_per_layer_mse_loss(sim, results_dir)[source]

NOTE: The same model input data must be passed through both the FP32 and quantsim models to tap the output activations of each layer.

Export the MSE loss between FP32 and quantized output activations for each layer.

Parameters:
  • sim (QuantizationSimModel) – Quantsim model.

  • results_dir (str) – Directory to save the results.

Return type:

Dict[str, float]

Returns:

layer wise MSE loss. dict[layer_name] = MSE loss.

export_per_layer_stats_histogram(sim, results_dir)[source]

NOTE: Invoke only when the quantization scheme is TF-Enhanced.

Export histograms that represent the PDF of statistics collected by each quantizer of every quant wrapper. After invoking this API, results_dir should contain html files in the following format:

-results_dir
  -activations_pdf
    name_{input/output}_{index}.html
  -weights_pdf
    -name
      param_name_{channel_index}.html

Parameters:
  • sim (QuantizationSimModel) – Quantsim model.

  • results_dir (str) – Directory to save the results.

Return type:

None

perform_per_layer_analysis_by_disabling_quant_wrappers(sim, results_dir)[source]

NOTE: Option 2

  1. All quant wrappers' parameter and activation quantizers are enabled as per the JSON config file and set to the specified bit-width.

  2. For every quant wrapper, in order of occurrence:
    1. Disable the quant wrapper's parameter and activation quantizers.

    2. Measure and record the eval score on a subset of the dataset.

    3. Re-enable the quantizers disabled in sub-step 1.

  3. Returns a dictionary containing quant wrapper names and corresponding eval scores.

Parameters:
  • sim (QuantizationSimModel) – Quantsim model.

  • results_dir (str) – Directory to save the results.

Return type:

Dict[str, float]

Returns:

layer wise eval score dictionary. dict[layer_name] = eval_score

perform_per_layer_analysis_by_enabling_quant_wrappers(sim, results_dir)[source]

NOTE: Option 1

  1. All quant wrappers' parameter and activation quantizers are disabled.

  2. For every quant wrapper, in order of occurrence:
    1. Enable the quant wrapper's parameter and activation quantizers as per the JSON config file and set them to the specified bit-width.

    2. Measure and record the eval score on a subset of the dataset.

    3. Disable the quantizers enabled in sub-step 1.

  3. Returns a dictionary containing quant wrapper names and corresponding eval scores.

Parameters:
  • sim (QuantizationSimModel) – Quantsim model.

  • results_dir (str) – Directory to save the results.

Return type:

Dict[str, float]

Returns:

layer-wise eval score dictionary. dict[layer_name] = eval_score

Top level APIs

Note

It is recommended to run onnx-simplifier on the model before applying QuantAnalyzer.

class aimet_onnx.quant_analyzer.QuantAnalyzer(model, dummy_input, forward_pass_callback, eval_callback)[source]

QuantAnalyzer provides the following utilities:

  1. model sensitivity to weight and activation quantization

  2. per layer sensitivity analysis

  3. per layer encoding (min - max range)

  4. per layer quantizer histogram analysis, and

  5. per layer MSE analysis

Parameters:
  • model (Union[ModelProto, ONNXModel]) – FP32 model to analyze for quantization.

  • dummy_input (Dict[str, ndarray]) – Dummy input to model.

  • forward_pass_callback (CallbackFunc) – A callback function for model calibration that simply runs forward passes on the model to compute encoding (delta/offset). This callback function should use representative data and should be subset of entire train/validation dataset (~1000 images/samples).

  • eval_callback (CallbackFunc) – A callback function for model evaluation that determines model performance. This callback function is expected to return scalar value representing the model performance evaluated against entire test/evaluation dataset.

QuantAnalyzer.enable_per_layer_mse_loss(unlabeled_dataset_iterable, num_batches)[source]

Enables per layer MSE loss analysis.

Parameters:
  • unlabeled_dataset_iterable (Iterable) – A collection (i.e. iterable with __len__) that iterates over an unlabeled dataset. The values yielded by this iterable are expected to be able to be passed directly to the model.

  • num_batches (int) – Number of batches. Approximately 256 samples/images are recommended, so if the data loader's batch size is 64, then 4 batches yield 256 samples/images.

QuantAnalyzer.analyze(quant_scheme=QuantScheme.post_training_tf_enhanced, default_param_bw=8, default_activation_bw=8, config_file=None, results_dir='./tmp/')[source]
Analyze the model for quantization and point out sensitive parts/hotspots by performing:
  1. model sensitivity analysis to quantization,

  2. per layer sensitivity analysis by enabling and disabling quantizers,

  3. export of per layer encoding min-max ranges,

  4. export of per layer quantizer statistics histograms,

  5. per layer MSE analysis

Parameters:
  • quant_scheme (QuantScheme) – Quantization scheme. Supported values are QuantScheme.post_training_tf or QuantScheme.post_training_tf_enhanced.

  • default_param_bw (int) – Default bitwidth (4-31) to use for quantizing layer parameters.

  • default_activation_bw (int) – Default bitwidth (4-31) to use for quantizing layer inputs and outputs.

  • config_file (Optional[str]) – Path to configuration file for model quantizers.

  • results_dir (str) – Directory to save the results.

Alternatively, you can run a specific utility

Instead of running all the utilities that QuantAnalyzer offers, you can run only those of interest. To do so, create a QuantizationSimModel object, then call the desired QuantAnalyzer utility, passing that object to it.
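
For example, using the create_quantsim_and_encodings utility documented below (a sketch):

    sim = quant_analyzer.create_quantsim_and_encodings(
        quant_scheme=QuantScheme.post_training_tf_enhanced,
        default_param_bw=8,
        default_activation_bw=8,
        config_file=None)

    # Run only the utilities of interest against the same sim object.
    fp32_acc, weight_quantized_acc, act_quantized_acc = \
        quant_analyzer.check_model_sensitivity_to_quantization(sim)
    quant_analyzer.export_per_layer_encoding_min_max_range(sim, results_dir="./quant_analyzer_results/")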

QuantAnalyzer.create_quantsim_and_encodings(quant_scheme, default_param_bw, default_activation_bw, config_file)[source]

Creates quantsim object and computes encodings.

Parameters:
  • quant_scheme (QuantScheme) – Quantization scheme.

  • default_param_bw (int) – Default bitwidth (4-31) to use for quantizing layer parameters.

  • default_activation_bw (int) – Default bitwidth (4-31) to use for quantizing layer inputs and outputs.

  • config_file (str) – Path to configuration file for model quantizers.

Return type:

QuantizationSimModel

Returns:

Quantsim object.

QuantAnalyzer.check_model_sensitivity_to_quantization(sim)[source]

Performs model sensitivity analysis to weight and activation quantization individually.

Parameters:

sim (QuantizationSimModel) – Quantsim model.

Return type:

Tuple[float, float, float]

Returns:

FP32 eval score, weight-quantized eval score, act-quantized eval score.

QuantAnalyzer.perform_per_layer_analysis_by_enabling_quantizers(sim, results_dir)[source]

Performs layer-wise quantization sensitivity analysis by enabling each layer's quantizers:

  1. All parameter and activation quantizers are disabled.

  2. For every layer, in order of occurrence:
    1. Enable the layer's parameter and activation quantizers as per the JSON config file and set them to the specified bit-width.

    2. Measure and record the eval score on a subset of the dataset.

    3. Disable the quantizers enabled in sub-step 1.

  3. Returns a dictionary containing layer names and corresponding eval scores.

Parameters:
  • sim (QuantizationSimModel) – Quantsim model.

  • results_dir (str) – Directory to save the results.

Return type:

Dict

Returns:

layer wise eval score dictionary. dict[layer_name] = eval_score

QuantAnalyzer.perform_per_layer_analysis_by_disabling_quantizers(sim, results_dir)[source]

Performs layer-wise quantization sensitivity analysis by disabling each layer's quantizers:

  1. All parameter and activation quantizers are enabled as per the JSON config file and set to the specified bit-width.

  2. For every layer, in order of occurrence:
    1. Disable the layer's parameter and activation quantizers.

    2. Measure and record the eval score on a subset of the dataset.

    3. Re-enable the quantizers disabled in sub-step 1.

  3. Returns a dictionary containing layer names and corresponding eval scores.

Parameters:
  • sim (QuantizationSimModel) – Quantsim model.

  • results_dir (str) – Directory to save the results.

Return type:

Dict

Returns:

layer wise eval score dictionary. dict[layer_name] = eval_score

QuantAnalyzer.export_per_layer_encoding_min_max_range(sim, results_dir)[source]

Exports encoding min and max ranges for all weights and activations. After invoking this API, results_dir contains html files in the following format:

-results_dir
  -activations.html
  -weights.html

If per-channel quantization (PCQ) is enabled:

-results_dir
  -activations.html
  -{layer_name}_{param_name}.html

Parameters:
  • sim (QuantizationSimModel) – Quantsim model.

  • results_dir (str) – Directory to save the results.

Return type:

Tuple[Dict, Dict]

Returns:

layer wise min-max range for weights and activations.

QuantAnalyzer.export_per_layer_stats_histogram(sim, results_dir)[source]

NOTE: Invoke only when the quantization scheme is TF-Enhanced.

Exports histograms that represent the PDF of statistics collected by each quantizer. After invoking this API, results_dir should contain html files in the following format for every quantizer in the model:

-results_dir
  -activations_pdf
    name_{input/output}_{index}.html
  -weights_pdf
    -name
      param_name_{channel_index}.html

Parameters:
  • sim (QuantizationSimModel) – Quantsim model.

  • results_dir (str) – Directory to save the results.

QuantAnalyzer.export_per_layer_mse_loss(sim, results_dir)[source]

Exports MSE loss between fp32 and quantized output activations for each layer.

Parameters:
  • sim (QuantizationSimModel) – Quantsim model.

  • results_dir (str) – Directory to save the results.

Return type:

Dict

Returns:

layer wise MSE loss. dict[layer_name] = MSE loss.