aimet_torch.v1.quantsim

Note

This module is also available in the default aimet_torch namespace with the same top-level API.

class aimet_torch.v1.quantsim.QuantizationSimModel(model, dummy_input, quant_scheme=QuantScheme.post_training_tf_enhanced, rounding_mode='nearest', default_output_bw=8, default_param_bw=8, in_place=False, config_file=None, default_data_type=QuantizationDataType.int)[source]

Implements a mechanism to add quantization simulation ops to a model. This allows for off-target simulation of inference accuracy. It also allows the model to be fine-tuned to counter the effects of quantization.

Constructor for QuantizationSimModel.

Parameters:
  • model (Module) – Model to add simulation ops to

  • dummy_input (Union[Tensor, Tuple]) – Dummy input to the model. Used to parse the model graph. If the model has more than one input, pass a tuple. The user is expected to place the tensors on the appropriate device.

  • quant_scheme (Union[str, QuantScheme]) – Quantization scheme. The quantization scheme is used to compute the quantization encodings. There are multiple schemes available. Please refer to the QuantScheme enum definition.

  • rounding_mode (str) – Rounding mode. Supported options are ‘nearest’ or ‘stochastic’

  • default_output_bw (int) – Default bitwidth (4-31) to use for quantizing all layer inputs and outputs unless otherwise specified in the config file.

  • default_param_bw (int) – Default bitwidth (4-31) to use for quantizing all layer parameters unless otherwise specified in the config file.

  • in_place (bool) – If True, then the given ‘model’ is modified in-place to add quant-sim nodes. This option is suggested only when the user wants to avoid creating a copy of the model

  • config_file (Optional[str]) – Path to Configuration file for model quantizers

  • default_data_type (QuantizationDataType) – Default data type to use for quantizing all inputs, outputs and parameters, unless otherwise specified in the config file. Possible options are QuantizationDataType.int and QuantizationDataType.float. Note that the mode default_data_type=QuantizationDataType.float is only supported with default_output_bw=16 or 32 and default_param_bw=16 or 32.
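
Example: constructing a simulation model. A minimal sketch; the model and input shape below are hypothetical and stand in for the user's own network.

    import torch
    from aimet_common.defs import QuantScheme
    from aimet_torch.v1.quantsim import QuantizationSimModel

    # Hypothetical model and input shape, for illustration only
    model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, kernel_size=3), torch.nn.ReLU()).eval()
    dummy_input = torch.randn(1, 3, 224, 224)  # place on the same device as the model

    sim = QuantizationSimModel(model,
                               dummy_input=dummy_input,
                               quant_scheme=QuantScheme.post_training_tf_enhanced,
                               default_output_bw=8,
                               default_param_bw=8)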

The following API can be used to compute encodings for calibration:

QuantizationSimModel.compute_encodings(forward_pass_callback, forward_pass_callback_args)[source]

Computes encodings for all quantization sim nodes in the model. It is also used to find initial encodings for Range Learning.

Parameters:
  • forward_pass_callback – A callback function that simply runs forward passes on the model. This callback function should use representative data for the forward pass, so the calculated encodings work for all data samples. This callback internally chooses the number of data samples it wants to use for calculating encodings.

  • forward_pass_callback_args – These argument(s) are passed to the forward_pass_callback as-is. It is up to the user to determine the type of this parameter; for example, it could simply be an integer representing the number of data samples to use, a tuple of parameters, or an object representing something more complex. If set to None, forward_pass_callback will be invoked with no parameters.

Returns:

None
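
Example: a hedged sketch of a calibration callback, reusing the sim from the construction snippet above. The callback receives the model and the user-supplied args; the synthetic calibration data below is an assumption standing in for a representative dataset.

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    # Hypothetical representative data, for illustration only
    calibration_data = TensorDataset(torch.randn(64, 3, 224, 224), torch.zeros(64))
    calibration_loader = DataLoader(calibration_data, batch_size=8)

    def forward_pass_callback(model, num_batches):
        # Run forward passes over representative data; outputs are discarded,
        # since only the ranges observed by the quantizers matter.
        model.eval()
        with torch.no_grad():
            for i, (data, _) in enumerate(calibration_loader):
                if i >= num_batches:
                    break
                model(data)

    sim.compute_encodings(forward_pass_callback=forward_pass_callback,
                          forward_pass_callback_args=8)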

The following APIs can be used to save and restore the quantized model:

quantsim.save_checkpoint(quant_sim_model, file_path)

This API provides a way for the user to save a checkpoint of the quantized model, which can be loaded at a later point to continue fine-tuning. See also load_checkpoint().

Parameters:
  • quant_sim_model (_QuantizationSimModelInterface) – QuantizationSimModel to save checkpoint for

  • file_path (str) – Path to the file where you want to save the checkpoint

Returns:

None

quantsim.load_checkpoint(file_path)

Load the quantized model from a checkpoint file

Parameters:

file_path (str) – Path to the checkpoint file to load

Return type:

_QuantizationSimModelInterface

Returns:

A new instance of the QuantizationSimModel created after loading the checkpoint
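
Example: saving and restoring a simulation model. A minimal sketch assuming the module-level functions above; the checkpoint path is hypothetical.

    from aimet_torch.v1 import quantsim

    # Save the current state of the sim model (hypothetical path)
    quantsim.save_checkpoint(sim, '/tmp/quantsim_checkpoint.pth')

    # ... later, restore it to continue fine-tuning
    sim = quantsim.load_checkpoint('/tmp/quantsim_checkpoint.pth')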

The following API can be used to export the quantized model to target:

QuantizationSimModel.export(path, filename_prefix, dummy_input, onnx_export_args=None, propagate_encodings=False, export_to_torchscript=False, use_embedded_encodings=False, export_model=True, filename_prefix_encodings=None)

This method exports the quant-sim model so it is ready to be run on-target.

Specifically, the following are saved:

  1. The sim-model is exported to a regular PyTorch model without any simulation ops

  2. The quantization encodings are exported to a separate JSON-formatted file that can then be imported by the on-target runtime (if desired)

  3. Optionally, an equivalent model in ONNX format is exported. In addition, nodes in the ONNX model are named the same as the corresponding PyTorch module names. This helps with matching ONNX nodes to their quantization encodings from #2.

Parameters:
  • path (str) – Path at which to store the model pth and encodings files

  • filename_prefix (str) – Prefix to use for filenames of the model pth and encodings files

  • dummy_input (Union[Tensor, Tuple]) – Dummy input to the model. Used to parse the model graph. The dummy_input is required to be placed on the CPU.

  • onnx_export_args (Union[OnnxExportApiArgs, Dict, None]) – Optional export arguments with ONNX-specific overrides, provided as a dictionary or an OnnxExportApiArgs object. If not provided, defaults to “opset_version” = None, “input_names” = None, “output_names” = None, and, for torch versions < 1.10.0, “enable_onnx_checker” = False.

  • propagate_encodings (bool) – If True, encoding entries for intermediate ops (when one PyTorch op results in multiple ONNX nodes) are filled with the same bitwidth and data_type as the output tensor for that series of ops. Defaults to False.

  • export_to_torchscript (bool) – If True, export to TorchScript; otherwise, export to ONNX. Defaults to False.

  • use_embedded_encodings (bool) – If True, another ONNX model with embedded fake-quantization nodes will be exported

  • export_model (bool) – If True, the ONNX model is exported. When False, only encodings are exported. The user should set this flag to False only if the corresponding ONNX model already exists at the specified path

  • filename_prefix_encodings (Optional[str]) – File name prefix to use when saving encodings. If None, defaults to the filename_prefix value
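
Example: exporting the calibrated model. A minimal sketch reusing the sim and dummy_input from the earlier snippets; the output directory and filename prefix are hypothetical. Note that dummy_input must be on the CPU.

    # Export the model and encodings (hypothetical path and prefix)
    sim.export(path='/tmp/exported',
               filename_prefix='quantized_model',
               dummy_input=dummy_input.cpu())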

Quant Scheme Enum

class aimet_common.defs.QuantScheme(value)[source]

Enumeration of quantization schemes

post_training_percentile = 6

For a Tensor, adjusted minimum and maximum values are selected based on the percentile value passed. The Quantization encodings are calculated using the adjusted minimum and maximum value.

post_training_tf = 1

For a Tensor, the absolute minimum and maximum value of the Tensor are used to compute the Quantization encodings.

post_training_tf_enhanced = 2

For a Tensor, searches and selects the optimal minimum and maximum value that minimizes the Quantization Noise. The Quantization encodings are calculated using the selected minimum and maximum value.

training_range_learning_with_tf_enhanced_init = 4

For a Tensor, the encoding values are initialized with the post_training_tf_enhanced scheme. Then, the encodings are learned during training.

training_range_learning_with_tf_init = 3

For a Tensor, the encoding values are initialized with the post_training_tf scheme. Then, the encodings are learned during training.
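
Example: selecting a range-learning scheme at construction time. A hedged sketch reusing the hypothetical model and dummy_input from the earlier snippets; encodings are initialized by compute_encodings and then learned during fine-tuning.

    from aimet_common.defs import QuantScheme
    from aimet_torch.v1.quantsim import QuantizationSimModel

    # Encodings start from the post_training_tf scheme and are updated during training
    sim = QuantizationSimModel(model,
                               dummy_input=dummy_input,
                               quant_scheme=QuantScheme.training_range_learning_with_tf_init)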