aimet_onnx.quantsim

Note

It is recommended to run onnx-simplifier on the model before creating the quantsim model.

class aimet_onnx.quantsim.QuantizationSimModel(model, dummy_input=None, quant_scheme=QuantScheme.post_training_tf_enhanced, rounding_mode='nearest', default_param_bw=8, default_activation_bw=8, use_symmetric_encodings=False, use_cuda=True, device=0, config_file=None, default_data_type=QuantizationDataType.int, user_onnx_libs=None, path=None)[source]

Creates a QuantizationSimModel by adding quantization simulation ops to a given model

Constructor

Parameters:
  • model (ModelProto) – ONNX model

  • dummy_input (Optional[Dict[str, ndarray]]) – Dummy input to the model. If None, will attempt to auto-generate a dummy input

  • quant_scheme (QuantScheme) – Quantization scheme (e.g. QuantScheme.post_training_tf)

  • rounding_mode (str) – Rounding mode (e.g. nearest)

  • default_param_bw (int) – Quantization bitwidth for parameters

  • default_activation_bw (int) – Quantization bitwidth for activations

  • use_symmetric_encodings (bool) – True if symmetric encoding is used. False otherwise.

  • use_cuda (bool) – True if using CUDA to run quantization ops. False otherwise.

  • config_file (Optional[str]) – Path to configuration file for model quantizers

  • default_data_type (QuantizationDataType) – Default data type to use for quantizing all layer inputs, outputs and parameters. Possible options are QuantizationDataType.int and QuantizationDataType.float. Note that the mode default_data_type=QuantizationDataType.float is only supported with default_activation_bw=16 and default_param_bw=16

  • user_onnx_libs (Optional[List[str]]) – List of paths to all compiled ONNX custom ops libraries

  • path (Optional[str]) – Directory to save the artifacts.
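
A minimal construction sketch, assuming a model saved as model.onnx with a single input named "input" of shape (1, 3, 224, 224); the file name, input name, and shape are placeholders to adapt to your model:

    import numpy as np
    import onnx
    from aimet_common.defs import QuantScheme
    from aimet_onnx.quantsim import QuantizationSimModel

    # Load the ONNX model to be simulated (path is a placeholder)
    model = onnx.load("model.onnx")

    # Dummy input keyed by input tensor name, matching the model's expected shape
    dummy_input = {"input": np.random.randn(1, 3, 224, 224).astype(np.float32)}

    sim = QuantizationSimModel(model=model,
                               dummy_input=dummy_input,
                               quant_scheme=QuantScheme.post_training_tf_enhanced,
                               default_param_bw=8,
                               default_activation_bw=8,
                               use_cuda=False)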

The following API can be used to compute encodings for calibration.

QuantizationSimModel.compute_encodings(forward_pass_callback, forward_pass_callback_args)[source]

Compute and return the encodings of each tensor quantizer

Parameters:
  • forward_pass_callback – A callback function that simply runs forward passes on the model. This callback function should use representative data for the forward pass, so the calculated encodings work for all data samples. This callback internally chooses the number of data samples it wants to use for calculating encodings.

  • forward_pass_callback_args – These argument(s) are passed to the forward_pass_callback as-is. It is up to the user to determine the type of this parameter; for example, it could simply be an integer representing the number of data samples to use, or a tuple of parameters, or an object representing something more complex. If set to None, forward_pass_callback will be invoked with no parameters.
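
A calibration sketch, continuing from the constructor example above. It assumes, as in typical aimet_onnx usage, that the callback receives an onnxruntime session for the simulated model as its first argument; calibration_batches is a hypothetical stand-in for representative data:

    import numpy as np

    # Hypothetical representative data: a handful of batches shaped like the dummy input
    calibration_batches = [np.random.randn(1, 3, 224, 224).astype(np.float32)
                           for _ in range(8)]

    def pass_calibration_data(session, batches):
        """Run forward passes over representative data so encodings can be calculated."""
        input_name = session.get_inputs()[0].name
        for batch in batches:
            session.run(None, {input_name: batch})

    # forward_pass_callback_args is forwarded to the callback as-is
    sim.compute_encodings(forward_pass_callback=pass_calibration_data,
                          forward_pass_callback_args=calibration_batches)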

The following API can be used to export the quantized model to the target.

QuantizationSimModel.export(path, filename_prefix)[source]

Compute encodings and export to files

Parameters:
  • path (str) – Directory in which to save the encoding files

  • filename_prefix (str) – Filename prefix to use for the saved files
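
An export sketch, continuing from the examples above; the output directory and filename prefix are placeholders:

    import os

    # The output directory must exist before exporting
    os.makedirs("./output", exist_ok=True)

    # Writes the exported model and its encodings file under ./output
    sim.export(path="./output", filename_prefix="model_quantized")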

Enum Definition

Quant Scheme Enum

class aimet_common.defs.QuantScheme(value)[source]

Enumeration of Quant schemes

post_training_percentile = 6

For a Tensor, adjusted minimum and maximum values are selected based on the percentile value passed. The Quantization encodings are calculated using the adjusted minimum and maximum value.

post_training_tf = 1

For a Tensor, the absolute minimum and maximum value of the Tensor are used to compute the Quantization encodings.

post_training_tf_enhanced = 2

For a Tensor, searches and selects the optimal minimum and maximum value that minimizes the Quantization Noise. The Quantization encodings are calculated using the selected minimum and maximum value.

training_range_learning_with_tf_enhanced_init = 4

For a Tensor, the encoding values are initialized with the post_training_tf_enhanced scheme. Then, the encodings are learned during training.

training_range_learning_with_tf_init = 3

For a Tensor, the encoding values are initialized with the post_training_tf scheme. Then, the encodings are learned during training.
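
A small sketch of referencing the enum members; the integer values match those documented above:

    from aimet_common.defs import QuantScheme

    # Calibration-only (post-training) schemes
    assert QuantScheme.post_training_tf.value == 1
    assert QuantScheme.post_training_tf_enhanced.value == 2
    assert QuantScheme.post_training_percentile.value == 6

    # Range-learning schemes, where encodings are learned during training
    assert QuantScheme.training_range_learning_with_tf_init.value == 3
    assert QuantScheme.training_range_learning_with_tf_enhanced_init.value == 4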