aimet_onnx.quantsim

Note

It is recommended to run onnx-simplifier on the model before creating the quantsim model.
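
For example, the model can be simplified with the onnxsim package before constructing the sim (a minimal sketch, assuming onnx-simplifier is installed):

>>> import onnx
>>> from onnxsim import simplify
>>> model = onnx.load("model.onnx")
>>> simplified_model, check = simplify(model)  # fold constants, remove redundant nodes
>>> assert check, "simplified model failed validation"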

class aimet_onnx.QuantizationSimModel(model, dummy_input=None, quant_scheme=QuantScheme.min_max, rounding_mode='nearest', default_param_bw=8, default_activation_bw=8, use_symmetric_encodings=False, use_cuda=True, device=0, config_file=None, default_data_type=QuantizationDataType.int, user_onnx_libs=None, path=None)[source]

Class that simulates quantized model execution on a target hardware backend.

Parameters:
  • model (Union[ModelProto, ONNXModel]) – ONNX model

  • dummy_input (Optional[Dict[str, ndarray]]) – Dummy input to the model. If None, AIMET will attempt to auto-generate a dummy input

  • quant_scheme (QuantScheme) – Quantization scheme (e.g. QuantScheme.post_training_tf)

  • rounding_mode (str) – Rounding mode (e.g. nearest)

  • default_param_bw (int) – Quantization bitwidth for parameters

  • default_activation_bw (int) – Quantization bitwidth for activations

  • use_symmetric_encodings (bool) – True if symmetric encodings are used, False otherwise.

  • use_cuda (bool) – True if CUDA should be used to run the quantization ops, False otherwise.

  • device (int) – CUDA device ID to use when use_cuda is True. (Default: 0)

  • config_file (Optional[str]) – File path or alias of the configuration file. Alias can be one of { default, htp_v66, htp_v68, htp_v69, htp_v73, htp_v75, htp_v79, htp_v81 } (Default: “default”)

  • default_data_type (QuantizationDataType) – Default data type to use for quantizing all layer inputs, outputs and parameters. Possible options are QuantizationDataType.int and QuantizationDataType.float. Note that default_data_type=QuantizationDataType.float is only supported with default_activation_bw=16 and default_param_bw=16

  • user_onnx_libs (Optional[List[str]]) – List of paths to all compiled ONNX custom ops libraries

  • path (Optional[str]) – Directory to save the artifacts.
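
Example (a minimal construction sketch; the model path, input name, and input shape are placeholder assumptions):

>>> import numpy as np
>>> import onnx
>>> from aimet_onnx import QuantizationSimModel
>>> model = onnx.load("model.onnx")
>>> dummy_input = {"input": np.random.randn(1, 3, 224, 224).astype(np.float32)}
>>> sim = QuantizationSimModel(model, dummy_input=dummy_input,
...                            default_param_bw=8, default_activation_bw=8)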

compute_encodings(forward_pass_callback, forward_pass_callback_args=_NOT_SPECIFIED)[source]

Computes encodings for all quantizers in the model.

This API will invoke forward_pass_callback, a function written by the user that runs forward pass(es) of the quantized model with a small, representative subset of the training dataset. By doing so, the quantizers in the quantized model will observe the inputs and initialize their quantization encodings according to the observed input statistics.

This function is overloaded with the following signatures:

compute_encodings(forward_pass_callback)[source]
Parameters:

forward_pass_callback (Callable[[ort.InferenceSession], Any]) – A function that takes the quantized model's ort.InferenceSession and runs forward passes with a small, representative subset of the training dataset

compute_encodings(forward_pass_callback, forward_pass_callback_args)[source]
Parameters:
  • forward_pass_callback (Callable[[ort.InferenceSession, T], Any]) – A function that takes the quantized model's ort.InferenceSession and runs forward passes with a small, representative subset of the training dataset

  • forward_pass_callback_args (T) – The second argument to forward_pass_callback.

Example

>>> sim = QuantizationSimModel(...)
>>> def run_forward_pass(session: ort.InferenceSession):
...     for inp in dataset:
...         _ = session.run(None, {"input": inp})
...
>>> sim.compute_encodings(run_forward_pass)
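
To pass extra state to the callback, supply forward_pass_callback_args as the second argument. A minimal sketch (num_batches and the "input" name are illustrative assumptions):

>>> def run_forward_pass_n(session: ort.InferenceSession, num_batches: int):
...     for i, inp in enumerate(dataset):
...         if i >= num_batches:
...             break
...         _ = session.run(None, {"input": inp})
...
>>> sim.compute_encodings(run_forward_pass_n, 10)
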
export(path, filename_prefix, export_model=True)[source]

Computes encodings and exports them to files.

Parameters:
  • path (str) – Directory in which to save the encoding files

  • filename_prefix (str) – Filename prefix to use for the exported files

  • export_model (bool) – If True, then ONNX model is exported. When False, only encodings are exported.
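
Example (a minimal sketch; the output directory and filename prefix are placeholders):

>>> sim.export(path="./output", filename_prefix="quantized_model")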

Quant Scheme Enum

class aimet_common.defs.QuantScheme(value)[source]

Quantization schemes

classmethod from_str(alias)[source]

Returns the QuantScheme object corresponding to the given string alias

Return type:

QuantScheme
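
Example (a minimal sketch; "min_max" is assumed to be a recognized alias, mirroring the QuantScheme.min_max default shown in the constructor signature above):

>>> from aimet_common.defs import QuantScheme
>>> scheme = QuantScheme.from_str("min_max")
>>> sim = QuantizationSimModel(model, quant_scheme=scheme)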