aimet_onnx.quantsim
Note
It is recommended to run onnx-simplifier on the model before creating the quantsim model.
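For example, a minimal sketch of running onnx-simplifier before constructing the quantsim model, assuming the onnxsim package is installed and that "model.onnx" is a placeholder path to your model:

import onnx
from onnxsim import simplify

# Load the original model and simplify it (folds constants, removes
# redundant nodes) so that quantsim sees a cleaner graph
model = onnx.load("model.onnx")
simplified_model, check = simplify(model)
assert check, "Simplified ONNX model could not be validated"
onnx.save(simplified_model, "model_simplified.onnx")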
- class aimet_onnx.QuantizationSimModel(model, dummy_input=None, quant_scheme=QuantScheme.min_max, rounding_mode='nearest', default_param_bw=8, default_activation_bw=8, use_symmetric_encodings=False, use_cuda=True, device=0, config_file=None, default_data_type=QuantizationDataType.int, user_onnx_libs=None, path=None)
Class that simulates the quantized model execution on a target hardware backend.
- Parameters:
  - model (Union[ModelProto, ONNXModel]) – ONNX model
  - dummy_input (Optional[Dict[str, ndarray]]) – Dummy input to the model. If None, will attempt to auto-generate a dummy input.
  - quant_scheme (QuantScheme) – Quantization scheme (e.g. QuantScheme.post_training_tf)
  - rounding_mode (str) – Rounding mode (e.g. nearest)
  - default_param_bw (int) – Quantization bitwidth for parameters
  - default_activation_bw (int) – Quantization bitwidth for activations
  - use_symmetric_encodings (bool) – True if symmetric encodings are used, False otherwise
  - use_cuda (bool) – True if using CUDA to run the quantization ops, False otherwise
  - config_file (Optional[str]) – File path or alias of the configuration file. Alias can be one of { default, htp_v66, htp_v68, htp_v69, htp_v73, htp_v75, htp_v79, htp_v81 } (Default: "default")
  - default_data_type (QuantizationDataType) – Default data type to use for quantizing all layer inputs, outputs, and parameters. Possible options are QuantizationDataType.int and QuantizationDataType.float. Note that default_data_type=QuantizationDataType.float is only supported with default_activation_bw=16 and default_param_bw=16
  - user_onnx_libs (Optional[List[str]]) – List of paths to all compiled ONNX custom ops libraries
  - path (Optional[str]) – Directory to save the artifacts
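A minimal construction sketch follows; the model path, input name, and input shape below are placeholder values for illustration:

import numpy as np
import onnx
from aimet_onnx import QuantizationSimModel

model = onnx.load("model_simplified.onnx")  # placeholder path
# Dummy input keyed by the model's input name; the shape is illustrative
dummy_input = {"input": np.random.randn(1, 3, 224, 224).astype(np.float32)}

sim = QuantizationSimModel(model,
                           dummy_input=dummy_input,
                           default_param_bw=8,
                           default_activation_bw=8,
                           config_file="default")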
- compute_encodings(forward_pass_callback, forward_pass_callback_args=_NOT_SPECIFIED)
Computes encodings for all quantizers in the model.
This API will invoke forward_pass_callback, a function written by the user that runs forward pass(es) of the quantized model with a small, representative subset of the training dataset. By doing so, the quantizers in the quantized model will observe the inputs and initialize their quantization encodings according to the observed input statistics.
This function is overloaded with the following signatures:
- compute_encodings(forward_pass_callback)
- Parameters:
forward_pass_callback (Callable[[ort.InferenceSession], Any]) – A function that takes a quantized model and runs forward passes with a small, representative subset of the training dataset
- compute_encodings(forward_pass_callback, forward_pass_callback_args)
- Parameters:
forward_pass_callback (Callable[[ort.InferenceSession, T], Any]) – A function that takes a quantized model and runs forward passes with a small, representative subset of the training dataset
forward_pass_callback_args (T) – The second argument to forward_pass_callback.
Example
>>> sim = QuantizationSimModel(...)
>>> def run_forward_pass(session: ort.InferenceSession):
...     for input in dataset:
...         _ = session.run(None, {"input": input})
...
>>> sim.compute_encodings(run_forward_pass)
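The two-argument overload passes forward_pass_callback_args through as the callback's second argument, which is convenient for supplying calibration data without a closure. A sketch, where calibration_data is a hypothetical iterable of input arrays and "input" is a placeholder input name:

def run_forward_pass(session: ort.InferenceSession, data):
    # data arrives as the second argument via forward_pass_callback_args
    for x in data:
        _ = session.run(None, {"input": x})  # "input" is a placeholder name

sim.compute_encodings(run_forward_pass, calibration_data)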
- export(path, filename_prefix, export_model=True)
Compute encodings and export them to files
- Parameters:
  - path (str) – Directory to save the encoding files
  - filename_prefix (str) – Filename prefix for the exported files
  - export_model (bool) – If True, the ONNX model is exported. When False, only encodings are exported.
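A usage sketch, assuming encodings have already been computed; the output directory and filename prefix are placeholder values:

# Writes the exported artifacts under ./output using the given prefix
# (e.g. quantized_model.onnx plus its encodings file)
sim.export(path="./output", filename_prefix="quantized_model")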
Quant Scheme Enum