aimet_onnx.quantsim

Note

It is recommended to run onnx-simplifier on the model before creating the quantsim model.
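
For example, the model can be simplified with the onnxsim package before constructing the sim (a minimal sketch, assuming onnx-simplifier is installed):

>>> import onnx
>>> from onnxsim import simplify
>>> model = onnx.load("model.onnx")
>>> simplified_model, check = simplify(model)  # fold constants, remove redundant nodes
>>> assert check, "simplified model failed validation"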

class aimet_onnx.QuantizationSimModel(model, dummy_input=None, quant_scheme=QuantScheme.min_max, rounding_mode='nearest', default_param_bw=8, default_activation_bw=8, use_symmetric_encodings=False, use_cuda=True, device=0, config_file=None, default_data_type=QuantizationDataType.int, user_onnx_libs=None, path=None)[source]

Class that simulates quantized model execution on a target hardware backend.

Parameters:
  • model (Union[ModelProto, ONNXModel]) – ONNX model

  • dummy_input (Optional[Dict[str, ndarray]]) – Dummy input to the model. If None, AIMET will attempt to auto-generate a dummy input

  • quant_scheme (QuantScheme) – Quantization scheme (e.g. QuantScheme.post_training_tf)

  • rounding_mode (str) – Rounding mode (e.g. nearest)

  • default_param_bw (int) – Quantization bitwidth for parameters

  • default_activation_bw (int) – Quantization bitwidth for activations

  • use_symmetric_encodings (bool) – True if symmetric encodings are used, False otherwise.

  • use_cuda (bool) – True if CUDA should be used to run the quantization ops, False otherwise.

  • device (int) – CUDA device ID to use when use_cuda is True. (Default: 0)

  • config_file (Optional[str]) – File path or alias of the configuration file. Alias can be one of { default, htp_v66, htp_v68, htp_v69, htp_v73, htp_v75, htp_v79, htp_v81 } (Default: “default”)

  • default_data_type (QuantizationDataType) – Default data type to use for quantizing all layer inputs, outputs and parameters. Possible options are QuantizationDataType.int and QuantizationDataType.float. Note that default_data_type=QuantizationDataType.float is only supported with default_activation_bw=16 and default_param_bw=16

  • user_onnx_libs (Optional[List[str]]) – List of paths to all compiled ONNX custom ops libraries

  • path (Optional[str]) – Directory to save the artifacts.
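
Example (a minimal construction sketch; the model path, input name, and input shape are placeholder assumptions):

>>> import numpy as np
>>> import onnx
>>> from aimet_onnx import QuantizationSimModel
>>> model = onnx.load("model.onnx")
>>> dummy_input = {"input": np.random.randn(1, 3, 224, 224).astype(np.float32)}
>>> sim = QuantizationSimModel(model, dummy_input=dummy_input,
...                            default_param_bw=8, default_activation_bw=8)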

compute_encodings(forward_pass_callback, forward_pass_callback_args=_NOT_SPECIFIED)[source]

Computes encodings for all quantizers in the model.

This API will invoke forward_pass_callback, a function written by the user that runs forward pass(es) of the quantized model with a small, representative subset of the training dataset. By doing so, the quantizers in the quantized model will observe the inputs and initialize their quantization encodings according to the observed input statistics.

This function is overloaded with the following signatures:

compute_encodings(forward_pass_callback)[source]
Parameters:

forward_pass_callback (Callable[[ort.InferenceSession], Any]) – A function that takes the quantized model's ort.InferenceSession and runs forward passes with a small, representative subset of the training dataset

compute_encodings(forward_pass_callback, forward_pass_callback_args)[source]
Parameters:
  • forward_pass_callback (Callable[[ort.InferenceSession, T], Any]) – A function that takes the quantized model's ort.InferenceSession and runs forward passes with a small, representative subset of the training dataset

  • forward_pass_callback_args (T) – The second argument to forward_pass_callback.

Example

>>> sim = QuantizationSimModel(...)
>>> def run_forward_pass(session: ort.InferenceSession):
...     for inp in dataset:
...         _ = session.run(None, {"input": inp})
...
>>> sim.compute_encodings(run_forward_pass)
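
To pass extra state to the callback, supply forward_pass_callback_args as the second argument. A minimal sketch (num_batches and the "input" name are illustrative assumptions):

>>> def run_forward_pass_n(session: ort.InferenceSession, num_batches: int):
...     for i, inp in enumerate(dataset):
...         if i >= num_batches:
...             break
...         _ = session.run(None, {"input": inp})
...
>>> sim.compute_encodings(run_forward_pass_n, 10)
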
export(path, filename_prefix, export_model=True)[source]

Computes encodings and exports them to files.

Parameters:
  • path (str) – Directory in which to save the encoding files

  • filename_prefix (str) – Filename prefix to use for the exported files

  • export_model (bool) – If True, then ONNX model is exported. When False, only encodings are exported.
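
Example (a minimal sketch; the output directory and filename prefix are placeholders):

>>> sim.export(path="./output", filename_prefix="quantized_model")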

Quant Scheme Enum

class aimet_common.defs.QuantScheme(value)[source]

Quantization schemes

classmethod from_str(alias)[source]

Returns the QuantScheme object corresponding to the given string alias

Return type:

QuantScheme
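
Example (a minimal sketch; "min_max" is assumed to be a recognized alias, mirroring the QuantScheme.min_max default shown in the constructor signature above):

>>> from aimet_common.defs import QuantScheme
>>> scheme = QuantScheme.from_str("min_max")
>>> sim = QuantizationSimModel(model, quant_scheme=scheme)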