aimet_onnx.quantsim¶
Note
It is recommended to run onnx-simplifier on the model before creating the quantsim model.
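A minimal sketch of this pre-processing step, assuming the onnx-simplifier (onnxsim) package is installed and "model.onnx" is a placeholder path:
>>> import onnx
>>> from onnxsim import simplify
>>> model = onnx.load("model.onnx")
>>> model, check = simplify(model)
>>> assert check, "simplified model failed validation"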
- class aimet_onnx.QuantizationSimModel(model, *, param_type=int8, activation_type=int8, quant_scheme=QuantScheme.min_max, config_file=None, dummy_input=None, user_onnx_libs=None, providers=None, path=None)[source]¶
Class that simulates the quantized model execution on a target hardware backend.
- Parameters:
model (onnx.ModelProto) – ONNX ModelProto to quantize
param_type (qtype | str) – quantized type to use for parameter tensors. Can be { int4, int8, int16, float16, float32 } or aimet_onnx.qtype
activation_type (qtype | str) – quantized type to use for activation tensors. Can be { int4, int8, int16, float16, float32 } or aimet_onnx.qtype
quant_scheme (QuantScheme | str) – Quantization scheme to use for calibration. Can be { tf_enhanced, min_max } or QuantScheme
config_file (str, optional) – File path or alias of the configuration file. Alias can be one of { default, htp_v66, htp_v68, htp_v69, htp_v73, htp_v75, htp_v79, htp_v81 } (Default: “default”)
dummy_input (Dict[str, np.ndarray], optional) – Sample input to the model. Only needed for non shape-inferable models with parameterized shapes
user_onnx_libs (List[str], optional) – List of paths to all compiled ONNX custom ops libraries
providers (List, optional) – Onnxruntime execution providers to use when building InferenceSession. If None, default provider is “CPUExecutionProvider”
path (str, optional) – Directory to save temporary artifacts.
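For example, a minimal construction sketch (the model path and type choices below are illustrative placeholders, not prescribed values):
>>> import onnx
>>> from aimet_onnx import QuantizationSimModel
>>> model = onnx.load("model.onnx")
>>> sim = QuantizationSimModel(model, param_type="int8", activation_type="int16")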
- compute_encodings(*args, **kwargs)[source]¶
Computes encodings for all quantizers in the model.
This API will invoke forward_pass_callback, a function written by the user that runs forward pass(es) of the quantized model with a small, representative subset of the training dataset. By doing so, the quantizers in the quantized model will observe the inputs and initialize their quantization encodings according to the observed input statistics.
This function is overloaded with the following signatures:
- compute_encodings(inputs)[source]
- Parameters:
inputs (Iterable[Dict[str, np.ndarray]]) – The set of model input samples to use during calibration
- compute_encodings(forward_pass_callback)[source]
- Parameters:
forward_pass_callback (Callable[[ort.InferenceSession], Any]) – A function that takes a quantized model and runs forward passes with a small, representative subset of the training dataset
- compute_encodings(forward_pass_callback, forward_pass_callback_args)[source]
- Parameters:
forward_pass_callback (Callable[[ort.InferenceSession, T], Any]) – A function that takes a quantized model and runs forward passes with a small, representative subset of the training dataset
forward_pass_callback_args (T) – The second argument to forward_pass_callback.
Example
>>> sim = QuantizationSimModel(...)
>>> def run_forward_pass(session: ort.InferenceSession):
...     for input in dataset:
...         _ = session.run(None, {"input": input})
...
>>> sim.compute_encodings(run_forward_pass)
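The single-argument overload accepts the calibration samples directly. A minimal sketch, assuming dataset yields numpy arrays and the model has one input named "input":
>>> calibration_data = [{"input": x} for x in dataset]
>>> sim.compute_encodings(calibration_data)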
- export(path, filename_prefix, export_model=True)[source]¶
Compute encodings and export to files
- Parameters:
path (str) – Directory in which to save the encoding files
filename_prefix (str) – Filename prefix to use when saving the encoding files
export_model (bool) – If True, the ONNX model is exported. When False, only encodings are exported.
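A minimal usage sketch (the output directory and filename prefix are placeholders):
>>> sim.export(path="./artifacts", filename_prefix="model_quantized")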
- aimet_onnx.compute_encodings(sim)[source]¶
Computes encodings for all quantizers in the model.
Under this context manager, QuantizationSimModel will observe all inputs that run through the model to calibrate the quantization encoding of each quantizer.
Example
>>> sim = QuantizationSimModel(...)
>>> with compute_encodings(sim):
...     for input in dataset:
...         _ = sim.session.run(None, {"input": input})
Quant Scheme Enum