aimet_onnx.quantsim
Note
It is recommended to run onnx-simplifier on the model before creating the quantsim model.
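For example, using the onnxsim package (one common way to apply onnx-simplifier; the file path is a placeholder):
>>> import onnx
>>> from onnxsim import simplify
>>> model = onnx.load("model.onnx")
>>> model, check_ok = simplify(model)
>>> assert check_ok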
- class aimet_onnx.QuantizationSimModel(model, *, param_type=int8, activation_type=int8, quant_scheme=QuantScheme.min_max, config_file=None, dummy_input=None, user_onnx_libs=None, providers=None, path=None)
- Class that simulates the quantized model execution on a target hardware backend.
- Parameters:
- model (onnx.ModelProto) – ONNX ModelProto to quantize 
- param_type (qtype | str) – Quantized type to use for parameter tensors. Can be { int4, int8, int16, float16, float32 } or aimet_onnx.qtype
- activation_type (qtype | str) – Quantized type to use for activation tensors. Can be { int4, int8, int16, float16, float32 } or aimet_onnx.qtype
- quant_scheme (QuantScheme | str) – Quantization scheme to use for calibration. Can be { tf_enhanced, min_max } or QuantScheme
- config_file (str, optional) – File path or alias of the configuration file. Alias can be one of { default, htp_v66, htp_v68, htp_v69, htp_v73, htp_v75, htp_v79, htp_v81 } (Default: “default”) 
- dummy_input (Dict[str, np.ndarray], optional) – Sample input to the model. Only needed for non-shape-inferable models with parameterized shapes
- user_onnx_libs (List[str], optional) – List of paths to all compiled ONNX custom ops libraries 
- providers (List, optional) – ONNX Runtime execution providers to use when building the InferenceSession. If None, defaults to "CPUExecutionProvider"
- path (str, optional) – Directory to save temporary artifacts. 
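- Example
A minimal construction sketch using the defaults documented above ("model.onnx" is a placeholder path):
>>> import onnx
>>> from aimet_onnx import QuantizationSimModel
>>> model = onnx.load("model.onnx")
>>> sim = QuantizationSimModel(model, param_type="int8", activation_type="int8", quant_scheme="min_max")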
 
- compute_encodings(*args, **kwargs)
- Computes encodings for all quantizers in the model.
- This API will invoke forward_pass_callback, a function written by the user that runs forward pass(es) of the quantized model with a small, representative subset of the training dataset. By doing so, the quantizers in the quantized model will observe the inputs and initialize their quantization encodings according to the observed input statistics.
- This function is overloaded with the following signatures:
- compute_encodings(inputs)
- Parameters:
- inputs (Iterable[Dict[str, np.ndarray]]) – The set of model input samples to use during calibration 
 
- compute_encodings(forward_pass_callback)
- Parameters:
- forward_pass_callback (Callable[[ort.InferenceSession], Any]) – A function that takes a quantized model and runs forward passes with a small, representative subset of the training dataset
 
- compute_encodings(forward_pass_callback, forward_pass_callback_args)
- Parameters:
- forward_pass_callback (Callable[[ort.InferenceSession, T], Any]) – A function that takes a quantized model and runs forward passes with a small, representative subset of the training dataset
- forward_pass_callback_args (T) – The second argument to forward_pass_callback. 
 
 
- Example
>>> sim = QuantizationSimModel(...)
>>> def run_forward_pass(session: ort.InferenceSession):
...     for input in dataset:
...         _ = session.run(None, {"input": input})
...
>>> sim.compute_encodings(run_forward_pass)
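The inputs overload takes the calibration samples directly instead of a callback. A sketch, assuming a single model input named "input" of shape (1, 3, 224, 224) (both are illustrative assumptions):
>>> import numpy as np
>>> calibration_data = [{"input": np.random.randn(1, 3, 224, 224).astype(np.float32)} for _ in range(32)]
>>> sim.compute_encodings(calibration_data)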
- export(path, filename_prefix, export_model=True)
- Compute encodings and export to files.
- Parameters:
- path (str) – Directory in which to save the encoding files
- filename_prefix (str) – Filename prefix for the saved encoding files
- export_model (bool) – If True, the ONNX model is exported. If False, only encodings are exported.
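- Example
A usage sketch; the directory and prefix below are placeholders:
>>> sim.export(path="./output", filename_prefix="quantized_model", export_model=True)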
 
 
- to_onnx_qdq(*, prequantize_constants=False)
- Return a copy of the ModelProto with all QcQuantizeOp nodes replaced with QuantizeLinear and/or DequantizeLinear.
- Return type: ModelProto
- Example
>>> len([node for node in sim.model.nodes() if node.op_type == "QcQuantizeOp"])
10
>>> onnx_qdq = sim.to_onnx_qdq()
>>> len([node for node in onnx_qdq.graph.node if node.op_type == "QcQuantizeOp"])
0
>>> len([dq for dq in onnx_qdq.graph.node if dq.op_type == "DequantizeLinear"])
10
- Parameters:
- prequantize_constants (bool) – If True, the output model will contain quantized values for constant tensors. If False, the model will contain floating-point data and Q -> DQ nodes.
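A typical follow-up is to save the returned QDQ model with the standard onnx API (the output file name is a placeholder):
>>> import onnx
>>> onnx.save(sim.to_onnx_qdq(), "model_qdq.onnx")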
 
 
- aimet_onnx.compute_encodings(sim)
- Computes encodings for all quantizers in the model.
- Under this context manager, QuantizationSimModel will observe all inputs that run through the model to calibrate the quantization encoding of each quantizer.
- Example
>>> sim = QuantizationSimModel(...)
>>> with compute_encodings(sim):
...     for input in dataset:
...         _ = sim.session.run(None, {"input": input})
Quant Scheme Enum