aimet_torch.onnx.export (beta)
- aimet_torch.onnx.export(model, args, f, *, export_int32_bias=True, **kwargs)
Export a QuantizationSimModel to an ONNX model with ONNX QuantizeLinear and DequantizeLinear nodes embedded in the graph. This function takes the same set of arguments as torch.onnx.export().
- Parameters:
model – The model to be exported
args – Same as torch.onnx.export()
f – Same as torch.onnx.export()
export_int32_bias (bool, optional) – If True, generate and export int32 bias encodings on the fly (default: True)
**kwargs – Same as torch.onnx.export()
Note
Unlike torch.onnx.export(), this function allows opset versions up to 21, since 4-bit and 16-bit quantization are only available in opset 21. However, exporting to opset 21 is a beta feature and not fully stable yet. For robustness, opset 20 or lower is recommended whenever possible.
Note
Dynamo-based export (dynamo=True) is not supported yet.
Examples
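The example below assumes a calibrated QuantizationSimModel named sim and an input tensor x. A minimal preparation sketch is shown first; the model class (MyModel), the input shape, and the exact compute_encodings signature are assumptions and may differ across AIMET versions.

>>> import torch
>>> import aimet_torch.onnx
>>> from aimet_torch.quantsim import QuantizationSimModel
>>> model = MyModel().eval()                     # hypothetical pretrained float model
>>> x = torch.randn(1, 3, 224, 224)              # hypothetical input batch
>>> sim = QuantizationSimModel(model, dummy_input=x)
>>> sim.compute_encodings(lambda m: m(x))        # calibrate quantizer encodings (signature may vary by AIMET version)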
>>> aimet_torch.onnx.export(sim.model, x, f="model.onnx",
...                         input_names=["input"], output_names=["output"],
...                         opset_version=21, export_int32_bias=True)
>>> import onnxruntime as ort
>>> options = ort.SessionOptions()
>>> options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
>>> sess = ort.InferenceSession("model.onnx", sess_options=options)
>>> onnx_output, = sess.run(None, {"input": x.detach().numpy()})
>>> torch.nn.functional.cosine_similarity(torch.from_numpy(onnx_output), sim.model(x))
tensor([1.0000, 0.9999, 1.0000, ..., 1.0000, 1.0000, 1.0000],
       grad_fn=<AliasBackward0>)
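If 4-bit or 16-bit quantization is not needed, the same call can target opset 20 or lower, as recommended in the note above. The variant below is a sketch that only changes opset_version and the output file name relative to the example above.

>>> aimet_torch.onnx.export(sim.model, x, f="model_opset20.onnx",
...                         input_names=["input"], output_names=["output"],
...                         opset_version=20)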