aimet_torch.onnx.export

aimet_torch.onnx.export(model, args, f, *, export_int32_bias=True, prequantize_constants=False, **kwargs)

Export a QuantizationSimModel to an ONNX model with QuantizeLinear and DequantizeLinear operators embedded in the graph.

This function takes the same set of arguments as torch.onnx.export().

Parameters:
  • model – The model to be exported

  • args – Same as torch.onnx.export()

  • f – Same as torch.onnx.export()

  • export_int32_bias (bool, optional) – If True, generate and export int32 bias encodings on the fly (default: True)

  • **kwargs – Same as torch.onnx.export()

Note

For robustness, onnx >= 1.19 is highly recommended with this API, especially when exporting large models (> 2 GB), due to a known bug in the version converter of onnx releases earlier than 1.19. For more information, see https://github.com/onnx/onnx/issues/6529
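A simple version guard before exporting (illustrative only, not part of the aimet_torch API) can catch an outdated onnx installation early:

>>> import onnx
>>> major, minor = (int(v) for v in onnx.__version__.split(".")[:2])
>>> assert (major, minor) >= (1, 19), "onnx >= 1.19 is recommended, especially for models larger than 2 GB"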

Note

Dynamo-based export (dynamo=True) is not yet supported.

Examples
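The example below assumes a calibrated QuantizationSimModel sim and a sample input tensor x. A minimal setup sketch follows; the model, calibration data, and the exact QuantizationSimModel / compute_encodings arguments are placeholders and may differ across aimet_torch releases.

>>> import torch
>>> import aimet_torch.onnx
>>> from aimet_torch import QuantizationSimModel
>>> model = MyModel().eval()                          # hypothetical float32 model
>>> x = torch.randn(1, 3, 224, 224)                   # hypothetical sample input
>>> sim = QuantizationSimModel(model, dummy_input=x)
>>> sim.compute_encodings(lambda m: m(x))             # calibration; exact signature may vary by release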

>>> aimet_torch.onnx.export(sim.model, x, f="model.onnx",
...                         input_names=["input"], output_names=["output"],
...                         opset_version=21, dynamo=False,
...                         export_int32_bias=True)
>>> import onnxruntime as ort
>>> options = ort.SessionOptions()
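>>> # Disable graph optimizations so the exported QuantizeLinear/DequantizeLinear nodes run as-is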
>>> options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_DISABLE_ALL
>>> sess = ort.InferenceSession("model.onnx", sess_options=options)
>>> onnx_output, = sess.run(None, {"input": x.detach().numpy()})
>>> torch.nn.functional.cosine_similarity(torch.from_numpy(onnx_output), sim.model(x))
tensor([1.0000, 0.9999, 1.0000,  ..., 1.0000, 1.0000, 1.0000],
       grad_fn=<AliasBackward0>)
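
To confirm that QuantizeLinear and DequantizeLinear nodes were embedded in the exported graph as described above, the model can be inspected with the onnx package (an illustrative check, not part of aimet_torch):

>>> import onnx
>>> from collections import Counter
>>> op_counts = Counter(node.op_type for node in onnx.load("model.onnx").graph.node)
>>> op_counts["QuantizeLinear"] > 0 and op_counts["DequantizeLinear"] > 0
True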