ONNX Runtime Backend

The onnxruntime_onnx (v1.18.1) backend uses the QAic execution provider (EP) to deploy ONNX graphs on Cloud AI inference accelerators.

ONNX Runtime Model Repository

In the model configuration (config.pbtxt), platform must be set to onnxruntime_onnx and the use_qaic parameter must be set to true.

Cloud AI Parameters

Parameters are user-provided key-value pairs that Triton passes to the backend runtime environment as variables; they can be used in the backend processing logic.

  • config : path to the configuration file containing compiler options.

  • device_id : ID of the Cloud AI device targeted for inference. Optional; the server automatically picks an available device if omitted.

  • use_qaic : flag indicating that the QAic execution provider should be used.

  • share_session : flag enabling a single runtime session object to be shared across model instances.
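Each of the parameters above appears in config.pbtxt as a key with a string_value, even for numeric or boolean settings. A minimal sketch of rendering that block (the helper name is hypothetical; the values mirror the sample configuration below):

```python
# Sketch: render Cloud AI key-value pairs in Triton's parameters [...] syntax.
# All values are strings in config.pbtxt, including booleans and device IDs.

def qaic_parameters_block(params):
    """Render key/value pairs as a config.pbtxt parameters block."""
    entries = ",\n".join(
        '  {{\n    key: "{key}"\n    value: {{ string_value: "{val}" }}\n  }}'
        .format(key=k, val=v)
        for k, v in params.items()
    )
    return "parameters [\n" + entries + "\n]"

block = qaic_parameters_block({
    "config": "1/aic100/resnet.yaml",  # compiler options file
    "device_id": "0",                  # optional; server auto-picks if omitted
    "use_qaic": "true",                # enable the QAic execution provider
    "share_session": "true",           # share one session across instances
})
print(block)
```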

Sample config.pbtxt

name: "resnet_onnx"
platform: "onnxruntime_onnx"
max_batch_size : 16
default_model_filename : "aic100/model.onnx"
input [
  {
    name: "data"
    data_type: TYPE_FP32
    dims: [3, 224, 224 ]
  }
]
output [
  {
    name: "resnetv18_dense0_fwd"
    data_type: TYPE_FP32
    dims: [1000]
  }
]
parameters [
  {
    key: "config"
    value: { string_value: "1/aic100/resnet.yaml" }
  },
  {
    key: "device_id"
    value: { string_value: "0" }
  },
  {
    key: "use_qaic"
    value: { string_value: "true" }
  },
  {
    key: "share_session"
    value: { string_value: "true" }
  }
]
instance_group [
  {
    count: 2
    kind: KIND_MODEL
  }
]
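The sample configuration implies a particular repository layout: default_model_filename places the model under an aic100 subdirectory, and the config parameter resolves relative to the model directory, so the YAML sits inside version directory 1. A sketch that stubs the layout out (model.onnx and resnet.yaml are placeholders here; real deployments use the compiled ONNX graph and compiler-options YAML):

```shell
# Stub out the repository layout implied by the sample config.pbtxt.
mkdir -p model_repository/resnet_onnx/1/aic100
touch model_repository/resnet_onnx/config.pbtxt
touch model_repository/resnet_onnx/1/aic100/model.onnx
touch model_repository/resnet_onnx/1/aic100/resnet.yaml
ls -R model_repository
```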

Launch Triton Server

Launch Triton server within the Triton container, pointing it at the model repository path.

/opt/tritonserver/bin/tritonserver --model-repository=</path/to/repository>
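Once the server is up, server and model readiness can be checked over Triton's KServe v2 HTTP API (port 8000 is the default HTTP endpoint; the model name comes from the sample config above):

```shell
# Requires a running Triton server; both endpoints return HTTP 200 when ready.
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/health/ready
curl -s -o /dev/null -w "%{http_code}\n" localhost:8000/v2/models/resnet_onnx/ready
```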