Triton Inference Server¶
Triton Inference Server is open source inference serving software that streamlines AI inferencing.
The AIC100 SDK enables two backends for the inference execution workflow. On a host with AIC100 cards, these backends detect the cards during initialization and route inference requests received by the Triton server to the hardware:
- onnxruntime_onnx as the platform, with the QAic execution provider (EP)
- qaic as a customized C++ backend
Creating an AIC100-backend-enabled Triton docker image using the AIC100 development kits¶
To add the customized backends (which process inference on AIC100 hardware) to a vanilla Triton server image, run the build_image.sh script with the SDK paths passed as parameters. The script is part of the Apps SDK contents and can be run after unzipping the SDK; it generates a docker image as output.
sample> cd </path/to/app-sdk>/tools/docker-build
sample> ./build_image.sh --apps-sdk </path/to/apps-sdk.zip> --platform-sdk </path/to/platform-sdk.zip> --tag <tag-name> --os triton --create-model-repo
sample> docker image ls
REPOSITORY                  TAG                IMAGE ID       CREATED      SIZE
qaic-triton-x86_64-latest   1.11.0.46-triton   cc217d09baea   6 days ago   34.4GB
sample> docker run -it --rm --privileged --shm-size=4g --ipc=host --net=host <triton-docker-image-name> /bin/bash
sample> docker ps
CONTAINER ID   IMAGE                       COMMAND                  CREATED      STATUS      PORTS   NAMES
b88d5eb98187   qaic-triton-x86_64-latest   "/opt/tritonserver/n…"   2 days ago   Up 2 days           thirsty_beaver
Creating a model repository and configuration file¶
The model configuration file specifies the execution properties of a model: its input/output structure, backend, batch size, parameters, and so on. Users need to follow Triton's model repository and model configuration rules when defining a config file; an illustrative layout is sketched below.
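For reference, a minimal repository layout matching the resnet_onnx example in the next section might look like the following. The top-level model_repository name and the version directory 1 follow Triton's standard <model-name>/<version>/ convention; the aic100 subdirectory is an assumption chosen to match the paths used in the sample config below.

model_repository/
└── resnet_onnx/
    ├── config.pbtxt
    └── 1/
        └── aic100/
            ├── model.onnx
            └── resnet.yaml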
Model configuration - onnxruntime¶
For the onnxruntime configuration, platform should be set to "onnxruntime_onnx", and the "use_qaic" parameter should be passed and set to "true".
AIC100 specific parameters¶
Parameters are user-provided key-value pairs that Triton passes to the backend runtime environment as variables; they can be used in the backend's processing logic.
- config : path to the configuration file containing compiler options.
- device_id : ID of the AIC100 device on which inference is targeted.
- use_qaic : flag indicating that the QAic execution provider should be used.
- share_session : flag to enable sharing a single runtime-object session across model instances.
A sample config.pbtxt:
name: "resnet_onnx"
platform: "onnxruntime_onnx"
max_batch_size : 16
default_model_filename : "aic100/model.onnx"
input [
{
name: "data"
data_type: TYPE_FP32
dims: [3, 224, 224 ]
}
]
output [
{
name: "resnetv18_dense0_fwd"
data_type: TYPE_FP32
dims: [1000]
}
]
parameters [
{
key: "config"
value: { string_value: "1/aic100/resnet.yaml" }
},
{
key: "device_id"
value: { string_value: "0" }
},
{
key: "use_qaic"
value: { string_value: "true" }
},
{
key: "share_session"
value: { string_value: "true" }
}
]
instance_group [
{
count: 2
kind: KIND_MODEL
}
]
Model configuration - qaic backend¶
For the qaic backend configuration, the "backend" field should be set to "qaic".
AIC100 specific parameters¶
Parameters are user-provided key-value pairs that Triton passes to the backend runtime environment as variables; they can be used in the backend's processing logic.
- qpc_path : path to the compiled binary of the model (programqpc.bin).
- device_id : ID of the AIC100 device on which inference is targeted; defaults to device 0.
- set_size : size of the runtime inference queue; defaults to 20.
- no_of_activations : enables multiple activations of a model's network; defaults to 1. (See the snippet after the sample config below for how the optional parameters are passed.)
A sample config.pbtxt:
name: "yolov5m_qaic"
backend: "qaic"
max_batch_size : 4
default_model_filename : "aic100/model.onnx"
input [
{
name: "images"
data_type: TYPE_FP32
dims: [3, 640, 640 ]
}
]
output [
{
name: "feature_map_1"
data_type: TYPE_FP32
dims: [3, 80, 80, 85]
},
{
name: "feature_map_2"
data_type: TYPE_FP32
dims: [3, 40, 40, 85]
},
{
name: "feature_map_3"
data_type: TYPE_FP32
dims: [3, 20, 20, 85]
}
]
parameters [
{
key: "qpc_path"
value: { string_value: "/path/to/qpc" }
},
{
key: "device_id"
value: { string_value: "0" }
}
]
instance_group [
{
count: 2
kind: KIND_MODEL
}
]
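The optional set_size and no_of_activations parameters use the same string_value form as the entries above. The values shown here are illustrative placeholders rather than recommendations; entries like these could be appended to the parameters list:

{
  key: "set_size"
  value: { string_value: "20" }
},
{
  key: "no_of_activations"
  value: { string_value: "2" }
}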
Launching Triton server inside container¶
To launch the Triton server, execute the tritonserver binary inside the Triton docker container, passing the model repository path as shown below.
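For example, assuming the model repository is available at /models inside the container (an illustrative path), the server can be started and its readiness verified through Triton's standard HTTP health endpoint:

sample> tritonserver --model-repository=/models
sample> curl -v localhost:8000/v2/health/ready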
Supported Features¶
- Model Ensemble
- Dynamic Batching (see the example below)
- Auto device-picker
- Support for ARM64
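For instance, dynamic batching is enabled per model by adding Triton's standard dynamic_batching block to config.pbtxt; the values below are illustrative placeholders, not tuned recommendations:

dynamic_batching {
  preferred_batch_size: [ 4, 8 ]
  max_queue_delay_microseconds: 100
}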