QAIC execution provider

The QAIC Execution Provider for ONNX Runtime enables hardware-accelerated execution on the Qualcomm AIC100 chipset. It leverages the AIC compiler and runtime APIs packaged in the apps and platform SDKs.

Setup

  • Create a JSON specification file for onnxruntime as shown below:

    {
        "base_image": "ubuntu20",
        "applications": ["onnxruntime"],
        "python_version": "py38",
        "sdk": {
            "qaic_apps": "/path/to/apps/sdk.zip",
            "qaic_platform": "/path/to/platform/sdk.zip"
        }
    }

  • Launch a Docker container for an image built with this specification, following the instructions in Docker.

  • In the Docker container, the onnxruntime_qaic build (onnxruntime version 1.13.1 with QAIC EP integration) is available at /opt/qti-aic/integrations/qaic_onnxrt/onnxruntime_qaic.

Configuration Options

| Option | Type | Description |
| --- | --- | --- |
| config | str | [Required] Path to the model-settings YAML file. Contains the AIC configuration parameters used by the QAIC execution provider of ONNX Runtime. The configuration for best performance and accuracy can be generated using the model configurator tool. |
| aic_device_id | int | [Optional] AIC device ID; auto-picked when not configured |

Parameters supported in the model-settings YAML file

Runtime parameters

| Option | Description | Default | Relevance |
| --- | --- | --- | --- |
| aic-binary-dir | Absolute path, or path relative to the model-settings file's parent directory, to the directory containing programqpc.bin | "" | Required to skip compilation. |
| device-id | AIC device ID | 0 | Optional |
| set-size | Set size for inference loop execution | 10 | Optional |
| aic-num-of-activations | Number of activations | 1 | Optional |

qaicRegisterCustomOp - Compiler C API

| Option | Description | Default | Relevance |
| --- | --- | --- | --- |
| register-custom-op | Register custom ops using this configuration file | | Required if the model has AIC custom ops; vector of strings |

Graph Config - Compiler API

| Option | Description | Default | Relevance |
| --- | --- | --- | --- |
| aic-depth-first-mem | Sets DFS memory size | Set by compiler | Optional. Used in compilation with aic-enable-depth-first. |
| aic-enable-depth-first | Enables DFS with default memory size; "True", "False" | Set by compiler | Optional. Used in compilation. |
| aic-num-cores | Number of AIC cores to be used for inference | 1 | Optional. Used in compilation. |
| allocator-dealloc-delay | Option to increase buffer lifetime; 0-10, e.g., 1 | Set by compiler | Optional. Used in compilation. |
| batchsize | Sets the number of batches to be used for execution | 1 | Optional. Used in compilation. |
| convert-to-fp16 | Run all floating-point operations in FP16; "True", "False" | "False" | Optional. Used in compilation. |
| enable-channelwise | Enable channelwise quantization of the Convolution op; "True", "False" | Set by compiler | Optional. Used in compilation with pgq-profile. |
| enable-rowwise | Enable rowwise quantization of the FullyConnected and SparseLengthsSum ops; "True", "False" | Set by compiler | Optional. Used in compilation with pgq-profile. |
| execute-nodes-in-fp16 | Run all instances of the operators in this list in FP16; "True", "False" | Set by compiler | Optional. Used in compilation with pgq-profile for mixed precision. |
| hwVersion | HW version of AIC | QAIC_HW_V2_0 | Cannot be configured; set to QAIC_HW_V2_0. |
| keep-original-precision-for-nodes | Run operators in this list with their original precision at generation | | Optional. Used in compilation with pgq-profile for mixed precision. |
| mos | Effort level to reduce on-chip memory; e.g., "1" | Set by compiler | Optional. Used in compilation. |
| multicast-weights | Reduce DDR bandwidth by loading weights used on multiple cores only once and multicasting them to the other cores | | |
| ols | Factor to increase splitting of the network for parallelism | Set by compiler | Optional. Used in compilation. |
| quantization-calibration | Specify quantization calibration: "None", "KLMinimization", "Percentile", "MSE", "SQNR", "KLMinimizationV2" | "None" | Optional. Used in compilation with pgq-profile. |
| quantization-schema-activations | Specify quantization schema: "asymmetric", "symmetric", "symmetric_with_uint8", "symmetric_with_power2_scale" | "symmetric_with_uint8" | Optional. Used in compilation with pgq-profile. |
| quantization-schema-constants | Specify quantization schema: "asymmetric", "symmetric", "symmetric_with_uint8", "symmetric_with_power2_scale" | "symmetric_with_uint8" | Optional. Used in compilation with pgq-profile. |
| size-split-granularity | Sets the max tile size in KiB, between 512 and 2048; e.g., 1024 | Set by compiler | Optional. Used in compilation. |
| aic-hw | Sets the target to QAIC_SIM or QAIC_HW; "True", "False" | "True" | Optional. |

Model Params - Compiler API

| Option | Description | Default | Relevance |
| --- | --- | --- | --- |
| model-path | Path to the model file | | Required. Used in compilation, OnnxRT framework. |
| onnx-define-symbol | Define an ONNX symbol with its value; pairs of ONNX symbol key,value separated by spaces | | Required. Used in compilation, OnnxRT framework. |
| external-quantization | Path to load an externally generated quantization profile | | Optional |
| node-precision-info | Path to a model loader precision file for setting node instances to FP16 or FP32 | | Optional. Used in compilation with pgq-profile for mixed precision. |

Common

| Option | Description | Default | Relevance |
| --- | --- | --- | --- |
| relative-path | The aic-binary-dir absolute path will be constructed using the base path of the model-settings file; "True", "False" | "False" | Optional. Set to "True" to allow a relative path for aic-binary-dir. |
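
For orientation only, here is a hypothetical model-settings file combining a few of the parameters above, assuming flat key: value YAML entries; in practice the model configurator tool generates this file, and every value below is a placeholder:

    # Illustrative sketch only; generate the real file with the model configurator tool
    model-path: /path/to/model.onnx       # Required: model to compile
    onnx-define-symbol: batch_size,1      # ONNX symbol key,value pair (assumed format)
    aic-num-cores: 4                      # AIC cores used for inference
    convert-to-fp16: "True"               # run floating-point in FP16
    device-id: 0                          # AIC device ID
    set-size: 10                          # set size for the inference loop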

Usage

Python

Here are a few basic commands you can use with ONNX Runtime and QAIC.

Load a model

import onnxruntime as ort

# QAIC EP options: the model-settings YAML and the target AIC device
provider_options = []
qaic_provider_options = {}
qaic_provider_options['config'] = '/path/to/yaml/file'
qaic_provider_options['aic_device_id'] = aic_device_id
provider_options.append(qaic_provider_options)

sess_options = ort.SessionOptions()
providers = ['QAicExecutionProvider']  # EP name as registered by the onnxruntime_qaic build
session = ort.InferenceSession('/path/to/onnx/model', sess_options,
                               providers=providers, provider_options=provider_options)

This binds your model to the AIC100 chip through the QAIC execution provider.
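
To confirm the session actually resolved to the QAIC EP rather than silently falling back to CPU, you can list the session's active providers; this uses the standard ONNX Runtime API, and the exact EP name string comes from the onnxruntime_qaic build:

# Providers the session actually resolved to, in priority order
print(session.get_providers())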

Perform Inference

# Perform inference using ONNX Runtime
results = session.run(None, {'input_name': input_data})

In the above code, replace 'input_name' with the name of the model's input node, and input_data with the actual input data.
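
If the input name or shape is not known ahead of time, it can be queried from the session. A minimal sketch using standard ONNX Runtime metadata APIs, with random NumPy data standing in for a real, preprocessed input:

import numpy as np

# Inspect the model's first input: name, shape, element type
inp = session.get_inputs()[0]
print(inp.name, inp.shape, inp.type)

# Placeholder input: substitute symbolic dimensions with 1 for illustration
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
input_data = np.random.rand(*shape).astype(np.float32)
results = session.run(None, {inp.name: input_data})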

C++

Load a Model

#include <onnxruntime_cxx_api.h>
#include <qaic_provider_factory.h>


// Set environment as required
Ort::Env env(ORT_LOGGING_LEVEL_ERROR, "test");

// Initialize session options
Ort::SessionOptions session_options;
session_options.SetIntraOpNumThreads(1);
session_options.SetGraphOptimizationLevel(
    GraphOptimizationLevel::ORT_DISABLE_ALL);

// Append the QAIC EP with the model-settings YAML and AIC device ID
auto s = OrtSessionOptionsAppendExecutionProvider_QAic(
        session_options, "/path/to/yaml/file", aic_device_id);

// Create the session
Ort::Session session(env, "/path/to/onnx/model", session_options);

Perform inference

// Run the model

auto output_tensors = session.Run(Ort::RunOptions{nullptr},
input_names.data(), &input_tensor, 1, output_names.data(), 1);

In the above code, replace "/path/to/onnx/model" with the path to your ONNX file. Also ensure the data and shape of your input tensor match the requirements of your model.

End-to-end examples

Install additional packages in the container to run the end-to-end examples:

apt-get update
apt-get install -y python-yaml libpng-dev

pip3 install --upgrade pip
pip3 install opencv-python pyyaml scipy

End-to-end examples (C++ and Python) for ResNet-50 are available at /opt/qti-aic/integrations/qaic_onnxrt/tests/.

Running the ResNet C++ sample

Compile the sample ResNet C++ test using the build_tests.sh script. By default, the test is built using libs from the onnxruntime_qaic release build. To enable debugging, rebuild the onnxruntime_qaic project in the Debug configuration and run ./build_tests.sh with the debug flag.

build_tests.sh [--release|--debug]

Run the executable. The commands below set up the environment and run the ResNet-50 model with the provided image on the QAIC or CPU backend. The program outputs the most probable prediction class index for each iteration.

cd build/release
./qaic-onnxrt-resnet50-test -i <path/to/input/png/image> \
                            -m ../../resnet50/resnet50.yaml

Test options

| Option | Description |
| --- | --- |
| -m, --model-config | [Required] Path to the model-settings YAML file |
| -i, --input-path | [Required] Path to the input PNG image file |
| -b, --backend | [Optional] Default='qaic'. Specify qaic or cpu as the backend |
| -d, --device-id | [Optional] Default=0. Specify the QAIC device ID |
| -n, --num-iter | [Optional] |

Running the ResNet Python sample

Run test_resnet.py at /opt/qti-aic/integrations/qaic_onnxrt/tests/resnet50:

python test_resnet.py --model_config ./resnet50/resnet50.yaml \
                      --input_file </path/to/png/image>

Test options

| Option | Description |
| --- | --- |
| --model_config | [Required] Path to the model-settings YAML file |
| --input_file | [Required] Path to the input PNG image file |
| --backend | [Optional] Default='qaic'. Specify qaic or cpu as the backend |
| --device_id | [Optional] Default=0. Specify the QAIC device ID |
| --num_iter | [Optional] |

Running models with generic QAic EP test

test_qaic_ep.py is a generic test runner for compilation and execution on AIC100.

Run test_qaic_ep.py at /opt/qti-aic/integrations/qaic_onnxrt/tests/:

python test_qaic_ep.py --model_config ./resnet50/resnet50.yaml \
                       --input_file_list </path/to/input/list>

Test options

| Option | Description |
| --- | --- |
| --model_config | [Required] Path to the model-settings YAML file |
| --input_file_list | [Required] Path to a file (.txt) containing the list of batched inputs in .raw format |
| --backend | [Optional] Default='qaic'. Specify qaic or cpu as the backend |
| --device_id | [Optional] Default=0. Specify the QAIC device ID |
| --num_iter | [Optional] |
| --max_threads | [Optional] Default=1000. Maximum number of threads to run inferences |
| --log_level | [Optional] |
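
As a minimal sketch of producing one input and its list file with NumPy, assuming each .raw file is the flat tensor bytes of one preprocessed batch (here fp32, shape (1, 3, 224, 224)); adjust dtype, shape, and preprocessing to your model:

import numpy as np

# Placeholder batch; in practice, use real preprocessed data
arr = np.random.rand(1, 3, 224, 224).astype(np.float32)
arr.tofile('input0.raw')  # raw tensor bytes, no header

# List file passed via --input_file_list: one batched input per line
with open('input_list.txt', 'w') as f:
    f.write('input0.raw\n')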

Execution through the ONNX Runtime test framework

cd /opt/qti-aic/integrations/qaic_onnxrt/onnxruntime_qaic/build/Release

./onnxruntime_perf_test -e qaic -i 'config|/path/to/resnet50.yaml aic_device_id|0' -m times -r 1000 /path/to/model.onnx
./onnx_test_runner -e qaic /path/to/model/dir