QAIC execution provider

The QAIC Execution Provider for ONNX Runtime enables hardware-accelerated execution on the Qualcomm AIC100 chipset. It leverages the AIC compiler and runtime APIs packaged in the apps and platform SDKs.

Setup

  • Create a JSON specification file for onnxruntime as shown below:

    {
        "base_image": "ubuntu20",
        "applications": ["onnxruntime"],
        "python_version": "py38",
        "sdk": {
            "qaic_apps": "/path/to/apps/sdk.zip",
            "qaic_platform": "/path/to/platform/sdk.zip"
        }
    }

  • Launch a Docker container for an image built with this specification, following the instructions in Docker.

  • In the Docker container, the onnxruntime_qaic build (onnxruntime version 1.13.1 with QAIC EP integration) is available at /opt/qti-aic/integrations/qaic_onnxrt/onnxruntime_qaic.

Configuration Options

| Option | Type | Description |
| --- | --- | --- |
| config | str | [Required] Path to the model-settings YAML file. Contains the AIC configuration parameters used by the QAIC execution provider of ONNX Runtime. The configuration for best performance and accuracy can be generated using the model configurator tool. |
| aic_device_id | int | [Optional] AIC device ID; auto-picked when not configured |

Parameters supported in the model-settings YAML file

Runtime parameters

| Option | Description | Default | Relevance |
| --- | --- | --- | --- |
| aic-binary-dir | Absolute path, or path relative to the model-settings file's parent directory, to the directory containing programqpc.bin | "" | Required to skip compilation. |
| device-id | AIC device ID | 0 | Optional |
| set-size | Set size for inference loop execution | 10 | Optional |
| aic-num-of-activations | Number of activations | 1 | Optional |

qaicRegisterCustomOp - Compiler C API

| Option | Description | Default | Relevance |
| --- | --- | --- | --- |
| register-custom-op | Register custom ops using this configuration file | | Required if the model has AIC custom ops; vector of strings |

Graph Config - Compiler API

| Option | Description | Default | Relevance |
| --- | --- | --- | --- |
| aic-depth-first-mem | Sets DFS memory size | Set by compiler | Optional. Used in compilation with aic-enable-depth-first. |
| aic-enable-depth-first | Enables DFS with default memory size; "True", "False" | Set by compiler | Optional. Used in compilation. |
| aic-num-cores | Number of AIC cores to be used for inference | 1 | Optional. Used in compilation. |
| allocator-dealloc-delay | Option to increase buffer lifetime; 0-10, e.g., 1 | Set by compiler | Optional. Used in compilation. |
| batchsize | Sets the number of batches to be used for execution | 1 | Optional. Used in compilation. |
| convert-to-fp16 | Run all floating-point operations in FP16; "True", "False" | "False" | Optional. Used in compilation. |
| enable-channelwise | Enable channelwise quantization of the Convolution op; "True", "False" | Set by compiler | Optional. Used in compilation with pgq-profile. |
| enable-rowwise | Enable rowwise quantization of the FullyConnected and SparseLengthsSum ops; "True", "False" | Set by compiler | Optional. Used in compilation with pgq-profile. |
| execute-nodes-in-fp16 | Run all instances of the operators in this list in FP16; "True", "False" | Set by compiler | Optional. Used in compilation with pgq-profile for mixed precision. |
| hwVersion | HW version of AIC | QAIC_HW_V2_0 | Cannot be configured; set to QAIC_HW_V2_0. |
| keep-original-precision-for-nodes | Run operators in this list with their original precision at generation | | Optional. Used in compilation with pgq-profile for mixed precision. |
| mos | Effort level to reduce on-chip memory; e.g., "1" | Set by compiler | Optional. Used in compilation. |
| multicast-weights | Reduce DDR bandwidth by loading weights used on multiple cores only once and multicasting them to the other cores | | |
| ols | Factor to increase splitting of the network for parallelism | Set by compiler | Optional. Used in compilation. |
| quantization-calibration | Specify quantization calibration: "None", "KLMinimization", "Percentile", "MSE", "SQNR", "KLMinimizationV2" | "None" | Optional. Used in compilation with pgq-profile. |
| quantization-schema-activations | Specify quantization schema: "asymmetric", "symmetric", "symmetric_with_uint8", "symmetric_with_power2_scale" | "symmetric_with_uint8" | Optional. Used in compilation with pgq-profile. |
| quantization-schema-constants | Specify quantization schema: "asymmetric", "symmetric", "symmetric_with_uint8", "symmetric_with_power2_scale" | "symmetric_with_uint8" | Optional. Used in compilation with pgq-profile. |
| size-split-granularity | Sets the max tile size in KiB, between 512 and 2048; e.g., 1024 | Set by compiler | Optional. Used in compilation. |
| aic-hw | Sets the target to QAIC_SIM or QAIC_HW; "True", "False" | "True" | Optional. |

Model Params - Compiler API

| Option | Description | Default | Relevance |
| --- | --- | --- | --- |
| model-path | Path to the model file | | Required. Used in compilation, OnnxRT framework. |
| onnx-define-symbol | Define an ONNX symbol with its value; pairs of ONNX symbol key,value separated by spaces | | Required. Used in compilation, OnnxRT framework. |
| external-quantization | Path to load an externally generated quantization profile | | Optional |
| node-precision-info | Path to a model loader precision file for setting node instances to FP16 or FP32 | | Optional. Used in compilation with pgq-profile for mixed precision. |

Common

| Option | Description | Default | Relevance |
| --- | --- | --- | --- |
| relative-path | The aic-binary-dir absolute path will be constructed using the base path of the model-settings file; "True", "False" | "False" | Optional. Set to "True" to allow a relative path for aic-binary-dir. |
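
For orientation only, here is a hypothetical model-settings file combining a few of the parameters above, assuming flat key: value YAML entries; in practice the model configurator tool generates this file, and every value below is a placeholder:

    # Illustrative sketch only; generate the real file with the model configurator tool
    model-path: /path/to/model.onnx       # Required: model to compile
    onnx-define-symbol: batch_size,1      # ONNX symbol key,value pair (assumed format)
    aic-num-cores: 4                      # AIC cores used for inference
    convert-to-fp16: "True"               # run floating-point in FP16
    device-id: 0                          # AIC device ID
    set-size: 10                          # set size for the inference loop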

Usage

Python

Here are a few basic commands you can use with ONNX Runtime and QAIC.

Load a model

import onnxruntime as ort

# QAIC EP options: the model-settings YAML and the target AIC device
provider_options = []
qaic_provider_options = {}
qaic_provider_options['config'] = '/path/to/yaml/file'
qaic_provider_options['aic_device_id'] = aic_device_id
provider_options.append(qaic_provider_options)

sess_options = ort.SessionOptions()
providers = ['QAicExecutionProvider']  # EP name as registered by the onnxruntime_qaic build
session = ort.InferenceSession('/path/to/onnx/model', sess_options,
                               providers=providers, provider_options=provider_options)

This binds your model to the AIC100 chip through the QAIC execution provider.
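
To confirm the session actually resolved to the QAIC EP rather than silently falling back to CPU, you can list the session's active providers; this uses the standard ONNX Runtime API, and the exact EP name string comes from the onnxruntime_qaic build:

# Providers the session actually resolved to, in priority order
print(session.get_providers())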

Perform Inference

# Perform inference using ONNX Runtime
results = session.run(None, {'input_name': input_data})

In the above code, replace 'input_name' with the name of the model's input node, and input_data with the actual input data.
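
If the input name or shape is not known ahead of time, it can be queried from the session. A minimal sketch using standard ONNX Runtime metadata APIs, with random NumPy data standing in for a real, preprocessed input:

import numpy as np

# Inspect the model's first input: name, shape, element type
inp = session.get_inputs()[0]
print(inp.name, inp.shape, inp.type)

# Placeholder input: substitute symbolic dimensions with 1 for illustration
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
input_data = np.random.rand(*shape).astype(np.float32)
results = session.run(None, {inp.name: input_data})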

C++

Load a Model

#include <onnxruntime_cxx_api.h>
#include <qaic_provider_factory.h>


// Set environment as required
Ort::Env env(ORT_LOGGING_LEVEL_ERROR, "test");

// Initialize session options
Ort::SessionOptions session_options;
session_options.SetIntraOpNumThreads(1);
session_options.SetGraphOptimizationLevel(
    GraphOptimizationLevel::ORT_DISABLE_ALL);

// Append the QAIC EP with the model-settings YAML and AIC device ID
auto s = OrtSessionOptionsAppendExecutionProvider_QAic(
        session_options, "/path/to/yaml/file", aic_device_id);

// Create the session
Ort::Session session(env, "/path/to/onnx/model", session_options);

Perform inference

// Run the model

auto output_tensors = session.Run(Ort::RunOptions{nullptr},
input_names.data(), &input_tensor, 1, output_names.data(), 1);

In the above code, replace "/path/to/onnx/model" with the path to your ONNX file. Also ensure the data and shape of your input tensor match the requirements of your model.

End-to-end examples

Install additional packages in the container to run the end-to-end examples:

apt-get update
apt-get install -y python-yaml libpng-dev

pip3 install --upgrade pip
pip3 install opencv-python pyyaml scipy

End-to-end examples (C++ and Python) for ResNet-50 are available at /opt/qti-aic/integrations/qaic_onnxrt/tests/.

Running the ResNet C++ sample

Compile the sample ResNet C++ test using the build_tests.sh script. By default, the test is built using libs from the onnxruntime_qaic release build. To enable debugging, rebuild the onnxruntime_qaic project in the Debug configuration and run ./build_tests.sh with the debug flag.

build_tests.sh [--release|--debug]

Run the executable. The commands below set up the environment and run the ResNet-50 model with the provided image on the QAIC or CPU backend. The program outputs the most probable prediction class index for each iteration.

cd build/release
./qaic-onnxrt-resnet50-test -i <path/to/input/png/image> \
                            -m ../../resnet50/resnet50.yaml

Test options

| Option | Description |
| --- | --- |
| -m, --model-config | [Required] Path to the model-settings YAML file |
| -i, --input-path | [Required] Path to the input PNG image file |
| -b, --backend | [Optional] Default='qaic'. Specify qaic or cpu as the backend |
| -d, --device-id | [Optional] Default=0. Specify the QAIC device ID |
| -n, --num-iter | [Optional] |

Running the ResNet Python sample

Run test_resnet.py at /opt/qti-aic/integrations/qaic_onnxrt/tests/resnet50:

python test_resnet.py --model_config ./resnet50/resnet50.yaml \
                      --input_file </path/to/png/image>

Test options

| Option | Description |
| --- | --- |
| --model_config | [Required] Path to the model-settings YAML file |
| --input_file | [Required] Path to the input PNG image file |
| --backend | [Optional] Default='qaic'. Specify qaic or cpu as the backend |
| --device_id | [Optional] Default=0. Specify the QAIC device ID |
| --num_iter | [Optional] |

Running models with generic QAic EP test

test_qaic_ep.py is a generic test runner for compilation and execution on AIC100.

Run test_qaic_ep.py at /opt/qti-aic/integrations/qaic_onnxrt/tests/:

python test_qaic_ep.py --model_config ./resnet50/resnet50.yaml \
                       --input_file_list </path/to/input/list>

Test options

| Option | Description |
| --- | --- |
| --model_config | [Required] Path to the model-settings YAML file |
| --input_file_list | [Required] Path to a file (.txt) containing the list of batched inputs in .raw format |
| --backend | [Optional] Default='qaic'. Specify qaic or cpu as the backend |
| --device_id | [Optional] Default=0. Specify the QAIC device ID |
| --num_iter | [Optional] |
| --max_threads | [Optional] Default=1000. Maximum number of threads to run inferences |
| --log_level | [Optional] |
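
As a minimal sketch of producing one input and its list file with NumPy, assuming each .raw file is the flat tensor bytes of one preprocessed batch (here fp32, shape (1, 3, 224, 224)); adjust dtype, shape, and preprocessing to your model:

import numpy as np

# Placeholder batch; in practice, use real preprocessed data
arr = np.random.rand(1, 3, 224, 224).astype(np.float32)
arr.tofile('input0.raw')  # raw tensor bytes, no header

# List file passed via --input_file_list: one batched input per line
with open('input_list.txt', 'w') as f:
    f.write('input0.raw\n')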

Execution through the ONNX Runtime test framework

cd /opt/qti-aic/integrations/qaic_onnxrt/onnxruntime_qaic/build/Release

./onnxruntime_perf_test -e qaic -i 'config|/path/to/resnet50.yaml aic_device_id|0' -m times -r 1000 /path/to/model.onnx
./onnx_test_runner -e qaic /path/to/model/dir