QAIC execution provider¶
The QAIC Execution Provider for ONNX Runtime enables hardware-accelerated execution on the Qualcomm AIC100 chipset. It leverages the AIC compiler and runtime APIs packaged in the Apps and Platform SDKs.
Setup¶
- Create a JSON specification file for onnxruntime as shown below.
- Launch a Docker container for an image built with this specification, following the instructions in Docker.
- In the Docker container, the onnxruntime_qaic build (onnxruntime version 1.13.1 with QAic EP integration) is available at /opt/qti-aic/integrations/qaic_onnxrt/onnxruntime_qaic.
Configuration Options¶
Option | Type | Description |
---|---|---|
config | str | [Required] Path to the model-settings YAML file. Contains the AIC configuration parameters used by the QAic execution provider of ONNX Runtime. The configuration for best performance and accuracy can be generated using the model configurator tool. |
aic_device_id | int | [Optional] AIC device ID; auto-picked when not configured |
Parameters supported in the model-settings YAML file¶
Option | Description | Default | Relevance |
---|---|---|---|
Runtime parameters | | |
aic-binary-dir | Absolute path, or path relative to the model-settings file's parent directory, to the directory containing programqpc.bin | "" | Required to skip compilation.
device-id | AIC device ID | 0 | Optional
set-size | Set size for inference loop execution | 10 | Optional
aic-num-of-activations | Number of activations | 1 | Optional
qaicRegisterCustomOp - Compiler C API | | |
register-custom-op | Register custom ops using this configuration file; vector of strings | | Required if the model has AIC custom ops.
Graph Config - Compiler API | | |
aic-depth-first-mem | Sets the DFS memory size | Set by compiler | Optional. Used in compilation with aic-enable-depth-first.
aic-enable-depth-first | Enables DFS with default memory size; "True", "False" | Set by compiler | Optional. Used in compilation.
aic-num-cores | Number of AIC cores to use for inference | 1 | Optional. Used in compilation.
allocator-dealloc-delay | Option to increase buffer lifetime, 0 - 10, e.g. 1 | Set by compiler | Optional. Used in compilation.
batchsize | Sets the number of batches to be used for execution | 1 | Optional. Used in compilation.
convert-to-fp16 | Run all floating-point operations in FP16; "True", "False" | "False" | Optional. Used in compilation.
enable-channelwise | Enable channelwise quantization of the Convolution op; "True", "False" | Set by compiler | Optional. Used in compilation with pgq-profile.
enable-rowwise | Enable rowwise quantization of the FullyConnected and SparseLengthsSum ops; "True", "False" | Set by compiler | Optional. Used in compilation with pgq-profile.
execute-nodes-in-fp16 | Run all instances of the operators in this list in FP16 | Set by compiler | Optional. Used in compilation with pgq-profile for mixed precision.
hwVersion | HW version of AIC | QAIC_HW_V2_0 | Cannot be configured; set to QAIC_HW_V2_0.
keep-original-precision-for-nodes | Run operators in this list with their original precision from generation | | Optional. Used in compilation with pgq-profile for mixed precision.
mos | Effort level to reduce on-chip memory; e.g. "1" | Set by compiler | Optional. Used in compilation.
multicast-weights | Reduce DDR bandwidth by loading weights used on multiple cores only once and multicasting them to the other cores | |
ols | Factor to increase splitting of the network for parallelism | Set by compiler | Optional. Used in compilation.
quantization-calibration | Specify quantization calibration - "None", "KLMinimization", "Percentile", "MSE", "SQNR", "KLMinimizationV2" | "None" | Optional. Used in compilation with pgq-profile.
quantization-schema-activations | Specify quantization schema - "asymmetric", "symmetric", "symmetric_with_uint8", "symmetric_with_power2_scale" | "symmetric_with_uint8" | Optional. Used in compilation with pgq-profile.
quantization-schema-constants | Specify quantization schema - "asymmetric", "symmetric", "symmetric_with_uint8", "symmetric_with_power2_scale" | "symmetric_with_uint8" | Optional. Used in compilation with pgq-profile.
size-split-granularity | Sets the max tile size in KiB, between 512 and 2048, e.g. 1024 | Set by compiler | Optional. Used in compilation.
aic-hw | Sets the target to QAIC_SIM or QAIC_HW; "True", "False" | "True" | Optional.
Model Params - Compiler API | | |
model-path | Path to the model file | | Required. Used in compilation, OnnxRT framework.
onnx-define-symbol | Define an ONNX symbol with its value; pairs of ONNX symbol key and value, separated by a space | | Required. Used in compilation, OnnxRT framework.
external-quantization | Path to load an externally generated quantization profile | | Optional
node-precision-info | Path to a model loader precision file for setting node instances to FP16 or FP32 | | Optional. Used in compilation with pgq-profile for mixed precision.
Common | | |
relative-path | The aic-binary-dir absolute path will be constructed using the base path of the model-settings file; "True", "False" | "False" | Optional. Set to "True" to allow a relative path for aic-binary-dir.
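For illustration, a minimal model-settings file could look like the sketch below; the flat key: value YAML layout and the specific values are assumptions, using only keys documented in the table above:
# model-settings sketch with illustrative values
model-path: ./resnet50/model.onnx
device-id: 0
aic-num-cores: 4
batchsize: 1
convert-to-fp16: "True"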
Usage¶
Python¶
Here are a few basic commands you can use with ONNX Runtime and the QAic EP.
Load a model¶
import onnxruntime as ort

# QAic EP options: path to the model-settings YAML file and the AIC device ID
qaic_provider_options = {}
qaic_provider_options['config'] = '/path/to/yaml/file'
qaic_provider_options['device_id'] = aic_device_id

providers = ['QAicExecutionProvider']  # provider name as registered by the onnxruntime_qaic build
provider_options = [qaic_provider_options]

sess_options = ort.SessionOptions()
session = ort.InferenceSession('/path/to/onnx/model', sess_options,
                               providers=providers,
                               provider_options=provider_options)
This binds your model to the AIC100 chip using the QAic execution provider.
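To verify that the QAic EP is active for the session (rather than a silent CPU fallback), you can list the session's providers; get_providers() is standard ONNX Runtime API, while the provider name it reports depends on the onnxruntime_qaic build:
# Print the execution providers active for this session
print(session.get_providers())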
Perform Inference¶
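A minimal sketch of running inference with the session created above; 'input_name' and input_data are placeholders, and session.run is the standard ONNX Runtime API:
# Run inference: map input names to input arrays; None requests all outputs
outputs = session.run(None, {'input_name': input_data})
print(outputs[0])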
In the above code, replace 'input_name' with the name of your model's input node and input_data with the actual input data.
C++¶
Load a Model¶
#include <onnxruntime_cxx_api.h>
#include <qaic_provider_factory.h>
// Set environment as required
Ort::Env env(ORT_LOGGING_LEVEL_ERROR, "test");
// Initialize session options, create session
Ort::SessionOptions session_options;
session_options.SetIntraOpNumThreads(1);
session_options.SetGraphOptimizationLevel(
    GraphOptimizationLevel::ORT_DISABLE_ALL);
auto s = OrtSessionOptionsAppendExecutionProvider_QAic(
session_options, "/path/to/yaml/file", aic_device_id);
Ort::Session session(env, "/path/to/onnx/model", session_options);
Perform inference¶
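The Run call below assumes input/output name arrays and an input tensor have already been prepared. A minimal sketch, where the names and shape are placeholders for your model:
// Prepare input/output names and a CPU-backed input tensor (placeholder names/shape)
std::vector<const char*> input_names{"input_name"};
std::vector<const char*> output_names{"output_name"};
std::vector<int64_t> input_shape{1, 3, 224, 224};
std::vector<float> input_data(1 * 3 * 224 * 224, 0.0f);
auto memory_info = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
Ort::Value input_tensor = Ort::Value::CreateTensor<float>(
    memory_info, input_data.data(), input_data.size(),
    input_shape.data(), input_shape.size());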
// Run the model
auto output_tensors = session.Run(Ort::RunOptions{nullptr},
input_names.data(), &input_tensor, 1, output_names.data(), 1);
In the above code, replace "/path/to/onnx/model" with the path to your ONNX file. Also ensure the data and shape of your input tensor match the requirements of your model.
End-to-end examples¶
Install additional packages in the container to run the end-to-end examples:
apt-get update
apt-get install -y python-yaml libpng-dev
pip3 install --upgrade pip
pip3 install opencv-python pyyaml scipy
The examples are located at /opt/qti-aic/integrations/qaic_onnxrt/tests/.
Running the ResNet C++ sample¶
Compile the sample ResNet C++ test using the build_tests.sh script. By default, the test is built using libs from the onnxruntime_qaic release build. To enable debugging, re-build the onnxruntime_qaic project in the Debug configuration and run ./build_tests.sh with the debug flag.
Run the executable. The commands below set up the environment and run the ResNet-50 model with the provided image on the QAic or CPU backend. The program outputs the most probable prediction class index for each iteration.
cd build/release
./qaic-onnxrt-resnet50-test -i <path/to/input/png/image> \
    -m ../../resnet50/resnet50.yaml
Test options
Option | Description |
---|---|
-m, --model-config | [Required] Path to the model-settings YAML file |
-i, --input-path | [Required] Path to the input PNG image file |
-b, --backend | [Optional] Default='qaic'. Specify qaic or cpu as the backend |
-d, --device-id | [Optional] Default=0. Specify the QAic device ID |
-n, --num-iter | [Optional] Number of inference iterations to run |
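For example, to run the same sample for 10 iterations on the CPU backend (the iteration count is illustrative):
./qaic-onnxrt-resnet50-test -i <path/to/input/png/image> \
    -m ../../resnet50/resnet50.yaml -b cpu -n 10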
Running the ResNet Python sample¶
Run test_resnet.py located at /opt/qti-aic/integrations/qaic_onnxrt/tests/resnet50:
python test_resnet.py --model_config ./resnet50/resnet50.yaml \
    --input_file </path/to/png/image>
Test options
Option | Description |
---|---|
--model_config | [Required] Path to the model-settings YAML file |
--input_file | [Required] Path to the input PNG image file |
--backend | [Optional] Default='qaic'. Specify qaic or cpu as the backend |
--device_id | [Optional] Default=0. Specify the QAic device ID |
--num_iter | [Optional] Number of inference iterations to run |
Running models with generic QAic EP test¶
test_qaic_ep.py is a generic test runner for compilation and execution on AIC100. Run test_qaic_ep.py at /opt/qti-aic/integrations/qaic_onnxrt/tests/:
python test_qaic_ep.py --model_config ./resnet50/resnet50.yaml \
    --input_file_list </path/to/input/list>
Test options
Option | Description |
---|---|
--model_config | [Required] Path to the model-settings YAML file |
--input_file_list | [Required] Path to a .txt file containing the list of batched inputs in .raw format |
--backend | [Optional] Default='qaic'. Specify qaic or cpu as the backend |
--device_id | [Optional] Default=0. Specify the QAic device ID |
--num_iter | [Optional] Number of inference iterations to run |
--max_threads | [Optional] Default=1000. Maximum number of threads used to run inferences |
--log_level | [Optional] Log level for the test |
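For illustration, the input list is a plain text file naming the batched .raw input files; the one-file-per-line layout and the paths below are assumptions:
/path/to/inputs/input_batch_0.raw
/path/to/inputs/input_batch_1.raw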
Execution through Onnxruntime test framework¶
- QAic EP is enabled for execution with onnx_test_runner and onnxruntime_perf_test; a sample invocation follows below.
- For model directory requirements and a comprehensive list of supported options, refer to the onnxruntime perf test documentation.
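For example, onnx_test_runner selects the execution provider with its -e flag; whether the QAic EP is selected by the name "qaic" is an assumption of this sketch:
./onnx_test_runner -e qaic /path/to/test/data/dir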
- Sample testdata can be downloaded here.