QAic runner

QAic runner (qaic-runner) is a command-line runtime tool for executing precompiled network binaries on Cloud AI devices. It runs model binaries generated by qaic-compile as well as those supplied with the Apps SDK (for example, those located in /opt/qti-aic/test-data).

Examples:

Test data for precompiled workloads:

/opt/qti-aic/test-data/aic100/v2

Run the precompiled workload:

sudo /opt/qti-aic/exec/qaic-runner -t /opt/qti-aic/test-data/aic100/v2/4nsp/4nsp-quant-resnet50/ \
     --aic-batch-json-input /opt/qti-aic/test-data/aic100/v2/4nsp/4nsp-quant-resnet50/ios.json \
     --write-output-start-iter 0 \
     --write-output-num-samples 1 \
     --write-output-dir ./outputs \
     -a 3 -n 5000 -d 0 -v
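Because --write-output-dir expects a directory that already exists and is writable, a small wrapper can create it before launching the run. The following is a minimal sketch, not part of the SDK: the RUNNER, WORKLOAD, and OUT_DIR variable names are assumptions introduced here for illustration, and the run is skipped when the binary is not installed.

```shell
#!/bin/sh
# Sketch: prepare the output directory, then launch the precompiled workload.
# RUNNER, WORKLOAD and OUT_DIR are illustrative names, not SDK settings.
set -eu

RUNNER="${RUNNER:-/opt/qti-aic/exec/qaic-runner}"
WORKLOAD="${WORKLOAD:-/opt/qti-aic/test-data/aic100/v2/4nsp/4nsp-quant-resnet50}"
OUT_DIR="${OUT_DIR:-./outputs}"

# --write-output-dir must point at an existing, writable directory.
mkdir -p "$OUT_DIR"

if [ -x "$RUNNER" ]; then
    sudo "$RUNNER" -t "$WORKLOAD/" \
        --aic-batch-json-input "$WORKLOAD/ios.json" \
        --write-output-start-iter 0 \
        --write-output-num-samples 1 \
        --write-output-dir "$OUT_DIR" \
        -a 3 -n 5000 -d 0 -v
else
    # Nothing to run on machines without the Apps SDK installed.
    echo "qaic-runner not found at $RUNNER; skipping run" >&2
fi
```

The options passed are the same as in the command above; only the directory creation and the existence check are added.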

Refer to the CV and LLM workflows for end-to-end examples that demonstrate qaic-compile and qaic-runner with the following reference models:

  • CV: ResNet50-v1-7

  • LLM: Llama-3.2-1B-Instruct

qaic-runner argument details:

The options below are based on qaic-runner --help.

Argument

Description

Default

-d, --dev <qid>

Device ID (QID). If not provided, the device is auto-picked.

Auto-pick

-D, --dev-list <qid-list>

List of device IDs for a multi-device network. Default is all QIDs. Examples for a 3-card network:

  • -D 3:1:2 assigns QIDs 3, 1, 2 to partitions in that order.

  • -D 3,4,5,7,8,9: the auto-picker picks 3 QIDs from this list.

  • -D 3..5,7..9: the auto-picker picks 3 QIDs from {3,4,5,7,8,9}.

  • -D 3,4,5,7..9: the auto-picker picks 3 QIDs from {3,4,5,7,8,9}.

  • -D 3..5,7,8,9: the auto-picker picks 3 QIDs from {3,4,5,7,8,9}.

all QIDs

-t, --test-data <path>

Test directory in which to look for the network. Looks for bindings.json for input data.

current directory

-n, --num-iter <i>

Number of total inferences to run.

100

--time <t>

Duration (in seconds) for which to submit inferences.

-l, --live-reporting

Live reporting of results using reporting-period intervals.

off

-r, --live-reporting-period <i>

Period at which inferences per second are reported.

1000 ms

-s, --stats

Enable detailed live reporting of host stats (completed inferences, enqueue/submit input, pre/post-processing latency, and more).

-a, --aic-num-activations <i>

Number of activations.

1

--aic-profiling-type <type>

Profiling type: stats | trace | latency | raw_device_stats. Set multiple times for multiple types.

none

--aic-profiling-start-iter <num>

Profiling start iteration (OpStats). Only applicable for legacy profiling.

0

--aic-profiling-num-samples <num>

Number of profiling samples to save to file. Only applicable for legacy profiling.

1

--aic-profiling-out-dir <path>

Base directory for profiling files.

.

--aic-profiling-start-delay <num>

Profiling start delay (ms). Profiling will start after the delay period has elapsed.

--write-output-start-iter <num>

Write outputs start iteration.

0

--write-output-num-samples <num>

Number of outputs to write.

1

--write-output-dir <path>

Location to save output files (directory must exist and be writable).

.

--aic-batch-json-input

Batch mode: specify input files in JSON format. See --aic-batch-json-input JSON Format for the full JSON format reference.

--aic-batch-max-memory <mb>

Batch mode: limit memory usage when loading files (MB).

1024

--datapath-timeout <num>

Time to wait for an inference request completion on kernel (ms). When 0, the kernel defaults to 5000 ms.

7200000 ms

-S, --set-size <i>

Set size.

10

-T, --threads-per-queue <i>

Threads per queue.

4

--auto-batch-input

Automatically batch inputs to meet batch-size requirements of the network. Inputs should be for batch size 1.

-q, --query

Query network info (prints network details).

off

-c, --check-output

Check output validation buffers based on test-dir and JSON output names.

off

-p, --pre-post-processing

Pre-/post-processing (on | off) or entry point (default | quantize | transpose | d32convert | convert | dma).

on

--frequency-limit

Limit the submission frequency to the given value, in hertz. 0 means no limit.

0

-x, --lock

Lock the device if QAIC_SERIALIZE_DEVICE is set. This is an advisory lock, so it is effective only when all instances of the application run with the environment variable set.

--bound-random

Use randomly generated inputs that are bounded by buffer format.

--unbound-random

Use randomly generated inputs that are unbounded (fill each byte with a random value 0-255). This can result in unexpected behavior from certain networks.

--dump-input-buffers

Dump input buffers used in benchmarking mode.

-u, --collect-device-log <level>

Collect the AIC device log. Use <level> to configure the ULOG verbosity level of the device. Valid values: error | warn | info | debug | default.

If default is used, the verbosity level of the device is not updated. If multiple processes use the same device ID, the most recent log level update is applied to the device (for example, running one instance with -u error and another with -u info updates device logging to info for all).

-v, --verbose

Verbosity. Each -v increments the logging level.

  • default: off (warn)

  • -vv: info

off

-h, --help

Help
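The range notation accepted by -D, --dev-list can be made concrete with a short sketch that expands a candidate list such as 3..5,7..9 into individual QIDs. This is an illustration of the documented syntax only; qaic-runner does its own parsing, and the function name expand_qid_list is hypothetical. The colon form (for example, 3:1:2), which assigns QIDs to partitions explicitly, is not handled here.

```shell
#!/bin/sh
# Illustrative only: expand a -D candidate list ("3..5,7,8,9") into QIDs.
# expand_qid_list is a hypothetical helper, not part of qaic-runner.
# Note: plain sh has no local variables, so the names below are global.
expand_qid_list() {
    result=""
    old_ifs=$IFS
    IFS=,                       # split the argument on commas
    for item in $1; do
        case $item in
            *..*)               # inclusive range "lo..hi"
                lo=${item%%..*}
                hi=${item##*..}
                i=$lo
                while [ "$i" -le "$hi" ]; do
                    result="$result $i"
                    i=$((i + 1))
                done
                ;;
            *)                  # single QID
                result="$result $item"
                ;;
        esac
    done
    IFS=$old_ifs
    echo "${result# }"          # drop the leading space
}

expand_qid_list "3..5,7..9"     # prints: 3 4 5 7 8 9
```

All four candidate-list forms shown for -D in the table expand to the same six QIDs, from which the auto-picker then selects as many as the network needs.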