QAic runner

QAic runner (qaic-runner) is a command-line runtime tool for executing precompiled network binaries on Cloud AI devices. It runs model binaries generated by qaic-compile as well as those supplied with the Apps SDK (for example, those located in /opt/qti-aic/test-data).

Examples:

Test data for precompiled workloads:

/opt/qti-aic/test-data/aic100/v2

Run the precompiled workload:

sudo /opt/qti-aic/exec/qaic-runner -t /opt/qti-aic/test-data/aic100/v2/4nsp/4nsp-quant-resnet50/ \
     --aic-batch-json-input /opt/qti-aic/test-data/aic100/v2/4nsp/4nsp-quant-resnet50/ios.json \
     --write-output-start-iter 0 \
     --write-output-num-samples 1 \
     --write-output-dir ./outputs \
     -a 3 -n 5000 -d 0 -v
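
Because `--write-output-dir` requires a directory that already exists and is writable, it can help to create that directory before launching the run. The following is a minimal sketch in POSIX shell that wraps the command above. The variable names `RUNNER`, `WORKLOAD`, `OUT_DIR`, and `DRY_RUN` are assumptions introduced for illustration and are not part of qaic-runner. By default the script only prints the assembled command.

```shell
#!/bin/sh
# Sketch: create the output directory, then assemble the qaic-runner command.
set -eu

# Paths copied from the example above; the variable names are hypothetical.
RUNNER="${RUNNER:-/opt/qti-aic/exec/qaic-runner}"
WORKLOAD="${WORKLOAD:-/opt/qti-aic/test-data/aic100/v2/4nsp/4nsp-quant-resnet50}"
OUT_DIR="${OUT_DIR:-./outputs}"

# --write-output-dir must already exist and be writable.
mkdir -p "$OUT_DIR"

# Store the command line in the positional parameters so quoting is preserved.
set -- "$RUNNER" \
  -t "$WORKLOAD/" \
  --aic-batch-json-input "$WORKLOAD/ios.json" \
  --write-output-start-iter 0 \
  --write-output-num-samples 1 \
  --write-output-dir "$OUT_DIR" \
  -a 3 -n 5000 -d 0 -v

# DRY_RUN (hypothetical) defaults to 1: print the command instead of running it.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$*"
else
  sudo "$@"
fi
```

Setting `DRY_RUN=0` runs the command through `sudo`, as in the original example.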

Refer to the CV and LLM workflows for end-to-end examples that demonstrate qaic-compile and qaic-runner with the following reference models:

CV: ResNet50-v1-7
LLM: Llama-3.2-1B-Instruct

qaic-runner argument details:

The options below are based on qaic-runner --help.

Argument	Description	Default
`-d, --dev <qid>`	Device ID (QID). If not provided, the device is auto-picked.	Auto-pick
`-D, --dev-list <qid-list>`	List of device IDs for a multi-device network. Default is all QIDs. Examples for a 3-card network: - `-D 3:1:2` assigns QIDs 3,1,2 to partitions in that order. - `-D 3,4,5,7,8,9` auto-picker will pick 3 QIDs from this list. - `-D 3..5,7..9` auto-picker will pick 3 QIDs from {3,4,5,7,8,9}. - `-D 3,4,5,7..9` auto-picker will pick 3 QIDs from {3,4,5,7,8,9}. - `-D 3..5,7,8,9` auto-picker will pick 3 QIDs from {3,4,5,7,8,9}.	all QIDs
`-t, --test-data <path>`	Test directory in which to look for the network. Looks for `bindings.json` for input data.	current directory
`-n, --num-iter <i>`	Number of total inferences to run.	100
`--time <t>`	Duration (in seconds) for which to submit inferences.
`-l, --live-reporting`	Live reporting of results using reporting-period intervals.	off
`-r, --live-reporting-period <i>`	Period to report inferences per second.	1000 ms
`-s, --stats`	Enable detailed live reporting of host stats (completed inferences, enqueue/submit input, pre/post-processing latency, and more).
`-a, --aic-num-activations <i>`	Number of activations.	1
`--aic-profiling-type <type>`	Profiling type: `stats` \| `trace` \| `latency` \| `raw_device_stats`. Set multiple times for multiple types.	none
`--aic-profiling-start-iter <num>`	Profiling start iteration (OpStats). Only applicable for legacy profiling.	0
`--aic-profiling-num-samples <num>`	Number of profiling samples to save to file. Only applicable for legacy profiling.	1
`--aic-profiling-out-dir <path>`	Base directory for profiling files.	`.`
`--aic-profiling-start-delay <num>`	Profiling start delay (ms). Profiling will start after the delay period has elapsed.
`--write-output-start-iter <num>`	Write outputs start iteration.	0
`--write-output-num-samples <num>`	Number of outputs to write.	1
`--write-output-dir <path>`	Location to save output files (directory must exist and be writable).	`.`
`--aic-batch-json-input`	Batch mode: specify input files in JSON format. See --aic-batch-json-input JSON Format for the full JSON format reference.
`--aic-batch-max-memory <mb>`	Batch mode: limit memory usage when loading files (MB).	1024
`--datapath-timeout <num>`	Time (ms) to wait in the kernel for an inference request to complete. When 0, the kernel defaults to 5000 ms.	7200000 ms
`-S, --set-size <i>`	Set size.	10
`-T, --threads-per-queue <i>`	Threads per queue.	4
`--auto-batch-input`	Automatically batch inputs to meet batch-size requirements of the network. Inputs should be for batch size 1.
`-q, --query`	Query network info (prints network details).	off
`-c, --check-output`	Check output validation buffers based on test-dir and JSON output names.	off
`-p, --pre-post-processing`	Pre-/post-processing (`on` \| `off`) or entry point (`default` \| `quantize` \| `transpose` \| `d32convert` \| `convert` \| `dma`).	on
`--frequency-limit`	Limit the submission frequency to `frequency-limit` in hertz. 0 means no limit.	0
`-x, --lock`	Lock the device if `QAIC_SERIALIZE_DEVICE` is set. This is an advisory lock, so it is effective only when all instances of the application run with the environment variable set.
`--bound-random`	Use randomly generated inputs that are bounded by buffer format.
`--unbound-random`	Use randomly generated inputs that are unbounded (fill each byte with a random value 0-255). This can result in unexpected behavior from certain networks.
`--dump-input-buffers`	Dump input buffers used in benchmarking mode.
`-u, --collect-device-log <level>`	Collect the AIC device log. Use `<level>` to configure the ULOG verbosity level of the device. Valid values: `error` \| `warn` \| `info` \| `debug` \| `default`. If `default` is used, the verbosity level of the device is not updated. If multiple processes use the same device ID, the most recent log level update is applied to the device (for example, running one instance with `-u error` and another with `-u info` updates device logging to `info` for all).
`-v, --verbose`	Verbosity. Each `-v` increments the logging level. Default: off (warn); `-vv`: info.	off
`-h, --help`	Help
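
The `-D` examples in the table mix comma-separated QIDs with `a..b` ranges. As a reading aid, the sketch below expands such a candidate list into individual QIDs. It only illustrates the notation described in the table and is not qaic-runner's own parser. `expand_qids` is a hypothetical helper name, and the colon-separated form (`-D 3:1:2`), which fixes the partition order explicitly, is deliberately not handled.

```shell
#!/bin/sh
# Illustrative only: expand a -D candidate list such as "3..5,7,8,9"
# into the individual QIDs it denotes.
expand_qids() {
  out=""
  old_ifs=$IFS
  IFS=,                      # split the argument on commas
  for item in $1; do
    case "$item" in
      *..*)
        # Range "lo..hi": emit every QID from lo to hi inclusive.
        lo=${item%%..*}
        hi=${item##*..}
        i=$lo
        while [ "$i" -le "$hi" ]; do
          out="$out $i"
          i=$((i + 1))
        done
        ;;
      *)
        # Single QID.
        out="$out $item"
        ;;
    esac
  done
  IFS=$old_ifs
  echo "${out# }"            # drop the leading space
}

expand_qids "3..5,7..9"      # prints: 3 4 5 7 8 9
expand_qids "3,4,5,7..9"     # prints: 3 4 5 7 8 9
```

All three list forms shown in the table for the auto-picker (`3,4,5,7,8,9`, `3..5,7..9`, and the mixed variants) expand to the same candidate set, from which the auto-picker then selects as many QIDs as the network needs.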