QAic runner
QAic runner (qaic-runner) is a command-line runtime tool for executing precompiled network binaries on Cloud AI devices. It runs model binaries generated by qaic-compile as well as those supplied with the Apps SDK (for example, those located in /opt/qti-aic/test-data).
Examples:
Test data for precompiled workloads:
/opt/qti-aic/test-data/aic100/v2
Run the precompiled workload:
sudo /opt/qti-aic/exec/qaic-runner -t /opt/qti-aic/test-data/aic100/v2/4nsp/4nsp-quant-resnet50/ \
--aic-batch-json-input /opt/qti-aic/test-data/aic100/v2/4nsp/4nsp-quant-resnet50/ios.json \
--write-output-start-iter 0 \
--write-output-num-samples 1 \
--write-output-dir ./outputs \
-a 3 -n 5000 -d 0 -v
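The argument table in this page notes that the output directory must exist and be writable before the run. A minimal sketch of a wrapper that prepares the directory and then issues the same command is shown here. The variable names (RUNNER, TEST_DIR, OUT_DIR) are illustrative and not part of the SDK, and the guard that skips the run when the binary is absent is an assumption added for convenience.

```shell
#!/bin/sh
# Paths and flags are copied from the example command above.
RUNNER=/opt/qti-aic/exec/qaic-runner
TEST_DIR=/opt/qti-aic/test-data/aic100/v2/4nsp/4nsp-quant-resnet50
OUT_DIR=./outputs

# The output directory must already exist and be writable.
mkdir -p "$OUT_DIR"
if [ ! -w "$OUT_DIR" ]; then
    echo "output directory $OUT_DIR is not writable" >&2
    exit 1
fi

# Run only when the runner binary is installed at the expected path.
if [ -x "$RUNNER" ]; then
    sudo "$RUNNER" -t "$TEST_DIR" \
        --aic-batch-json-input "$TEST_DIR/ios.json" \
        --write-output-start-iter 0 \
        --write-output-num-samples 1 \
        --write-output-dir "$OUT_DIR" \
        -a 3 -n 5000 -d 0 -v
else
    echo "qaic-runner not found at $RUNNER; skipping run" >&2
fi
```

Creating the directory up front avoids discovering a missing output location only after the run has started.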
Refer to the CV and LLM workflows for end-to-end examples that demonstrate qaic-compile and qaic-runner with the following reference models:
CV: ResNet50-v1-7
LLM: Llama-3.2-1B-Instruct
qaic-runner argument details:
The options below are based on qaic-runner --help.
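Because the options come from qaic-runner --help, the installed tool can be queried directly to confirm the exact spelling of an option on a given SDK version. The following is a small sketch, assuming the binary lives at the path used in the example command; filter_help is a hypothetical helper defined here, not an SDK command.

```shell
#!/bin/sh
# Binary path as used in the example command on this page.
RUNNER=/opt/qti-aic/exec/qaic-runner

# Hypothetical helper: keep only the lines on stdin that mention a keyword
# (case-insensitive). The "--" stops grep from parsing the keyword as an option.
filter_help() {
    grep -i -- "$1"
}

if [ -x "$RUNNER" ]; then
    # Merge stderr into stdout in case the help text is printed there.
    "$RUNNER" --help 2>&1 | filter_help "write-output"
else
    echo "qaic-runner not found at $RUNNER" >&2
fi
```

Changing the keyword (for example to "profil" or "batch") narrows the help text to the group of options of interest.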
| Argument | Description | Default |
|---|---|---|
| | Device ID (QID). If not provided, the device is auto-picked. | Auto-pick |
| | List of device IDs for a multi-device network. Default is all QIDs. Examples for a 3-card network: | all QIDs |
| | Test directory where to look for the network. Looks for | current directory |
| | Number of total inferences to run. | 100 |
| | Duration (in seconds) for which to submit inferences. | |
| | Live reporting of results using reporting-period intervals. | off |
| | Period to report inferences per second. | 1000 ms |
| | Enable detailed live reporting of host stats (completed inferences, enqueue/submit input, pre/post-processing latency, and more). | |
| | Number of activations. | 1 |
| | Profiling type: | none |
| | Profiling start iteration (OpStats). Only applicable for legacy profiling. | 0 |
| | Number of profiling samples to save to file. Only applicable for legacy profiling. | 1 |
| | Base directory for profiling files. | |
| | Profiling start delay (ms). Profiling will start after the delay period has elapsed. | |
| | Write outputs start iteration. | 0 |
| | Number of outputs to write. | 1 |
| | Location to save output files (directory must exist and be writable). | |
| | Batch mode: specify input files in JSON format. See --aic-batch-json-input JSON Format for the full JSON format reference. | |
| | Batch mode: limit memory usage when loading files (MB). | 1024 |
| | Time to wait for an inference request completion on kernel (ms). When 0, the kernel defaults to 5000 ms. | 7200000 ms |
| | Set size. | 10 |
| | Threads per queue. | 4 |
| | Automatically batch inputs to meet batch-size requirements of the network. Inputs should be for batch size 1. | |
| | Query network info (prints network details). | off |
| | Check output validation buffers based on test-dir and JSON output names. | off |
| | Pre-/post-processing ( | on |
| | Limit the submission frequency to | 0 |
| | Lock device if | |
| | Use randomly generated inputs that are bounded by buffer format. | |
| | Use randomly generated inputs that are unbounded (fill each byte with a random value 0-255). This can result in unexpected behavior from certain networks. | |
| | Dump input buffers used in benchmarking mode. | |
| | Collect AIC device log. Use If | |
| | Verbosity. Each | off |
| | Help | |