QAIRT Tools¶
Accuracy Evaluator¶
Refer to the qairt-accuracy-evaluator section in QAIRT Accuracy Evaluator (Beta).
On the Cloud AI backend, this tool can be used only for non-LLM models.
Example on Cloud AI Backend¶
Example of running the EfficientNet_b0 model with fp16 precision using the Accuracy Evaluator
Install the packages mentioned in additional-packages-for-evaluating-model-accuracy
Run qairt-accuracy-evaluator for the EfficientNet_b0 model
With the config efficientNet_b0_config.yaml
Clean up intermediate files
On device id 5
Run inference with the schema tag "qnn_fp16"
qairt-accuracy-evaluator -config efficientNet_b0_config.yaml -cleanup intermediate -inference_schema_tag qnn_fp16 -work_dir WORKING_DIR_PATH -silent -device_id 5
efficientNet_b0_config.yaml
model:
    info:
        desc: EfficientNet-b0 reference public model used.
        batchsize: 1
    globals:
        count: -1
        calib: -1
        npi_file: efficientNet_b0_config.json
    dataset:
        name: ILSVRC2012
        path: '/home/ml-datasets/imageNet/'
        inputlist_file: inputlist.txt
        annotation_file: ground_truth.txt
        calibration:
            type: dataset
            file: calibration.txt
        transformations:
            - plugin:
                name: filter_dataset
                params:
                    random: False
                    max_inputs: $count
                    max_calib: $calib
    preprocessing:
        transformations:
            - plugin:
                name: resize
                params:
                    library: torchvision
                    dims: 224, 224
                    interp: bicubic
                    type: imagenet
                    typecasting_required: False
            - plugin:
                name: crop
                params:
                    library: torchvision
                    dims: 224,224
                    typecasting_required: False
            - plugin:
                name: normalize
                params:
                    library: torchvision
                    normalize_first: True
                    means:
                        R: 0.485
                        G: 0.456
                        B: 0.406
                    std:
                        R: 0.229
                        G: 0.224
                        B: 0.225
            - plugin:
                name: create_batch
    inference-engine:
        model_path: public/efficientnet/efficientnet-b0.onnx
        inference_schemas:
            - inference_schema:
                name: qnn
                precision: fp32
                target_arch: x86_64-linux-clang
                backend: cpu
                tag: qnn_fp32,ci
            - inference_schema:
                name: qnn
                precision: fp16
                target_arch: x86_64-linux-clang
                backend: aic
                tag: qnn_fp16,ci
                backend_extensions:
                    compiler_perfWarnings: True
                netrun_params:
                    use_native_input_data: True
                    use_native_output_data: True
                converter_params:
                    float_bitwidth: 16
                    float_bias_bitwidth: 32
                    preserve_io_datatype: True
            - inference_schema:
                name: qnn
                precision: quant
                target_arch: x86_64-linux-clang
                backend: aic
                tag: qnn_int8_mp,ci
                converter_params:
                    float_bias_bitwidth: 32
                    quantization_overrides: $npi_file
                backend_extensions:
                    compiler_perfWarnings: True
                    compiler_num_of_cores: 1
                    compiler_overlap_split_factor: 1
                    runtime_num_activations: 14
                    runtime_threads_per_queue: 4
                quantizer_params:
                    bias_bitwidth: 32
                    float_bias_bitwidth: 32
                    act_quantizer_calibration: percentile
                    act_quantizer_schema: asymmetric
                    param_quantizer_schema: symmetric
                    percentile_calibration_value: 99.998
                    use_per_channel_quantization: True
                    preserve_io_datatype: True
                netrun_params:
                    use_native_input_data: True
                    use_native_output_data: True
        inputs_info:
            - input.1:
                type: float32
                shape: ["*", 3, 299, 299]
        outputs_info:
            - _666:
                type: float32
                shape: ["*", 1000]
    metrics:
        transformations:
            - plugin:
                name: topk
                params:
                    kval: 1,5
                    round: 7
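The -inference_schema_tag value passed on the command line selects one of the inference_schemas above by its tag. As a minimal sketch (assuming PyYAML is available and the file name matches the config shown above), the available tags can be listed like this:

# sketch: list the inference schema tags defined in the evaluator config,
# so the right value can be passed to -inference_schema_tag.
# Assumes PyYAML is installed; the file name matches the config shown above.
import yaml

with open("efficientNet_b0_config.yaml") as f:
    config = yaml.safe_load(f)

schemas = config["model"]["inference-engine"]["inference_schemas"]
for entry in schemas:
    schema = entry["inference_schema"]
    # each schema carries a comma-separated tag list, e.g. "qnn_fp16,ci"
    print(schema["precision"], "->", schema["tag"])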
efficientNet_b0_config.json is provided to retain a few nodes in higher precision and maintain accuracy
efficientNet_b0_config.json
{ "activation_encodings": { "_380": [ { "bitwidth": 16, "dtype": "float" } ], "_381": [ { "bitwidth": 16, "dtype": "float" } ], "_417": [ { "bitwidth": 16, "dtype": "float" } ], "_418": [ { "bitwidth": 16, "dtype": "float" } ], "_398": [ { "bitwidth": 16, "dtype": "float" } ], "_399": [ { "bitwidth": 16, "dtype": "float" } ], "_367": [ { "bitwidth": 16, "dtype": "float" } ], "_368": [ { "bitwidth": 16, "dtype": "float" } ], "_366": [ { "bitwidth": 16, "dtype": "float" } ], "_364": [ { "bitwidth": 16, "dtype": "float" } ], "_365": [ { "bitwidth": 16, "dtype": "float" } ], "_384": [ { "bitwidth": 16, "dtype": "float" } ], "_382": [ { "bitwidth": 16, "dtype": "float" } ], "_383": [ { "bitwidth": 16, "dtype": "float" } ], "_362": [ { "bitwidth": 16, "dtype": "float" } ], "_363": [ { "bitwidth": 16, "dtype": "float" } ], "_387": [ { "bitwidth": 16, "dtype": "float" } ], "_388": [ { "bitwidth": 16, "dtype": "float" } ], "_385": [ { "bitwidth": 16, "dtype": "float" } ], "_386": [ { "bitwidth": 16, "dtype": "float" } ], "_454": [ { "bitwidth": 16, "dtype": "float" } ], "_455": [ { "bitwidth": 16, "dtype": "float" } ] }, "param_encodings": {} }
Accuracy Debugger¶
Refer to the qairt-accuracy-debugger section in QAIRT Accuracy Debugger (Beta).
On the Cloud AI backend, it supports only non-LLM models.
Example on Cloud AI Backend¶
Example of debugging the EfficientNet_b0 model with the Accuracy Debugger on the Cloud AI backend
EfficientNetB0 Framework_Runner stage to get intermediate outputs with onnx
Model efficientnet-b0.onnx
For input tensor 'input.1' 1,3,224,224 input.raw float32
Output tensor '666'
qairt-accuracy-debugger --framework_runner --framework onnx --model_path efficientnet-b0.onnx --working_dir RUNNER_WORKING_DIR --output_dirname fw_runner --disable_graph_optimization --verbose --input_tensor 'input.1' 1,3,224,224 input.raw float32 --output_tensor '666'
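The stages above consume a raw float32 input file. Below is a sketch, assuming PIL, torchvision, and numpy are installed, that produces such an input.raw using the same preprocessing parameters as the evaluator config (resize and crop to 224x224, ImageNet normalization); the image path is a placeholder:

# sketch: create input.raw for the debugger stages above.
# Assumes PIL, torchvision and numpy are installed; the image path is a placeholder.
import numpy as np
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224), interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),  # HWC uint8 -> CHW float32 in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("sample.jpg").convert("RGB")
tensor = preprocess(img).unsqueeze(0)  # shape: 1,3,224,224
tensor.numpy().astype(np.float32).tofile("input.raw")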
EfficientNetB0 Inference_Engine stage for INT8 precision to get the intermediate outputs with the QAIRT engine, using Cloud AI as the backend
For runtime "aic" backend (Cloud AI)
Host type x86
For input tensor 'input.1' 1,3,224,224 input.raw float32
Output tensor '666'
Symmetric quantization is used for parameters and activations
qairt-accuracy-debugger --inference_engine --model_path efficientnet-b0.onnx --runtime aic --architecture x86_64-linux-clang --input_list qnn_efficientNet_b0_list.txt --calibration_input_list qnn_efficientNet_b0_list.txt --working_dir INF_WORKING_DIR --output_dirname InferenceResults --executor_type QAIRT --engine_path SDK_PATH --verbose --host_device x86 --profiling_level basic --log_level error --debug_mode_off --bias_bitwidth 32 --param_quantizer_schema symmetric --act_quantizer_schema symmetric --param_quantizer_calibration min-max --use_per_channel_quantization --input_tensor 'input.1' 1,3,224,224 input.raw float32 --output_tensor '666'
Input list qnn_efficientNet_b0_list.txt
input:=/PATH_TO_MODEL_INPUT/model-inputs/inputs/224x224/batch_size_1/./input.raw
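The list file contains one name:=path entry per line, as shown above. A small sketch (the directory path is a placeholder) that writes such a list for a set of raw files:

# sketch: write qnn_efficientNet_b0_list.txt in the input:=<path> format shown above.
# The directory path is a placeholder.
from pathlib import Path

raw_files = sorted(Path("/PATH_TO_MODEL_INPUT/model-inputs/inputs/224x224/batch_size_1").glob("*.raw"))
with open("qnn_efficientNet_b0_list.txt", "w") as f:
    for raw in raw_files:
        f.write(f"input:={raw}\n")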
EfficientNetB0 Verification stage for INT8 precision to compare the outputs generated by the Framework Runner (onnx) and the Inference Engine (Cloud AI backend) using CosineSimilarity
Use CosineSimilarity as the verifier
Outputs will be available at WORKING_DIR
qairt-accuracy-debugger --verification --default_verifier CosineSimilarity --working_dir WORKING_DIR --verbose --golden_output_reference_directory RUNNER_WORKING_DIR --inference_results INF_WORKING_DIR/InferenceResults/output/Result_0/
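As a conceptual illustration only, not the verifier's actual implementation, cosine similarity between a golden output and an inference output can be computed as follows, assuming both are float32 raw dumps of the same tensor; the file names are placeholders:

# sketch: cosine similarity between two float32 raw output dumps.
# Conceptual illustration of what the CosineSimilarity verifier measures;
# file names are placeholders.
import numpy as np

golden = np.fromfile("golden/_666.raw", dtype=np.float32)
actual = np.fromfile("inference/_666.raw", dtype=np.float32)

cos_sim = np.dot(golden, actual) / (np.linalg.norm(golden) * np.linalg.norm(actual))
print(f"cosine similarity: {cos_sim:.6f}")  # 1.0 means identical direction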
Hyper-Tuner¶
qnn-hypertuner (Experimental)¶
qnn-hypertuner, also referred to as Hypertuner, is a performance tuning tool that provides an optimal combination of compiler parameters. The Hypertuner takes as input a JSON configuration file containing the name of the deep learning model, the hyperparameters, the search algorithm, and the backend. It then performs a search over the hyperparameter space defined by the input parameters and outputs an optimal parameter set for use by downstream tasks or applications.
qnn-hypertuner is an experimental tool that is currently supported only on a limited set of mobile devices and automotive platforms.
Setup¶
Hypertuner usage assumes general setup instructions have been followed.
Usage¶
Hypertuner can be used in the following modes.
Search mode – Tunes a deep learning model for performance
Plot mode – Visualizes statistics generated in Search mode
Run qnn-hypertuner --help to see the command line help message.
Search mode¶
Hypertuner can be run in Search mode by entering the search option on the command line.
qnn-hypertuner search -h
qnn-hypertuner search -i INPUT [-l] [-bo] [-an] [-ao] [-cp0] [-h]
Required arguments¶
- -i, --input INPUT
Configuration file in JSON format. The configuration file must contain the following.
Deep learning model
Hyper parameters
Search Algorithm
Backend
Refer to the sample configuration files included in the SDK for details.
Optional arguments¶
Optional arguments can be used to provide additional information or to override the configuration provided by the input configuration file. Some of the options are shown below. Run the help command to view the latest options.
- -an, --algorithm-name
Algorithm name; options:
dopt
evol
brute
- -ao, --algorithm-obj
Algorithm objective; options:
max
min
- -l
Logging levels; options:
DBG
INFO
WARN
ERR
- -cp0
Disable check-pointing
Optional Backend arguments¶
These options are displayed when the --input flag is provided along with --help. They are backend-specific options, displayed after the backend name is extracted from the --input config.
QNN-AIC
qnn-aic backend's flag:
-bo {ips,latency,avg_ddr_bw}, --backend-obj {ips,latency,avg_ddr_bw}
QNN-HTP
qnn-htp backend's flag:
-bo {exe_time,ddr_bandwidth}, --backend-obj {exe_time,ddr_bandwidth}
QNN-HTPMCP
qnn-htpmcp backend's flag:
-bo {exe_time,ddr_bandwidth}, --backend-obj {exe_time,ddr_bandwidth}
Hextimate
hextimate backend's flag:
-bo {exec_cycles,exec_cycles_lower,exec_cycles_upper,ddr_bandwidth}, --backend-obj {exec_cycles,exec_cycles_lower,exec_cycles_upper,ddr_bandwidth}
Sample commands¶
Please follow the Tutorials for downloading the Inception V3 TensorFlow model file and the sample images before executing the sample commands.
AIC
qnn-hypertuner search -i ${QNN_SDK_ROOT}/examples/QNN/hypertuner/example_qnn_aic.json
HTP (Mobile)
qnn-hypertuner search -i ${QNN_SDK_ROOT}/examples/QNN/hypertuner/example_qnn_htp.json
HTP (QDrive)
qnn-hypertuner search -i ${QNN_SDK_ROOT}/examples/QNN/hypertuner/example_qnn_htp_qd.json
HTPMCP (QDrive)
qnn-hypertuner search -i ${QNN_SDK_ROOT}/examples/QNN/hypertuner/example_qnn_htpmcp_qd.json
Hextimate
qnn-hypertuner search -i ${QNN_SDK_ROOT}/examples/QNN/hypertuner/example_qnn_hextimate.json
Sample configurations¶
Configuration files have the following sections.
Hyperparameters – Specifies the list of parameters and the range for the search
Initial points – Specifies initial points used to start the search
Algorithm – Specifies the search algorithm; currently supported algorithms are:
DOpt (Discrete Optimization)
Evol (Evolutionary Tuning)
Brute (Brute Force)
Backend – Specifies the backend and the name of the deep learning model; currently supported backends are:
QNN-AIC
QNN-HTP
QNN-HTPMCP
QNN-HEXTIMATE
Refer to the sample configuration files included in the SDK for details.
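As a small sketch, assuming only that the samples are JSON files at the SDK paths used in the commands above, a configuration can be inspected before editing:

# sketch: inspect a sample Hypertuner configuration before editing it.
# Assumes only that the sample is valid JSON at the SDK path used above.
import json
import os

sample = os.path.expandvars("${QNN_SDK_ROOT}/examples/QNN/hypertuner/example_qnn_aic.json")
with open(sample) as f:
    config = json.load(f)

# print the top-level sections; per this page they cover the hyperparameters,
# initial points, algorithm, and backend
for key in config:
    print(key)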
Output¶
Hypertuner generates the following outputs.
Set of optimal parameters with their respective values
Checkpoint file used to resume an earlier run
Hypertuner logs (history.log)
Plot mode¶
Hypertuner can be used for visualization by entering the plot option on the command line.
qnn-hypertuner plot -h
qnn-hypertuner plot -i input_file
The history.log file generated by running Hypertuner in Search mode can be used as the input for Plot mode.
qnn-hypertuner plot -i history.log
To generate a host profile, set the HT_ENABLE_PROFILING environment variable when running qnn-hypertuner in Search mode as shown below.
HT_ENABLE_PROFILING=1 qnn-hypertuner search -i example_qnn_aic.json
qnn-hypertuner plot -i history.log
Different views can be generated by selecting different values from the drop-down list.
Constraints¶
Results will converge to a local maximum/minimum
Run-to-run variations are expected on hardware backends
The DOpt algorithm is not suitable for categorical hyperparameters