QAIRT Tools¶
Accuracy Evaluator¶
Refer to the qairt-accuracy-evaluator section in QAIRT Accuracy Evaluator (Beta).
On the Cloud AI backend, this tool can be used only for non-LLM models.
Example on Cloud AI Backend¶
Example of running the EfficientNet_b0 model with fp16 precision using the Accuracy Evaluator
Install the packages mentioned in additional-packages-for-evaluating-model-accuracy
Run qairt-accuracy-evaluator for the EfficientNet_b0 model
With the config efficientNet_b0_config.yaml
Clean up intermediate files
On device id 5
Run inference with the schema tag "qnn_fp16"
qairt-accuracy-evaluator -config efficientNet_b0_config.yaml -cleanup intermediate -inference_schema_tag qnn_fp16 -work_dir WORKING_DIR_PATH -silent -device_id 5
efficientNet_b0_config.yaml
model:
    info:
        desc: EfficientNet-b0 reference public model used.
        batchsize: 1
    globals:
        count: -1
        calib: -1
        npi_file: efficientNet_b0_config.json
    dataset:
        name: ILSVRC2012
        path: '/home/ml-datasets/imageNet/'
        inputlist_file: inputlist.txt
        annotation_file: ground_truth.txt
        calibration:
            type: dataset
            file: calibration.txt
        transformations:
            - plugin:
                name: filter_dataset
                params:
                    random: False
                    max_inputs: $count
                    max_calib: $calib
    preprocessing:
        transformations:
            - plugin:
                name: resize
                params:
                    library: torchvision
                    dims: 224, 224
                    interp: bicubic
                    type: imagenet
                    typecasting_required: False
            - plugin:
                name: crop
                params:
                    library: torchvision
                    dims: 224,224
                    typecasting_required: False
            - plugin:
                name: normalize
                params:
                    library: torchvision
                    normalize_first: True
                    means:
                        R: 0.485
                        G: 0.456
                        B: 0.406
                    std:
                        R: 0.229
                        G: 0.224
                        B: 0.225
            - plugin:
                name: create_batch
    inference-engine:
        model_path: public/efficientnet/efficientnet-b0.onnx
        inference_schemas:
            - inference_schema:
                name: qnn
                precision: fp32
                target_arch: x86_64-linux-clang
                backend: cpu
                tag: qnn_fp32,ci
            - inference_schema:
                name: qnn
                precision: fp16
                target_arch: x86_64-linux-clang
                backend: aic
                tag: qnn_fp16,ci
                backend_extensions:
                    compiler_perfWarnings: True
                netrun_params:
                    use_native_input_data: True
                    use_native_output_data: True
                converter_params:
                    float_bitwidth: 16
                    float_bias_bitwidth: 32
                    preserve_io_datatype: True
            - inference_schema:
                name: qnn
                precision: quant
                target_arch: x86_64-linux-clang
                backend: aic
                tag: qnn_int8_mp,ci
                converter_params:
                    float_bias_bitwidth: 32
                    quantization_overrides: $npi_file
                backend_extensions:
                    compiler_perfWarnings: True
                    compiler_num_of_cores: 1
                    compiler_overlap_split_factor: 1
                    runtime_num_activations: 14
                    runtime_threads_per_queue: 4
                quantizer_params:
                    bias_bitwidth: 32
                    float_bias_bitwidth: 32
                    act_quantizer_calibration: percentile
                    act_quantizer_schema: asymmetric
                    param_quantizer_schema: symmetric
                    percentile_calibration_value: 99.998
                    use_per_channel_quantization: True
                    preserve_io_datatype: True
                netrun_params:
                    use_native_input_data: True
                    use_native_output_data: True
        inputs_info:
            - input.1:
                type: float32
                shape: ["*", 3, 299, 299]
        outputs_info:
            - _666:
                type: float32
                shape: ["*", 1000]
    metrics:
        transformations:
            - plugin:
                name: topk
                params:
                    kval: 1,5
                    round: 7
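The -inference_schema_tag value passed on the command line selects one of the inference_schemas above by its tag. As a minimal sketch (assuming PyYAML is available and the file name matches the config shown above), the available tags can be listed like this:

# sketch: list the inference schema tags defined in the evaluator config,
# so the right value can be passed to -inference_schema_tag.
# Assumes PyYAML is installed; the file name matches the config shown above.
import yaml

with open("efficientNet_b0_config.yaml") as f:
    config = yaml.safe_load(f)

schemas = config["model"]["inference-engine"]["inference_schemas"]
for entry in schemas:
    schema = entry["inference_schema"]
    # each schema carries a comma-separated tag list, e.g. "qnn_fp16,ci"
    print(schema["precision"], "->", schema["tag"])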
efficientNet_b0_config.json is provided to retain a few nodes in higher precision and maintain accuracy
efficientNet_b0_config.json
{ "activation_encodings": { "_380": [ { "bitwidth": 16, "dtype": "float" } ], "_381": [ { "bitwidth": 16, "dtype": "float" } ], "_417": [ { "bitwidth": 16, "dtype": "float" } ], "_418": [ { "bitwidth": 16, "dtype": "float" } ], "_398": [ { "bitwidth": 16, "dtype": "float" } ], "_399": [ { "bitwidth": 16, "dtype": "float" } ], "_367": [ { "bitwidth": 16, "dtype": "float" } ], "_368": [ { "bitwidth": 16, "dtype": "float" } ], "_366": [ { "bitwidth": 16, "dtype": "float" } ], "_364": [ { "bitwidth": 16, "dtype": "float" } ], "_365": [ { "bitwidth": 16, "dtype": "float" } ], "_384": [ { "bitwidth": 16, "dtype": "float" } ], "_382": [ { "bitwidth": 16, "dtype": "float" } ], "_383": [ { "bitwidth": 16, "dtype": "float" } ], "_362": [ { "bitwidth": 16, "dtype": "float" } ], "_363": [ { "bitwidth": 16, "dtype": "float" } ], "_387": [ { "bitwidth": 16, "dtype": "float" } ], "_388": [ { "bitwidth": 16, "dtype": "float" } ], "_385": [ { "bitwidth": 16, "dtype": "float" } ], "_386": [ { "bitwidth": 16, "dtype": "float" } ], "_454": [ { "bitwidth": 16, "dtype": "float" } ], "_455": [ { "bitwidth": 16, "dtype": "float" } ] }, "param_encodings": {} }
Accuracy Debugger¶
Refer to the qairt-accuracy-debugger section in QAIRT Accuracy Debugger (Beta).
On the Cloud AI backend, it supports only non-LLM models.
Example on Cloud AI Backend¶
Example of debugging the EfficientNet_b0 model with the Accuracy Debugger on the Cloud AI backend
EfficientNetB0 Framework_Runner stage to get intermediate outputs with onnx
Model efficientnet-b0.onnx
For input tensor 'input.1' 1,3,224,224 input.raw float32
Output tensor '666'
qairt-accuracy-debugger --framework_runner --framework onnx --model_path efficientnet-b0.onnx --working_dir RUNNER_WORKING_DIR --output_dirname fw_runner --disable_graph_optimization --verbose --input_tensor 'input.1' 1,3,224,224 input.raw float32 --output_tensor '666'
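The stages above consume a raw float32 input file. Below is a sketch, assuming PIL, torchvision, and numpy are installed, that produces such an input.raw using the same preprocessing parameters as the evaluator config (resize and crop to 224x224, ImageNet normalization); the image path is a placeholder:

# sketch: create input.raw for the debugger stages above.
# Assumes PIL, torchvision and numpy are installed; the image path is a placeholder.
import numpy as np
from PIL import Image
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((224, 224), interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),  # HWC uint8 -> CHW float32 in [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.open("sample.jpg").convert("RGB")
tensor = preprocess(img).unsqueeze(0)  # shape: 1,3,224,224
tensor.numpy().astype(np.float32).tofile("input.raw")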
EfficientNetB0 Inference_Engine stage for INT8 precision to get the intermediate outputs with the QAIRT engine, using Cloud AI as the backend
For runtime "aic" backend (Cloud AI)
Host type x86
For input tensor 'input.1' 1,3,224,224 input.raw float32
Output tensor '666'
Symmetric quantization is used for parameters and activations
qairt-accuracy-debugger --inference_engine --model_path efficientnet-b0.onnx --runtime aic --architecture x86_64-linux-clang --input_list qnn_efficientNet_b0_list.txt --calibration_input_list qnn_efficientNet_b0_list.txt --working_dir INF_WORKING_DIR --output_dirname InferenceResults --executor_type QAIRT --engine_path SDK_PATH --verbose --host_device x86 --profiling_level basic --log_level error --debug_mode_off --bias_bitwidth 32 --param_quantizer_schema symmetric --act_quantizer_schema symmetric --param_quantizer_calibration min-max --use_per_channel_quantization --input_tensor 'input.1' 1,3,224,224 input.raw float32 --output_tensor '666'
Input list qnn_efficientNet_b0_list.txt
input:=/PATH_TO_MODEL_INPUT/model-inputs/inputs/224x224/batch_size_1/./input.raw
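The list file contains one name:=path entry per line, as shown above. A small sketch (the directory path is a placeholder) that writes such a list for a set of raw files:

# sketch: write qnn_efficientNet_b0_list.txt in the input:=<path> format shown above.
# The directory path is a placeholder.
from pathlib import Path

raw_files = sorted(Path("/PATH_TO_MODEL_INPUT/model-inputs/inputs/224x224/batch_size_1").glob("*.raw"))
with open("qnn_efficientNet_b0_list.txt", "w") as f:
    for raw in raw_files:
        f.write(f"input:={raw}\n")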
EfficientNetB0 Verification stage for INT8 precision to compare the outputs generated by the Framework Runner (onnx) and the Inference Engine (Cloud AI backend) using CosineSimilarity
Use CosineSimilarity as the verifier
Outputs will be available at WORKING_DIR
qairt-accuracy-debugger --verification --default_verifier CosineSimilarity --working_dir WORKING_DIR --verbose --golden_output_reference_directory RUNNER_WORKING_DIR --inference_results INF_WORKING_DIR/InferenceResults/output/Result_0/
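As a conceptual illustration only, not the verifier's actual implementation, cosine similarity between a golden output and an inference output can be computed as follows, assuming both are float32 raw dumps of the same tensor; the file names are placeholders:

# sketch: cosine similarity between two float32 raw output dumps.
# Conceptual illustration of what the CosineSimilarity verifier measures;
# file names are placeholders.
import numpy as np

golden = np.fromfile("golden/_666.raw", dtype=np.float32)
actual = np.fromfile("inference/_666.raw", dtype=np.float32)

cos_sim = np.dot(golden, actual) / (np.linalg.norm(golden) * np.linalg.norm(actual))
print(f"cosine similarity: {cos_sim:.6f}")  # 1.0 means identical direction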
Hyper-Tuner¶
qnn-hypertuner (Experimental)¶
qnn-hypertuner, also referred to as Hypertuner, is a performance tuning tool that provides an optimal combination of compiler parameters. The Hypertuner takes as input a JSON configuration file containing the name of the deep learning model, the hyperparameters, the search algorithm, and the backend. It then performs a search over the hyperparameter space defined by the input parameters and outputs an optimal parameter set for use by downstream tasks or applications.
qnn-hypertuner is an experimental tool that is currently supported only on a limited set of mobile devices and automotive platforms.
Setup¶
Hypertuner usage assumes general setup instructions have been followed.
Usage¶
Hypertuner can be used in the following modes.
Search mode – Tunes a deep learning model for performance
Plot mode – Visualizes statistics generated in Search mode
Run qnn-hypertuner --help to see the command line help message.
Search mode¶
Hypertuner can be run in Search mode by entering the search option on the command line.
qnn-hypertuner search -h
qnn-hypertuner search -i INPUT [-l] [-bo] [-an] [-ao] [-cp0] [-h]
Required arguments¶
- -i, --input INPUT
Configuration file in JSON format. The configuration file must contain the following.
Deep learning model
Hyper parameters
Search Algorithm
Backend
Refer to the sample configuration files included in the SDK for details.
Optional arguments¶
Optional arguments can be used to provide additional information or to override the configuration provided by the input configuration file. Some of the options are shown below. Run the help command to view the latest options.
- -an, --algorithm-name
Algorithm name; options:
dopt
evol
brute
- -ao, --algorithm-obj
Algorithm objective; options:
max
min
- -l
Logging levels; options:
DBG
INFO
WARN
ERR
- -cp0
Disable check-pointing
Optional Backend arguments¶
These options are displayed when the --input flag is provided along with --help. They are backend-specific options, displayed after the backend name is extracted from the --input config.
QNN-AIC
qnn-aic backend's flag:
-bo {ips,latency,avg_ddr_bw}, --backend-obj {ips,latency,avg_ddr_bw}
QNN-HTP
qnn-htp backend's flag:
-bo {exe_time,ddr_bandwidth}, --backend-obj {exe_time,ddr_bandwidth}
QNN-HTPMCP
qnn-htpmcp backend's flag:
-bo {exe_time,ddr_bandwidth}, --backend-obj {exe_time,ddr_bandwidth}
Hextimate
hextimate backend's flag:
-bo {exec_cycles,exec_cycles_lower,exec_cycles_upper,ddr_bandwidth}, --backend-obj {exec_cycles,exec_cycles_lower,exec_cycles_upper,ddr_bandwidth}
Sample commands¶
Please follow the Tutorials for downloading the Inception V3 TensorFlow model file and the sample images before executing the sample commands.
AIC
qnn-hypertuner search -i ${QNN_SDK_ROOT}/examples/QNN/hypertuner/example_qnn_aic.json
HTP (Mobile)
qnn-hypertuner search -i ${QNN_SDK_ROOT}/examples/QNN/hypertuner/example_qnn_htp.json
HTP (QDrive)
qnn-hypertuner search -i ${QNN_SDK_ROOT}/examples/QNN/hypertuner/example_qnn_htp_qd.json
HTPMCP (QDrive)
qnn-hypertuner search -i ${QNN_SDK_ROOT}/examples/QNN/hypertuner/example_qnn_htpmcp_qd.json
Hextimate
qnn-hypertuner search -i ${QNN_SDK_ROOT}/examples/QNN/hypertuner/example_qnn_hextimate.json
Sample configurations¶
Configuration files have the following sections.
Hyperparameters – Specifies the list of parameters and the range for the search
Initial points – Specifies initial points used to start the search
Algorithm – Specifies the search algorithm; currently supported algorithms are:
DOpt (Discrete Optimization)
Evol (Evolutionary Tuning)
Brute (Brute Force)
Backend – Specifies the backend and the name of the deep learning model; currently supported backends are:
QNN-AIC
QNN-HTP
QNN-HTPMCP
QNN-HEXTIMATE
Refer to the sample configuration files included in the SDK for details.
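As a small sketch, assuming only that the samples are JSON files at the SDK paths used in the commands above, a configuration can be inspected before editing:

# sketch: inspect a sample Hypertuner configuration before editing it.
# Assumes only that the sample is valid JSON at the SDK path used above.
import json
import os

sample = os.path.expandvars("${QNN_SDK_ROOT}/examples/QNN/hypertuner/example_qnn_aic.json")
with open(sample) as f:
    config = json.load(f)

# print the top-level sections; per this page they cover the hyperparameters,
# initial points, algorithm, and backend
for key in config:
    print(key)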
Output¶
Hypertuner generates the following outputs.
Set of optimal parameters with their respective values
Checkpoint file used to resume an earlier run
Hypertuner logs (history.log)
Plot mode¶
Hypertuner can be used for visualization by entering the plot option on the command line.
qnn-hypertuner plot -h
qnn-hypertuner plot -i input_file
The history.log file generated by running Hypertuner in Search mode can be used as the input for Plot mode.
qnn-hypertuner plot -i history.log
To generate a host profile, set the HT_ENABLE_PROFILING environment variable when running qnn-hypertuner in Search mode as shown below.
HT_ENABLE_PROFILING=1 qnn-hypertuner search -i example_qnn_aic.json
qnn-hypertuner plot -i history.log
Different views can be generated by selecting different values from the drop-down list.
Constraints¶
Results will converge to a local maximum/minimum
Run-to-run variations are expected on hardware backends
The DOpt algorithm is not suitable for categorical hyperparameters