QAIRT Tools

Accuracy Evaluator

Refer to the qairt-accuracy-evaluator section in qairt-accuracy-evaluator-(beta).

This tool can be used only with non-LLM models on the Cloud AI backend.

Example on Cloud AI Backend

Example of running the EfficientNet_b0 model with fp16 precision using the Accuracy Evaluator:

  • With config efficientNet_b0_config.yaml

  • Clean up intermediate files

  • On device ID 5

  • Run inference with schema tag “qnn_fp16”

qairt-accuracy-evaluator -config efficientNet_b0_config.yaml -cleanup intermediate -inference_schema_tag qnn_fp16 -work_dir WORKING_DIR_PATH -silent -device_id 5
  • efficientNet_b0_config.yaml

    efficientNet_b0_config.yaml
    model:
     info:
         desc: EfficientNet-b0 reference public model used.
         batchsize: 1
     globals:
         count: -1
         calib: -1
         npi_file: efficientNet_b0_config.json
     dataset:
         name: ILSVRC2012
         path: '/home/ml-datasets/imageNet/'
         inputlist_file: inputlist.txt
         annotation_file: ground_truth.txt
         calibration:
             type: dataset
             file: calibration.txt
         transformations:
             - plugin:
                 name: filter_dataset
                 params:
                     random: False
                     max_inputs: $count
                     max_calib: $calib
     preprocessing:
         transformations:
             - plugin:
                     name: resize
                     params:
                         library: torchvision
                         dims: 224, 224
                         interp: bicubic
                         type: imagenet
                         typecasting_required: False
             - plugin:
                     name: crop
                     params:
                         library: torchvision
                         dims: 224,224
                         typecasting_required: False
             - plugin:
                     name: normalize
                     params:
                         library: torchvision
                         normalize_first: True
                         means:
                             R: 0.485
                             G: 0.456
                             B: 0.406
                         std:
                             R: 0.229
                             G: 0.224
                             B: 0.225
             - plugin:
                     name: create_batch
     inference-engine:
         model_path: public/efficientnet/efficientnet-b0.onnx
         inference_schemas:
             - inference_schema:
                 name: qnn
                 precision: fp32
                 target_arch: x86_64-linux-clang
                 backend: cpu
                 tag: qnn_fp32,ci
             - inference_schema:
                 name: qnn
                 precision: fp16
                 target_arch: x86_64-linux-clang
                 backend: aic
                 tag: qnn_fp16,ci
                 backend_extensions:
                    compiler_perfWarnings: True
                 netrun_params:
                    use_native_input_data: True
                    use_native_output_data: True
                 converter_params:
                    float_bitwidth: 16
                    float_bias_bitwidth: 32
                    preserve_io_datatype: True
             - inference_schema:
                 name: qnn
                 precision: quant
                 target_arch: x86_64-linux-clang
                 backend: aic
                 tag: qnn_int8_mp,ci
                 converter_params:
                    float_bias_bitwidth: 32
                    quantization_overrides: $npi_file
                 backend_extensions:
                    compiler_perfWarnings: true
                    compiler_num_of_cores: 1
                    compiler_overlap_split_factor: 1
                    runtime_num_activations: 14
                    runtime_threads_per_queue: 4
                 quantizer_params:
                    bias_bitwidth: 32
                    float_bias_bitwidth: 32
                    act_quantizer_calibration: percentile
                    act_quantizer_schema: asymmetric
                    param_quantizer_schema: symmetric
                    percentile_calibration_value: 99.998
                    use_per_channel_quantization: True
                    preserve_io_datatype: True
                 netrun_params:
                    use_native_input_data: True
                    use_native_output_data: True
         inputs_info:
             - input.1:
                 type: float32
                 shape: ["*", 3, 299, 299]
         outputs_info:
             - _666:
                 type: float32
                 shape: ["*", 1000]
     metrics:
         transformations:
         - plugin:
                 name: topk
                 params:
                     kval: 1,5
                     round: 7
    
  • efficientNet_b0_config.json is provided as the quantization overrides file to retain a few nodes in higher precision and maintain accuracy; a short sketch of how such a file can be generated follows it

    efficientNet_b0_config.json
    {
     "activation_encodings": {
         "_380": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_381": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_417": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_418": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_398": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_399": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_367": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_368": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_366": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_364": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_365": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_384": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_382": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_383": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_362": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_363": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_387": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_388": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_385": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_386": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_454": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ],
         "_455": [
             {
                 "bitwidth": 16,
                 "dtype": "float"
             }
         ]
     },
     "param_encodings": {}
    }
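
The overrides file above follows a simple, repetitive pattern: each listed activation tensor is kept as 16-bit float. As a minimal Python sketch (not part of the SDK; the tensor names must be taken from the converted model, and the subset below is only a placeholder), such a file could be generated as follows:

  import json

  # Illustrative only: build a quantization-overrides file that keeps the listed
  # activation tensors in 16-bit float. Replace tensor_names with the real tensors.
  tensor_names = ["_380", "_381", "_417", "_418"]

  overrides = {
      "activation_encodings": {
          name: [{"bitwidth": 16, "dtype": "float"}] for name in tensor_names
      },
      "param_encodings": {},
  }

  with open("efficientNet_b0_config.json", "w") as f:
      json.dump(overrides, f, indent=4)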
    

Accuracy Debugger

Refer to the qairt-accuracy-debugger section in qairt-accuracy-debugger-(beta).

It is supported only for non-LLM models on the Cloud AI backend.

Example on Cloud AI Backend

Example of running the EfficientNet_b0 model with the Accuracy Debugger:

  • EfficientNetB0 Framework_Runner stage to get intermediate outputs with ONNX

  • Model efficientnet-b0.onnx

  • For input tensor “‘input.1’ 1,3,224,224 input.raw float32”

  • Output tensor “666”

qairt-accuracy-debugger --framework_runner --framework onnx --model_path efficientnet-b0.onnx --working_dir RUNNER_WORKING_DIR --output_dirname fw_runner --disable_graph_optimization --verbose --input_tensor 'input.1' 1,3,224,224 input.raw float32 --output_tensor '666'
  • EfficientNetB0 Inference_Engine stage for INT8 precision to get the intermediate outputs with the QAIRT engine, using Cloud AI as the backend

  • For runtime “aic” backend (Cloud AI)

  • Host type x86

  • For input tensor “‘input.1’ 1,3,224,224 input.raw float32”

  • Output tensor “666”

  • Symmetric quantization to be used for parameters and activations

qairt-accuracy-debugger --inference_engine --model_path efficientnet-b0.onnx --runtime aic --architecture x86_64-linux-clang --input_list qnn_efficientNet_b0_list.txt --calibration_input_list qnn_efficientNet_b0_list.txt --working_dir INF_WORKING_DIR --output_dirname InferenceResults --executor_type QAIRT --engine_path SDK_PATH --verbose --host_device x86 --profiling_level basic --log_level error --debug_mode_off --bias_bitwidth 32 --param_quantizer_schema symmetric --act_quantizer_schema symmetric --param_quantizer_calibration min-max --use_per_channel_quantization --input_tensor 'input.1' 1,3,224,224 input.raw float32 --output_tensor '666'
  • Input list qnn_efficientNet_b0_list.txt

input:=/PATH_TO_MODEL_INPUT/model-inputs/inputs/224x224/batch_size_1/./input.raw
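
The input.raw file referenced above is a raw float32 buffer of shape 1x3x224x224. As a minimal sketch (not part of the SDK), it could be produced with preprocessing that mirrors the evaluator config shown earlier; Pillow, NumPy, and torchvision are assumed to be installed, and the file names are placeholders.

  import numpy as np
  from PIL import Image
  from torchvision import transforms

  # Resize (bicubic), center-crop to 224x224, and normalize with the ImageNet
  # mean/std from the evaluator config, then dump the NCHW tensor as raw float32.
  preprocess = transforms.Compose([
      transforms.Resize((224, 224), interpolation=transforms.InterpolationMode.BICUBIC),
      transforms.CenterCrop(224),
      transforms.ToTensor(),
      transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
  ])

  img = Image.open("sample.jpg").convert("RGB")
  batch = preprocess(img).unsqueeze(0)              # shape (1, 3, 224, 224)
  batch.numpy().astype(np.float32).tofile("input.raw")
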
  • EfficientNetB0 Verification stage for INT8 precision to compare the outputs generated by the framework runner (ONNX) and the Inference Engine (Cloud AI backend) using CosineSimilarity

  • Use CosineSimilarity as the verifier (a sketch of the metric follows the command below)

  • Outputs will be available at WORKING_DIR

qairt-accuracy-debugger --verification --default_verifier CosineSimilarity --working_dir WORKING_DIR --verbose --golden_output_reference_directory RUNNER_WORKING_DIR --inference_results INF_WORKING_DIR/InferenceResults/output/Result_0/
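
As a rough illustration of the metric the CosineSimilarity verifier reports (not the debugger's actual implementation), the sketch below compares a golden output buffer from the framework runner against the corresponding inference-engine output; the file paths are placeholders and the 1000-element output shape follows the example above.

  import numpy as np

  # Load the float32 outputs: golden from the framework runner, target from the
  # inference engine. Paths are placeholders for files under the working directories.
  golden = np.fromfile("RUNNER_WORKING_DIR/666.raw", dtype=np.float32)   # (1000,)
  target = np.fromfile("INF_WORKING_DIR/666.raw", dtype=np.float32)      # (1000,)

  cosine = np.dot(golden, target) / (np.linalg.norm(golden) * np.linalg.norm(target))
  print(f"CosineSimilarity: {cosine:.6f}")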

Hyper-Tuner

qnn-hypertuner (Experimental)

qnn-hypertuner, also referred to as Hypertuner, is a performance-tuning tool that finds an optimal combination of compiler parameters. The Hypertuner takes as input a JSON configuration file containing the name of the deep learning model, the hyperparameters, the search algorithm, and the backend. It then performs a search over the hyperparameter space defined by the input parameters and outputs an optimal parameter set for use by downstream tasks or applications.

qnn-hypertuner is an experimental tool that is currently supported only on a limited set of mobile devices and automotive platforms.

Setup

Hypertuner usage assumes that the general setup instructions have been followed.

Usage

Hypertuner can be used in the following modes.

  • Search mode – Tunes a deep learning model for performance

  • Plot mode – Visualizes statistics generated in Search mode

Run qnn-hypertuner --help to see the command line help message.

Search mode

Hypertuner can be run in Search mode by entering the search option in the command line.

qnn-hypertuner search -h
qnn-hypertuner search -i INPUT [-l] [-bo] [-an] [-ao] [-cp0] [-h]

Required arguments

-i, --input INPUT

Configuration file in JSON format. The configuration file must contain the following.

  • Deep learning model

  • Hyperparameters

  • Search Algorithm

  • Backend

Refer to the sample configuration files included in the SDK for details.

Optional arguments

Optional arguments can be used to provide additional information or to override the configuration provided by the input configuration file. Some of the options are shown below; run the help command to view the latest options.

-an, --algorithm-name

Algorithm name; options:

  • dopt

  • evol

  • brute

-ao, --algorithm-obj

Algorithm objective; options:

  • max

  • min

-l

Logging levels; options:

  • DBG

  • INFO

  • WARN

  • ERR

-cp0

Disable checkpointing

Optional Backend arguments

These options are displayed when the --input flag is provided along with --help. They are backend-specific options, displayed after the backend name is extracted from the --input config.

  • QNN-AIC

qnn-aic backend's flag:
  -bo {ips,latency,avg_ddr_bw}, --backend-obj {ips,latency,avg_ddr_bw}
  • QNN-HTP

qnn-htp backend's flag:
  -bo {exe_time,ddr_bandwidth}, --backend-obj {exe_time,ddr_bandwidth}
  • QNN-HTPMCP

qnn-htpmcp backend's flag:
  -bo {exe_time,ddr_bandwidth}, --backend-obj {exe_time,ddr_bandwidth}
  • Hextimate

hextimate backend's flag:
  -bo {exec_cycles,exec_cycles_lower,exec_cycles_upper,ddr_bandwidth}, --backend-obj {exec_cycles,exec_cycles_lower,exec_cycles_upper,ddr_bandwidth}

Sample commands

Please follow the Tutorials to download the Inception V3 TensorFlow model file and the sample images before executing the sample commands.

  • AIC

qnn-hypertuner search -i ${QNN_SDK_ROOT}/examples/QNN/hypertuner/example_qnn_aic.json
  • HTP (Mobile)

qnn-hypertuner search -i ${QNN_SDK_ROOT}/examples/QNN/hypertuner/example_qnn_htp.json
  • HTP (QDrive)

qnn-hypertuner search -i ${QNN_SDK_ROOT}/examples/QNN/hypertuner/example_qnn_htp_qd.json
  • HTPMCP (QDrive)

qnn-hypertuner search -i ${QNN_SDK_ROOT}/examples/QNN/hypertuner/example_qnn_htpmcp_qd.json
  • Hextimate

qnn-hypertuner search -i ${QNN_SDK_ROOT}/examples/QNN/hypertuner/example_qnn_hextimate.json

Sample configurations

Configuration files have the following sections.

  • Hyperparameters – Specifies the list of parameters and the range for the search

  • Initial points – Specifies initial points used to start the search

  • Algorithm – Specifies the search algorithm; currently supported algorithms are:
    • DOpt (Discrete Optimization)

    • Evol (Evolutionary Tuning)

    • Brute (Brute Force)

  • Backend – Specifies the backend and the name of the deep learning model; currently supported backends are:
    • QNN-AIC

    • QNN-HTP

    • QNN-HTPMCP

    • QNN-HEXTIMATE

Refer to the sample configuration files included in the SDK for details.
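
The authoritative schema comes from the sample files shipped with the SDK. As a hypothetical pre-flight check only (the key names below are assumptions, not the SDK's schema), a configuration could be sanity-checked for the sections listed above before starting a search:

  import json
  import sys

  # Hypothetical check that a Hypertuner config contains the documented sections.
  # Key names are assumptions; consult the SDK sample configuration files for the
  # real schema before relying on this.
  EXPECTED_SECTIONS = ["hyperparameters", "initial_points", "algorithm", "backend"]

  with open(sys.argv[1]) as f:
      config = json.load(f)

  missing = [s for s in EXPECTED_SECTIONS if s not in config]
  if missing:
      sys.exit("Config is missing sections: " + ", ".join(missing))
  print("All documented sections present.")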

Output

Hypertuner generates the following outputs.

  • Set of optimal parameters with their respective values

  • Checkpoint file used to resume an earlier run

  • Hypertuner logs (history.log)

Plot mode

Hypertuner can be used for visualization by entering the plot option in the command line.

qnn-hypertuner plot -h
qnn-hypertuner plot -i input_file

The history.log file generated by running Hypertuner in Search mode can be used as the input for Plot mode.

qnn-hypertuner plot -i history.log

To generate a host profile, set the HT_ENABLE_PROFILING environment variable when running qnn-hypertuner in Search mode as shown below.

HT_ENABLE_PROFILING=1 qnn-hypertuner search -i example_qnn_aic.json
qnn-hypertuner plot -i history.log

Different views can be generated by selecting different values from the drop-down list.

Constraints

  1. Results will converge to a local maximum/minimum.

  2. Run-to-run variations are expected on hardware backends.

  3. The DOpt algorithm is not suitable for categorical hyperparameters.