AIMET ONNX Layer Output Generation API

This API captures and saves the intermediate layer-outputs of a model. The model can be the original (FP32) model or a quantsim model. The layer-outputs are named according to the ONNX model exported by the quantsim export API. This allows layer-outputs to be compared across the FP32 model, the quantization-simulated model, and the actually quantized model on the target device, in order to debug accuracy mismatch issues.

Top-level API


The LayerOutputUtil class (from aimet_onnx.layer_output_utils) and its generate_layer_outputs method are used to generate layer-outputs.


Code Example

Imports

import onnx
from onnxruntime import InferenceSession

from aimet_onnx.quantsim import QuantizationSimModel, load_encodings_to_sim
from aimet_onnx.layer_output_utils import LayerOutputUtil

Obtain Original or QuantSim model from AIMET Export Artifacts

# Load the model.
model = onnx.load('path/to/aimet_export_artifacts/model.onnx')

# Use the same arguments that were used for the exported QuantSim model. For simplicity, only the mandatory arguments are passed below.
quantsim = QuantizationSimModel(model=model, dummy_input=dummy_input_dict, use_cuda=False)

# Load the exported encodings into the quantsim object.
load_encodings_to_sim(quantsim, 'path/to/aimet_export_artifacts/model.encodings')

# Check that the original and quantsim models run properly before using the Layer Output Generation API.
_ = InferenceSession(model.SerializeToString()).run(None, dummy_input_dict)
_ = quantsim.session.run(None, dummy_input_dict)
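
The snippet above uses dummy_input_dict without defining it. The following is a minimal sketch of one way to build it, assuming a dictionary that maps graph input names to NumPy arrays of matching shape and dtype, which is the input-feed format that InferenceSession.run accepts. The helper name make_dummy_input_dict, the input name 'input', and the shape are hypothetical and must be replaced with the values of the actual model.

```python
import numpy as np


def make_dummy_input_dict(input_shapes, dtype=np.float32, seed=0):
    """Hypothetical helper: build {input_name: random array} for a dry run.

    input_shapes maps each graph input name to its shape tuple.
    """
    rng = np.random.default_rng(seed)
    return {
        name: rng.standard_normal(shape).astype(dtype)
        for name, shape in input_shapes.items()
    }


# Assumed input name and shape; replace with those of the real model.
dummy_input_dict = make_dummy_input_dict({'input': (1, 3, 224, 224)})
```

Random values are enough here because the dictionary is only used to construct the quantsim object and to check that both models run.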

Obtain inputs for which we want to generate intermediate layer-outputs

# Use the same input pre-processing pipeline that was used for computing the quantization encodings.
input_batches = get_pre_processed_inputs()
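
get_pre_processed_inputs is a user-supplied function that this page does not define. As an illustration only, the sketch below shows one possible shape for it, assuming a single-input image model whose batches are float32 NumPy arrays in NCHW layout. The random pixel data, the per-channel mean and standard deviation, and all parameter names are placeholder assumptions; a real implementation would read the actual dataset and apply the same pre-processing that was used when computing the encodings.

```python
import numpy as np


def get_pre_processed_inputs(num_batches=2, batch_size=4, shape=(3, 224, 224)):
    """Hypothetical stand-in that yields normalized float32 batches (NCHW)."""
    # Assumed per-channel normalization constants; use the real pipeline's values.
    mean = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(1, 3, 1, 1)
    std = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(1, 3, 1, 1)
    rng = np.random.default_rng(0)
    for _ in range(num_batches):
        # Random uint8 "images" stand in for data loaded from disk.
        raw = rng.integers(0, 256, size=(batch_size, *shape), dtype=np.uint8)
        scaled = raw.astype(np.float32) / 255.0
        yield ((scaled - mean) / std).astype(np.float32)
```

Because this sketch is a generator, it can be iterated only once; wrap the call in list(...) if the same batches are needed for more than one pass.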

Generate layer-outputs

# Use original model to get fp32 layer-outputs
fp32_layer_output_util = LayerOutputUtil(model=model, dir_path='./fp32_layer_outputs')

# Use quantsim model to get quantsim layer-outputs
quantsim_layer_output_util = LayerOutputUtil(model=quantsim.model.model, dir_path='./quantsim_layer_outputs')

for input_batch in input_batches:
    fp32_layer_output_util.generate_layer_outputs(input_batch)
    quantsim_layer_output_util.generate_layer_outputs(input_batch)

# Note: Generate layer-outputs for the fp32 model before creating the quantsim model, because the fp32 model itself is modified to produce the quantsim version.
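
Once both directories are populated, the layer-outputs can be compared pairwise. The sketch below is not part of AIMET. It assumes that each layer-output is stored as a flat float32 .raw file and that both runs use the same relative file names under their respective directories; verify both assumptions against the files actually written by LayerOutputUtil. The function name compare_layer_outputs and the reported metrics are illustrative choices.

```python
from pathlib import Path

import numpy as np


def compare_layer_outputs(fp32_dir, quantsim_dir, dtype=np.float32):
    """Hypothetical helper: compare same-named .raw files in two directories.

    Returns {relative_file_name: {'max_abs_diff': ..., 'sqnr_db': ...}}.
    """
    fp32_dir, quantsim_dir = Path(fp32_dir), Path(quantsim_dir)
    report = {}
    for fp32_file in sorted(fp32_dir.rglob('*.raw')):
        rel = fp32_file.relative_to(fp32_dir)
        quant_file = quantsim_dir / rel
        if not quant_file.exists():
            continue  # no counterpart in the quantsim run
        ref = np.fromfile(str(fp32_file), dtype=dtype)
        test = np.fromfile(str(quant_file), dtype=dtype)
        if ref.shape != test.shape:
            continue  # sizes differ, so the files are not comparable
        noise = float(np.mean((ref - test) ** 2))
        signal = float(np.mean(ref ** 2))
        if noise == 0.0:
            sqnr_db = float('inf')  # outputs are identical
        else:
            sqnr_db = float(10.0 * np.log10(max(signal, 1e-12) / noise))
        report[rel.as_posix()] = {
            'max_abs_diff': float(np.max(np.abs(ref - test))) if ref.size else 0.0,
            'sqnr_db': sqnr_db,
        }
    return report
```

A call such as compare_layer_outputs('./fp32_layer_outputs', './quantsim_layer_outputs') returns one entry per matched file; sorting the entries by sqnr_db is one way to locate the first layer at which the quantsim output diverges noticeably from the fp32 output.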