AIMET ONNX Layer Output Generation API

This API captures and saves the intermediate layer outputs of a model. The model can be the original (FP32) model or a QuantizationSimModel (quantsim). The layer outputs are named according to the ONNX model exported by the quantsim export API. This allows layer-output comparison among the FP32 model, the quantization-simulated model, and the actual quantized model on the target device, to debug accuracy-mismatch issues.

Top-level API


The following API can be used to generate layer outputs.
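As used in the code example below, LayerOutputUtil (imported from aimet_onnx.layer_output_utils) is constructed with an ONNX model, FP32 or quantsim, and a directory path; calling its generate_layer_outputs() method on an input batch runs the model and saves every layer's output under that directory.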


Code Example

Imports

import numpy as np
from aimet_onnx.quantsim import QuantizationSimModel
from aimet_onnx.layer_output_utils import LayerOutputUtil

Obtain original or quantsim model

# Obtain original model
original_model = Model()
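# (Model() above is a placeholder for your FP32 ONNX model; in practice
#  you might load one from disk, e.g. original_model = onnx.load('./model.onnx'),
#  where the path is hypothetical and the onnx package must be imported.)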

# Obtain quantsim model
input_shape = (1, 3, 224, 224)
dummy_data = np.random.randn(*input_shape).astype(np.float32)
input_dict = {'input': dummy_data}

# Forward-pass callback used by compute_encodings to run calibration data through the model
def forward_pass(session, input_dict):
    session.run(None, input_dict)

quantsim = QuantizationSimModel(model=original_model, dummy_input=input_dict, use_cuda=False)
quantsim.compute_encodings(forward_pass, input_dict)

Obtain pre-processed inputs

# Get the inputs (as numpy ndarrays) that have been pre-processed in the same manner
# as the data used while computing the quantsim encodings
input_batches = get_pre_processed_inputs()
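Note that get_pre_processed_inputs() above is a placeholder. A minimal sketch of such a helper, assuming the model's input name is 'input' (as in input_dict above) and that each batch is a dict mapping input names to numpy ndarrays; substitute your real pre-processing pipeline:

import numpy as np

def get_pre_processed_inputs():
    # Hypothetical helper: yields pre-processed input batches in the same
    # dict form as input_dict above. Random data stands in here for real,
    # pre-processed calibration inputs.
    for _ in range(4):  # batch count is arbitrary for illustration
        yield {'input': np.random.randn(1, 3, 224, 224).astype(np.float32)}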

Generate Layer Outputs

# Generate layer-outputs
layer_output_util = LayerOutputUtil(model=quantsim.model.model, dir_path='./layer_output_dump')
for input_batch in input_batches:
    layer_output_util.generate_layer_outputs(input_batch)
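
Since the model passed to LayerOutputUtil can also be the original FP32 model, the same loop can dump FP32 layer outputs into a separate directory for comparison. A sketch, assuming original_model is an onnx.ModelProto and using a hypothetical directory name:

# Capture FP32 layer outputs into a separate directory for comparison
fp32_layer_output_util = LayerOutputUtil(model=original_model, dir_path='./fp32_layer_output_dump')
for input_batch in get_pre_processed_inputs():
    fp32_layer_output_util.generate_layer_outputs(input_batch)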