AIMET Keras Layer Output Generation API

This API captures and saves the intermediate layer-outputs of a model. The model can be the original (FP32) model or a quantsim model. The layer-outputs are named according to the Keras model exported by the quantsim export API. This allows layer-outputs to be compared across the FP32 model, the quantization simulated model, and the actually quantized model on the target device, in order to debug accuracy mismatch issues.

Top-level API

class aimet_tensorflow.keras.layer_output_utils.LayerOutputUtil(model, save_dir='./KerasLayerOutput')[source]

This class captures the output of every layer of a Keras (FP32/quantsim) model, creates a dictionary mapping layer-output names to layer-outputs, and saves the per-layer outputs to disk.

Constructor - It initializes the internal state required for capturing and naming layer-outputs.

Parameters

model (Model) – Keras (FP32/quantsim) model.

save_dir (str) – Directory in which to save the layer outputs.
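
For example, the utility can be instantiated directly on an FP32 model (a minimal sketch; fp32_model stands in for any already-built Keras model, and the save directory name is arbitrary):

fp32_layer_output_util = LayerOutputUtil(model=fp32_model, save_dir='./FP32LayerOutput')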


The following API can be used to generate the layer-outputs:

LayerOutputUtil.generate_layer_outputs(input_batch)[source]

This method captures the output of every layer of a Keras model and saves the inputs and corresponding layer-outputs to disk. This allows layer-outputs to be compared either between the original FP32 model and the quantization simulated model, or between the quantization simulated model and the actually quantized model on-target, in order to debug accuracy mismatch issues.

Parameters

input_batch (Union[Tensor, List[Tensor], Tuple[Tensor]]) – Batch of inputs for which layer-outputs need to be generated.

Returns

None


Code Example

Imports

import numpy as np
import tensorflow as tf

from aimet_tensorflow.keras.quantsim import QuantizationSimModel
from aimet_tensorflow.keras.layer_output_utils import LayerOutputUtil

Obtain Original or QuantSim model

def quantsim_forward_pass_callback(model, dummy_input):
    # Forward pass so that quantsim can observe activation ranges
    _ = model.predict(dummy_input)
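
# load_baseline_model() below is a placeholder for user code. A hypothetical
# stand-in, purely for illustration (any trained Keras model can be used):
def load_baseline_model():
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(8, (3, 3), input_shape=(16, 16, 3)),
        tf.keras.layers.ReLU(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(10)
    ])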

# Load the baseline/original (FP32) model
base_model = load_baseline_model()

# Dummy input used for computing encodings; shape must match the model input
dummy_input = np.random.rand(1, 16, 16, 3)

# Create QuantizationSim Object
quantsim_obj = QuantizationSimModel(
    model=base_model,
    quant_scheme='tf_enhanced',
    rounding_mode='nearest',
    default_output_bw=8,
    default_param_bw=8,
    in_place=False,
    config_file=None
)

# Compute encodings
quantsim_obj.compute_encodings(quantsim_forward_pass_callback,
                               forward_pass_callback_args=dummy_input)

Obtain pre-processed inputs
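
get_pre_processed_inputs() used in the snippet below is a placeholder for user-defined data loading. A minimal hypothetical sketch, using random data in place of a real pre-processing pipeline:

def get_pre_processed_inputs():
    # Hypothetical stand-in: replace with real data loading & pre-processing
    return [np.random.rand(4, 16, 16, 3) for _ in range(2)]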

# Get inputs that are pre-processed in the same manner as those used while computing the quantsim encodings
input_batches = get_pre_processed_inputs()

Generate Layer Outputs

# Generate layer-outputs
layer_output_util = LayerOutputUtil(model=quantsim_obj.model, save_dir="./KerasLayerOutput")
for input_batch in input_batches:
    layer_output_util.generate_layer_outputs(input_batch=input_batch)
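
The saved layer-outputs can subsequently be compared offline, for example between an FP32 run and a quantsim run. The sketch below is a minimal illustration, assuming both runs saved one raw float32 file per layer, identically named, under their respective directories; the actual on-disk layout and dtype depend on the AIMET version, so the loading logic may need adjusting:

import os

def compare_layer_outputs(fp32_dir, quantsim_dir):
    # Assumption: one raw float32 file per layer, identically named in both dirs
    for file_name in sorted(os.listdir(fp32_dir)):
        fp32_out = np.fromfile(os.path.join(fp32_dir, file_name), dtype=np.float32)
        quant_out = np.fromfile(os.path.join(quantsim_dir, file_name), dtype=np.float32)
        mse = np.mean((fp32_out - quant_out) ** 2)
        print(f"{file_name}: MSE = {mse:.6f}")

compare_layer_outputs('./FP32LayerOutput', './KerasLayerOutput')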