AIMET Keras Quantization SIM API
User Guide Link
To learn more about Quantization Simulation, please see Quantization Sim.
Top-level API
class aimet_tensorflow.keras.quantsim.QuantizationSimModel(model, quant_scheme='tf_enhanced', rounding_mode='nearest', default_output_bw=8, default_param_bw=8, in_place=False, config_file=None, default_data_type=<QuantizationDataType.int: 1>)
Implements a mechanism to add quantization simulation ops to a model. This allows off-target simulation of inference accuracy. It also allows the model to be fine-tuned to counter the effects of quantization.
- Parameters
model – Model to quantize.
quant_scheme (Union[QuantScheme, str]) – Quantization scheme. Currently supported schemes are post_training_tf and post_training_tf_enhanced; defaults to post_training_tf_enhanced.
rounding_mode (str) – The rounding scheme to use. One of ‘nearest’ or ‘stochastic’; defaults to ‘nearest’.
default_output_bw (int) – Bitwidth to use for activation tensors; defaults to 8.
default_param_bw (int) – Bitwidth to use for parameter tensors; defaults to 8.
in_place (bool) – If True, the given ‘model’ is modified in place to add quant-sim nodes. This option is suggested only when the user wants to avoid creating a copy of the model.
config_file (Optional[str]) – Path to a config file that specifies rules for placing quant ops in the model.
default_data_type (QuantizationDataType) – Default data type to use for quantizing all layer parameters. Possible options are QuantizationDataType.int and QuantizationDataType.float. Note that default_data_type=QuantizationDataType.float is only supported with default_output_bw=16 and default_param_bw=16.
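To make the bitwidth and rounding parameters more concrete, the following is a minimal, library-free sketch of the quantize-then-dequantize step that a simulation op applies to a tensor. It is an illustration only, not AIMET's implementation: the function name fake_quantize is hypothetical, the range is taken as the plain min/max of the data (AIMET's quantization schemes choose the range in their own way), and rounding uses Python's built-in round, which rounds to nearest with ties going to the even integer.

```python
def fake_quantize(values, bitwidth=8):
    """Snap floats onto a `bitwidth`-bit integer grid, then map them back to floats.

    Hypothetical helper for illustration; not part of AIMET.
    """
    # Extend the range to include zero so that 0.0 is exactly representable.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    levels = 2 ** bitwidth - 1                      # highest integer grid index
    scale = (hi - lo) / levels if hi > lo else 1.0  # float width of one grid step
    offset = round(lo / scale)                      # grid index shift for the range minimum

    out = []
    for v in values:
        q = round(v / scale) - offset               # quantize: nearest grid index
        q = max(0, min(levels, q))                  # clamp to the representable range
        out.append((q + offset) * scale)            # dequantize back to float
    return out


if __name__ == "__main__":
    data = [-1.0, 0.0, 0.5, 1.0]
    print(fake_quantize(data, bitwidth=8))
    print(fake_quantize(data, bitwidth=4))
```

Lowering the bitwidth widens the grid step, so the dequantized values drift further from the originals. That loss is what the simulation exposes and what fine-tuning tries to recover.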
The following API can be used to Compute Encodings for Model
QuantizationSimModel.compute_encodings(forward_pass_callback, forward_pass_callback_args)
Computes encodings for all quantization sim nodes in the model.
- Parameters
forward_pass_callback – A callback function that is expected to run forward passes on a model. This callback should use representative data for the forward pass so that the calculated encodings work for all data samples.
forward_pass_callback_args – Argument(s) passed to forward_pass_callback as-is. The type of this parameter is up to the user: for example, it could be an integer giving the number of data samples to use, a tuple of parameters, or an object representing something more complex.
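As a rough sketch of what such a callback might look like, the function below runs a limited number of representative batches through the model and discards the outputs. It assumes the callback is invoked as forward_pass_callback(model, forward_pass_callback_args), and that the caller chooses to pass a (batches, max_batches) tuple as the argument. Both the tuple layout and the name run_calibration_batches are illustrative assumptions, not something the API prescribes.

```python
def run_calibration_batches(model, callback_args):
    """Run up to `max_batches` batches through `model`; return how many were run.

    Assumption: `callback_args` is a (batches, max_batches) tuple chosen by the caller.
    """
    batches, max_batches = callback_args
    seen = 0
    for batch in batches:
        if seen >= max_batches:
            break
        model(batch)  # outputs are discarded; only the forward pass matters here
        seen += 1
    return seen


if __name__ == "__main__":
    # Stand-in for a model: any callable that accepts a batch.
    count = run_calibration_batches(lambda batch: None, ([[0.1], [0.2], [0.3]], 2))
    print(count)
```

With a real simulation object, this would be passed as sim.compute_encodings(run_calibration_batches, forward_pass_callback_args=(batches, 4)), where batches is an iterable of input arrays drawn from representative data.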
The following API can be used to Export the Model to target
QuantizationSimModel.export(path, filename_prefix)
This method exports the quant-sim model so it is ready to be run on-target.
Specifically, the following are saved:
The sim-model is exported to a regular Keras model without any simulation ops.
The quantization encodings are exported to a separate JSON-formatted file that can then be imported by the on-target runtime (if desired).
- Parameters
path – Path where the exported model and encodings files are stored
filename_prefix – Prefix to use for the filenames of the exported model and encodings files
Encoding format is described in the Quantization Encoding Specification
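Because the encodings are exported as a JSON-formatted file, they can be inspected with the standard library. The sketch below makes no assumption about the schema beyond the top level being a JSON object. The helper name summarize_encodings is hypothetical, and the file path is supplied by the caller rather than derived from filename_prefix.

```python
import json


def summarize_encodings(encodings_path):
    """Load a JSON encodings file and summarize each top-level entry.

    Containers are reported by their length; scalar values are returned as-is.
    Hypothetical helper for illustration; not part of AIMET.
    """
    with open(encodings_path, "r") as f:
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")

    summary = {}
    for key, value in data.items():
        summary[key] = len(value) if isinstance(value, (dict, list)) else value
    return summary
```

Consult the Quantization Encoding Specification for the meaning of the individual entries.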
Code Examples
Required imports
import numpy as np
import tensorflow as tf
from aimet_tensorflow.keras import quantsim
Quantize with Fine tuning
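def evaluate(model, forward_pass_callback_args):
    # Not defined in the original example. This minimal version assumes the callback
    # is invoked as evaluate(model, forward_pass_callback_args).
    dummy_x, _ = forward_pass_callback_args
    model(dummy_x)  # the forward pass is all that is needed to collect statistics

def quantize_model():
    model = tf.keras.applications.resnet50.ResNet50(weights=None, classes=10)
    sim = quantsim.QuantizationSimModel(model)

    # Generate some dummy data
    dummy_x = np.random.randn(10, 224, 224, 3)
    dummy_y = np.random.randint(0, 10, size=(10,))
    dummy_y = tf.keras.utils.to_categorical(dummy_y, num_classes=10)

    # Compile the sim model, then compute encodings
    sim.model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                      loss='categorical_crossentropy',
                      metrics=['accuracy'])
    sim.compute_encodings(evaluate, forward_pass_callback_args=(dummy_x, dummy_y))

    # Do some fine-tuning
    sim.model.fit(x=dummy_x, y=dummy_y, epochs=10)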