aimet_tensorflow.keras.quantsim
- class aimet_tensorflow.keras.quantsim.QuantizationSimModel(model, quant_scheme='tf_enhanced', rounding_mode='nearest', default_output_bw=8, default_param_bw=8, in_place=False, config_file=None, default_data_type=QuantizationDataType.int)[source]
Implements a mechanism to add quantization simulation ops to a model. This allows for off-target simulation of inference accuracy, and also allows the model to be fine-tuned to counter the effects of quantization. A construction sketch follows the parameter list.
- Parameters:
model – Model to quantize
quant_scheme (Union[QuantScheme, str]) – Quantization scheme. Currently supported schemes are post_training_tf and post_training_tf_enhanced; defaults to post_training_tf_enhanced.
rounding_mode (str) – Rounding scheme to use. One of: 'nearest' or 'stochastic'; defaults to 'nearest'.
default_output_bw (int) – Bitwidth to use for activation tensors; defaults to 8.
default_param_bw (int) – Bitwidth to use for parameter tensors; defaults to 8.
in_place (bool) – If True, the given 'model' is modified in place to add quant-sim nodes. This option is suggested only when the user wants to avoid creating a copy of the model.
config_file (Optional[str]) – Path to a config file specifying rules for placing quant ops in the model.
default_data_type (QuantizationDataType) – Default data type to use for quantizing all layer parameters. Possible options are QuantizationDataType.int and QuantizationDataType.float. Note that default_data_type=QuantizationDataType.float is supported only with default_output_bw=16 and default_param_bw=16.
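As a minimal construction sketch (the MobileNet model and input shape below are illustrative assumptions, not requirements of the API):

```python
import tensorflow as tf

from aimet_common.defs import QuantScheme
from aimet_tensorflow.keras.quantsim import QuantizationSimModel

# Any tf.keras model will do; MobileNet is used here purely for illustration.
model = tf.keras.applications.MobileNet(weights=None, input_shape=(224, 224, 3))

# Wrap the model with quantization simulation ops, simulating 8-bit
# weights and activations with the TF-enhanced post-training scheme.
sim = QuantizationSimModel(model,
                           quant_scheme=QuantScheme.post_training_tf_enhanced,
                           rounding_mode='nearest',
                           default_output_bw=8,
                           default_param_bw=8)
```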
The following API can be used to compute encodings for the model; a usage sketch follows its parameter list.
- QuantizationSimModel.compute_encodings(forward_pass_callback, forward_pass_callback_args)[source]
Computes encodings for all quantization sim nodes in the model.
- Parameters:
forward_pass_callback – A callback function that is expected to run forward passes on a model. This callback function should use representative data for the forward pass, so the calculated encodings work for all data samples.
forward_pass_callback_args – These argument(s) are passed to the forward_pass_callback as-is. It is up to the user to determine the type of this parameter; for example, it could simply be an integer representing the number of data samples to use, or a tuple of parameters, or an object representing something more complex.
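Below is a sketch of a calibration callback, continuing from the construction sketch above. The random calibration_data, the batch sizes, and the num_batches argument are placeholders for representative data, i.e. assumptions rather than part of the API:

```python
import numpy as np
import tensorflow as tf

# Stand-in for real representative data; random tensors are used only so the
# sketch runs end-to-end. In practice, draw samples from your dataset.
calibration_data = tf.data.Dataset.from_tensor_slices(
    np.random.rand(32, 224, 224, 3).astype(np.float32)).batch(8)

def forward_pass_callback(model, num_batches):
    # AIMET invokes this with the sim model and forward_pass_callback_args as-is.
    for batch in calibration_data.take(num_batches):
        model(batch, training=False)

# Run calibration so each quantization sim node can record its statistics.
sim.compute_encodings(forward_pass_callback=forward_pass_callback,
                      forward_pass_callback_args=4)
```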
The following API can be used to export the model for running on target; a usage sketch follows its parameter list.
- QuantizationSimModel.export(path, filename_prefix, custom_objects=None, convert_to_pb=True)[source]
This method exports the quant-sim model so that it is ready to be run on-target. Specifically, the following are saved:
- The sim model is exported as a regular Keras model without any simulation ops
- The quantization encodings are exported to a separate JSON-formatted file that can then be imported by the on-target runtime (if desired)
- Parameters:
path – Path where the model and encodings files are stored
filename_prefix – Prefix to use for the filenames of the model and encodings files
custom_objects – Dictionary of custom objects; Keras needs this mapping to load any custom layers in the model
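For instance (the export directory and filename prefix below are arbitrary choices, not API defaults):

```python
import os

export_dir = '/tmp/quantsim_export'  # illustrative location
os.makedirs(export_dir, exist_ok=True)

# Writes the plain Keras model (without simulation ops) and a JSON-formatted
# encodings file, both named with the given prefix, into export_dir.
sim.export(path=export_dir, filename_prefix='mobilenet_quantized')
```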
Enum Definition
Quant Scheme Enum
- class aimet_common.defs.QuantScheme(value)[source]
Enumeration of quant schemes; a usage sketch follows the member list.
- post_training_percentile = 6
For a Tensor, adjusted minimum and maximum values are selected based on the percentile value passed. The quantization encodings are calculated using the adjusted minimum and maximum value.
- post_training_tf = 1
For a Tensor, the absolute minimum and maximum value of the Tensor are used to compute the quantization encodings.
- post_training_tf_enhanced = 2
For a Tensor, searches and selects the optimal minimum and maximum value that minimizes the quantization noise. The quantization encodings are calculated using the selected minimum and maximum value.
- training_range_learning_with_tf_enhanced_init = 4
For a Tensor, the encoding values are initialized with the post_training_tf_enhanced scheme. Then, the encodings are learned during training.
- training_range_learning_with_tf_init = 3
For a Tensor, the encoding values are initialized with the post_training_tf scheme. Then, the encodings are learned during training.
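As a usage sketch, a scheme can be passed either as an enum member or as its short string form; the string 'tf_enhanced' mirrors the constructor's default argument shown above, and the MobileNet model is again just an illustrative stand-in:

```python
import tensorflow as tf

from aimet_common.defs import QuantScheme
from aimet_tensorflow.keras.quantsim import QuantizationSimModel

model = tf.keras.applications.MobileNet(weights=None)  # illustrative model

# Two equivalent ways to select the TF-enhanced post-training scheme.
sim_enum = QuantizationSimModel(model, quant_scheme=QuantScheme.post_training_tf_enhanced)
sim_str = QuantizationSimModel(model, quant_scheme='tf_enhanced')
```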