AIMET PyTorch Quantization SIM API¶
AIMET Quantization Sim requires model definitions to use certain constructs and avoid others. These constraints are described in detail here.
AIMET also includes a Model Validator tool that allows users to check their model definition and find constructs that might need to be replaced. The API and usage examples for this tool can also be found on the same page.
Top-level API¶
The following API can be used to compute encodings for the model.
The following APIs can be used to save and restore the quantized model (a brief save/restore sketch follows below).
The following API can be used to export the model to target.
The encoding format is described in the Quantization Encoding Specification.
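As a minimal sketch of the save/restore flow, the example below uses quantsim.save_checkpoint and quantsim.load_checkpoint to persist and reload a QuantizationSimModel; the checkpoint path is illustrative, not prescribed by AIMET.

import torch
from aimet_torch import quantsim
from aimet_torch.quantsim import QuantizationSimModel
from aimet_torch.examples import mnist_torch_model

model = mnist_torch_model.Net()
sim = QuantizationSimModel(model, dummy_input=torch.rand(1, 1, 28, 28))

# Save the quantization simulation state (path is illustrative)
quantsim.save_checkpoint(sim, './quantsim_checkpoint.pth')

# Restore the simulation later, e.g. in a separate session
sim = quantsim.load_checkpoint('./quantsim_checkpoint.pth')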
Enum Definition¶
Quant Scheme Enum

class aimet_common.defs.QuantScheme¶
    Enumeration of Quant schemes

    post_training_tf = 1¶
        TF scheme

    post_training_tf_enhanced = 2¶
        TF-enhanced scheme
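As a usage sketch, a QuantScheme value can be passed to QuantizationSimModel through its quant_scheme parameter when constructing the simulation (the MNIST model below is the same example model used later on this page):

import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel
from aimet_torch.examples import mnist_torch_model

model = mnist_torch_model.Net()

# Select the TF-enhanced quant scheme for computing encodings
sim = QuantizationSimModel(model, dummy_input=torch.rand(1, 1, 28, 28),
                           quant_scheme=QuantScheme.post_training_tf_enhanced)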
Code Examples¶
Required imports
import torch
from aimet_torch.examples import mnist_torch_model
# Quantization related import
from aimet_torch.quantsim import QuantizationSimModel
Evaluation function
def evaluate_model(model: torch.nn.Module, eval_iterations: int, use_cuda: bool = False) -> float:
    """
    This is intended to be the user-defined model evaluation function.
    AIMET requires the above signature, so if the user's eval function does not
    match it, please create a simple wrapper.

    Note: Honoring the number of iterations is not absolutely necessary.
    However, if all evaluations run over an entire epoch of validation data,
    the runtime for AIMET will obviously be higher.

    :param model: Model to evaluate
    :param eval_iterations: Number of iterations to use for evaluation.
            None for entire epoch.
    :param use_cuda: If true, evaluate using gpu acceleration
    :return: single float number (accuracy) representing model's performance
    """
    # Dummy implementation: a real function would run inference and return accuracy
    return 0.5
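For reference, a more realistic evaluation function might look like the hypothetical sketch below. It honors eval_iterations by breaking out of the loop early; the dummy val_loader stands in for a user-provided validation DataLoader and is not part of AIMET.

import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy validation data standing in for a real dataset (hypothetical)
val_loader = DataLoader(TensorDataset(torch.rand(256, 1, 28, 28),
                                      torch.randint(0, 10, (256,))),
                        batch_size=64)

def evaluate_model(model: torch.nn.Module, eval_iterations: int, use_cuda: bool = False) -> float:
    device = torch.device('cuda' if use_cuda else 'cpu')
    model = model.to(device).eval()
    correct, total = 0, 0
    with torch.no_grad():
        for i, (data, target) in enumerate(val_loader):
            # Honor the requested number of iterations, if given
            if eval_iterations is not None and i >= eval_iterations:
                break
            predictions = model(data.to(device)).argmax(dim=1)
            correct += (predictions == target.to(device)).sum().item()
            total += target.size(0)
    return correct / total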
Quantize and fine-tune a trained model
def quantize_model(trainer_function):
    model = mnist_torch_model.Net().to(torch.device('cuda'))

    sim = QuantizationSimModel(model, default_output_bw=8, default_param_bw=8,
                               dummy_input=torch.rand(1, 1, 28, 28),
                               config_file='../../../TrainingExtensions/common/src/python/aimet_common/'
                                           'quantsim_config/default_config.json')

    # Compute quantization encodings for the model
    sim.compute_encodings(forward_pass_callback=evaluate_model, forward_pass_callback_args=5)

    # Fine-tune the model's parameters using training
    trainer_function(model=sim.model, epochs=1, num_batches=100, use_cuda=True)

    # Export the model for on-target inference
    sim.export(path='./', filename_prefix='quantized_mnist', dummy_input=torch.rand(1, 1, 28, 28))
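Finally, as a usage sketch, quantize_model accepts any trainer callable matching the keyword signature used above; the stub below is hypothetical and stands in for a real PyTorch training loop.

def trainer_function(model: torch.nn.Module, epochs: int, num_batches: int, use_cuda: bool = False):
    # Hypothetical stub: a real trainer would run the usual forward/loss/
    # backward/optimizer-step loop on the quantization-simulated model
    # for the requested number of epochs and batches.
    pass

quantize_model(trainer_function)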