AIMET PyTorch Quantization SIM API

AIMET Quantization Sim requires the model definition to use certain constructs and avoid others. These constraints are described in detail here.

AIMET also includes a Model Validator tool that allows users to check their model definition and find constructs that might need to be replaced. Please see the API and usage examples for this tool on the same page.
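
A minimal sketch of running the validator, assuming the ModelValidator.validate_model entry point; it is expected to return True when all checks pass:

import torch
from aimet_torch.examples import mnist_torch_model
from aimet_torch.model_validator.model_validator import ModelValidator

model = mnist_torch_model.Net()

# Validate the model definition before attempting to quantize it
if not ModelValidator.validate_model(model, model_input=torch.rand(1, 1, 28, 28)):
    print('Model definition contains constructs that need to be replaced')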

Top-level API


The following API can be used to compute encodings for the model.
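
A minimal sketch of computing encodings; the calibration callback and its batch count are illustrative, and any callable that forward-passes representative data through the model will do:

import torch
from aimet_torch.examples import mnist_torch_model
from aimet_torch.quantsim import QuantizationSimModel

sim = QuantizationSimModel(mnist_torch_model.Net(), dummy_input=torch.rand(1, 1, 28, 28))

def pass_calibration_data(sim_model, num_batches):
    # Forward-pass representative data so each quantizer can observe
    # tensor ranges; the model outputs themselves are discarded.
    # Random tensors stand in for real calibration samples here.
    sim_model.eval()
    with torch.no_grad():
        for _ in range(num_batches):
            sim_model(torch.rand(1, 1, 28, 28))

sim.compute_encodings(forward_pass_callback=pass_calibration_data, forward_pass_callback_args=10)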


The following APIs can be used to save and restore the quantized model.
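
A minimal sketch using the save_checkpoint and load_checkpoint helpers in aimet_torch.quantsim, with sim as constructed above; the file path is illustrative:

from aimet_torch import quantsim

# Save the QuantizationSimModel, including its computed encodings
quantsim.save_checkpoint(sim, './quantsim_checkpoint.pth')

# Restore it later without having to recompute encodings
sim = quantsim.load_checkpoint('./quantsim_checkpoint.pth')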



The following API can be used to export the model to target.
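
A minimal sketch of exporting; the output directory is illustrative, and the exact set of files written is an assumption based on the example at the bottom of this page:

sim.export(path='./output', filename_prefix='quantized_mnist', dummy_input=torch.rand(1, 1, 28, 28))
# Typical outputs (assumed names):
#   ./output/quantized_mnist.pth        - model copy with quantization wrappers removed
#   ./output/quantized_mnist.onnx       - equivalent ONNX model
#   ./output/quantized_mnist.encodings  - quantization encodings (JSON)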


The encoding format is described in the Quantization Encoding Specification.
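
As an illustration, the exported encodings file can be inspected directly; the top-level keys shown here are assumptions, and the authoritative field definitions are in the specification:

import json

with open('./output/quantized_mnist.encodings') as f:
    encodings = json.load(f)

# Each entry maps a tensor name to its quantization parameters
# (e.g. bitwidth, min, max, scale, offset)
for name, enc in encodings['param_encodings'].items():
    print(name, enc)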


Enum Definition

Quant Scheme Enum

class aimet_common.defs.QuantScheme

Enumeration of quantization schemes

post_training_tf = 1

TF scheme: encodings are computed from the absolute minimum and maximum of the observed tensor values.

post_training_tf_enhanced = 2

TF-enhanced scheme: encodings are computed to minimize quantization noise, which makes them more robust to outliers than a plain min/max.
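
The scheme is selected when constructing the simulation. A minimal sketch, with the model and dummy input as in the code examples below:

import torch
from aimet_common.defs import QuantScheme
from aimet_torch.examples import mnist_torch_model
from aimet_torch.quantsim import QuantizationSimModel

sim = QuantizationSimModel(mnist_torch_model.Net(), dummy_input=torch.rand(1, 1, 28, 28),
                           quant_scheme=QuantScheme.post_training_tf_enhanced)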


Code Examples

Required imports

import torch
from aimet_torch.examples import mnist_torch_model
# Quantization-related import
from aimet_torch.quantsim import QuantizationSimModel

Evaluation function

def evaluate_model(model: torch.nn.Module, eval_iterations: int, use_cuda: bool = False) -> float:
    """
    This is intended to be the user-defined model evaluation function.
    AIMET requires this signature, so if the user's eval function does not
    match it, please create a simple wrapper.

    Note: Honoring the number of iterations is not strictly necessary.
    However, if every evaluation runs over an entire epoch of validation data,
    the overall AIMET runtime will be correspondingly higher.

    :param model: Model to evaluate
    :param eval_iterations: Number of iterations to use for evaluation.
            None for entire epoch.
    :param use_cuda: If true, evaluate using gpu acceleration
    :return: single float number (accuracy) representing model's performance
    """
    # Dummy value for this example; a real implementation would measure
    # accuracy on validation data
    return 0.5

Quantize and fine-tune a trained model

def quantize_model(trainer_function):

    model = mnist_torch_model.Net().to(torch.device('cuda'))

    sim = QuantizationSimModel(model, default_output_bw=8, default_param_bw=8, dummy_input=torch.rand(1, 1, 28, 28),
                               config_file='../../../TrainingExtensions/common/src/python/aimet_common/quantsim_config/'
                                           'default_config.json')

    # Compute quantization encodings for the model
    sim.compute_encodings(forward_pass_callback=evaluate_model, forward_pass_callback_args=5)

    # Fine-tune the model's parameters with quantization-aware training
    trainer_function(model=sim.model, epochs=1, num_batches=100, use_cuda=True)

    # Export the model
    sim.export(path='./', filename_prefix='quantized_mnist', dummy_input=torch.rand(1, 1, 28, 28))
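
For completeness, a hypothetical invocation; trainer_stub only illustrates the signature that quantize_model expects and is not a real training loop:

def trainer_stub(model, epochs, num_batches, use_cuda):
    # Placeholder for the user's quantization-aware training loop
    model.train()

quantize_model(trainer_stub)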