AIMET PyTorch Quantization SIM API¶
AIMET Quantization Sim requires model definitions to use certain constructs and avoid others. These constraints are described in detail here.
AIMET also includes a Model Validator tool that allows users to check their model definition and find constructs that might need to be replaced. The API and usage examples for this tool can also be found on the same page.
Top-level API¶
The following API can be used to compute encodings for the model.
The following APIs can be used to save and restore the quantized model (a brief save/restore sketch follows below).
The following API can be used to export the model to target.
The encoding format is described in the Quantization Encoding Specification.
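As a minimal sketch of the save/restore flow, the example below uses quantsim.save_checkpoint and quantsim.load_checkpoint to persist and reload a QuantizationSimModel; the checkpoint path is illustrative, not prescribed by AIMET.

import torch
from aimet_torch import quantsim
from aimet_torch.quantsim import QuantizationSimModel
from aimet_torch.examples import mnist_torch_model

model = mnist_torch_model.Net()
sim = QuantizationSimModel(model, dummy_input=torch.rand(1, 1, 28, 28))

# Save the quantization simulation state (path is illustrative)
quantsim.save_checkpoint(sim, './quantsim_checkpoint.pth')

# Restore the simulation later, e.g. in a separate session
sim = quantsim.load_checkpoint('./quantsim_checkpoint.pth')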
Enum Definition¶
Quant Scheme Enum

class aimet_common.defs.QuantScheme¶
    Enumeration of Quant schemes

    post_training_tf = 1¶
        TF scheme

    post_training_tf_enhanced = 2¶
        TF-enhanced scheme
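As a usage sketch, a QuantScheme value can be passed to QuantizationSimModel through its quant_scheme parameter when constructing the simulation (the MNIST model below is the same example model used later on this page):

import torch
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel
from aimet_torch.examples import mnist_torch_model

model = mnist_torch_model.Net()

# Select the TF-enhanced quant scheme for computing encodings
sim = QuantizationSimModel(model, dummy_input=torch.rand(1, 1, 28, 28),
                           quant_scheme=QuantScheme.post_training_tf_enhanced)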
Code Examples¶
Required imports
import torch
from aimet_torch.examples import mnist_torch_model
# Quantization related import
from aimet_torch.quantsim import QuantizationSimModel
Evaluation function
def evaluate_model(model: torch.nn.Module, eval_iterations: int, use_cuda: bool = False) -> float:
    """
    This is intended to be the user-defined model evaluation function.
    AIMET requires the above signature, so if the user's eval function does not
    match it, please create a simple wrapper.

    Note: Honoring the number of iterations is not absolutely necessary.
    However, if all evaluations run over an entire epoch of validation data,
    the runtime for AIMET will obviously be higher.

    :param model: Model to evaluate
    :param eval_iterations: Number of iterations to use for evaluation.
            None for entire epoch.
    :param use_cuda: If true, evaluate using gpu acceleration
    :return: single float number (accuracy) representing model's performance
    """
    # Dummy implementation: a real function would run inference and return accuracy
    return 0.5
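For reference, a more realistic evaluation function might look like the hypothetical sketch below. It honors eval_iterations by breaking out of the loop early; the dummy val_loader stands in for a user-provided validation DataLoader and is not part of AIMET.

import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy validation data standing in for a real dataset (hypothetical)
val_loader = DataLoader(TensorDataset(torch.rand(256, 1, 28, 28),
                                      torch.randint(0, 10, (256,))),
                        batch_size=64)

def evaluate_model(model: torch.nn.Module, eval_iterations: int, use_cuda: bool = False) -> float:
    device = torch.device('cuda' if use_cuda else 'cpu')
    model = model.to(device).eval()
    correct, total = 0, 0
    with torch.no_grad():
        for i, (data, target) in enumerate(val_loader):
            # Honor the requested number of iterations, if given
            if eval_iterations is not None and i >= eval_iterations:
                break
            predictions = model(data.to(device)).argmax(dim=1)
            correct += (predictions == target.to(device)).sum().item()
            total += target.size(0)
    return correct / total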
Quantize and fine-tune a trained model
def quantize_model(trainer_function):
    model = mnist_torch_model.Net().to(torch.device('cuda'))

    sim = QuantizationSimModel(model, default_output_bw=8, default_param_bw=8,
                               dummy_input=torch.rand(1, 1, 28, 28),
                               config_file='../../../TrainingExtensions/common/src/python/aimet_common/'
                                           'quantsim_config/default_config.json')

    # Compute quantization encodings for the model
    sim.compute_encodings(forward_pass_callback=evaluate_model, forward_pass_callback_args=5)

    # Fine-tune the model's parameters using training
    trainer_function(model=sim.model, epochs=1, num_batches=100, use_cuda=True)

    # Export the model for on-target inference
    sim.export(path='./', filename_prefix='quantized_mnist', dummy_input=torch.rand(1, 1, 28, 28))
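Finally, as a usage sketch, quantize_model accepts any trainer callable matching the keyword signature used above; the stub below is hypothetical and stands in for a real PyTorch training loop.

def trainer_function(model: torch.nn.Module, epochs: int, num_batches: int, use_cuda: bool = False):
    # Hypothetical stub: a real trainer would run the usual forward/loss/
    # backward/optimizer-step loop on the quantization-simulated model
    # for the requested number of epochs and batches.
    pass

quantize_model(trainer_function)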