AutoQuant

This notebook contains an example of how to use the AIMET AutoQuant feature.

AIMET offers a suite of neural network post-training quantization (PTQ) techniques that can be applied in succession. However, finding the right sequence of techniques to apply is time-consuming and can be challenging for non-expert users. We instead recommend AutoQuant to save time and effort.

AutoQuant is an API that analyzes the model and automatically applies various PTQ techniques based on best-practices heuristics. You specify a tolerable accuracy drop, and AutoQuant applies PTQ techniques cumulatively until the target accuracy is satisfied.

Overall flow

This example performs the following steps:

  1. Define constants and helper functions

  2. Load a pretrained FP32 model

  3. Run AutoQuant

Note

This notebook does not show state-of-the-art results. For example, it uses a relatively quantization-friendly model (ResNet18). Also, some optimization parameters, such as the number of fine-tuning epochs, are chosen to improve execution speed in the notebook.


Dataset

This example does image classification on the ImageNet dataset. If you already have a version of the dataset, use that. Otherwise, download the dataset, for example from https://image-net.org/challenges/LSVRC/2012/index.

Note

The dataloader provided in this example relies on these features of the ImageNet dataset:

  • Subfolders named train for the training samples and val for the validation samples. See the PyTorch dataset description for more details.

  • One subdirectory per class, and one file per image sample.

Note

To speed up the execution of this notebook, you can use a reduced subset of the ImageNet dataset. The full ILSVRC2012 dataset has 1000 classes, with roughly 1000 training samples and 50 validation samples per class. To run this notebook, you can reduce the dataset to as few as two samples per class.

Edit the cell below to specify the directory where the downloaded ImageNet dataset is saved.

[ ]:
import os
from torchvision import transforms, datasets

DATASET_DIR = '/path/to/dataset'   # Replace this path with a real directory

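val_transforms = transforms.Compose([
    transforms.Resize(256),      # standard ImageNet eval preprocessing: resize the shorter side before cropping
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])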

imagenet_dataset = datasets.ImageFolder(root=os.path.join(DATASET_DIR, 'val'), transform=val_transforms)

1. Define constants and helper functions

This section defines the following constants and helper functions:

  • EVAL_DATASET_SIZE: A typical value is 5000. In this example, the value has been set to 500 for faster execution.

  • CALIBRATION_DATASET_SIZE: A typical value is 2000. In this example, the value has been set to 200 for faster execution.

  • _create_sampled_data_loader() returns a DataLoader based on the dataset and the number of samples provided.

  • eval_callback() defines an evaluation function for the model.

[ ]:
import random
from typing import Optional
from tqdm import tqdm
import torch
from torch.utils.data import Dataset, DataLoader, Subset
from aimet_torch.utils import in_eval_mode, get_device

EVAL_DATASET_SIZE = 500
CALIBRATION_DATASET_SIZE = 200

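# Cache of sampled subsets, keyed by (dataset identity, sample count), so that
# repeated calls reuse the same samples and different datasets never collide.
_datasets = {}

def _create_sampled_data_loader(dataset, num_samples):
    key = (id(dataset), num_samples)
    if key not in _datasets:
        indices = random.sample(range(len(dataset)), num_samples)
        _datasets[key] = Subset(dataset, indices)
    return DataLoader(_datasets[key], batch_size=32)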


def eval_callback(model: torch.nn.Module, num_samples: Optional[int] = None) -> float:
    """Return the top-1 accuracy of `model` on `num_samples` sampled validation images."""
    if num_samples is None:
        num_samples = EVAL_DATASET_SIZE

    data_loader = _create_sampled_data_loader(imagenet_dataset, num_samples)
    device = get_device(model)

    correct = 0
    with in_eval_mode(model), torch.no_grad():
        for image, label in tqdm(data_loader):
            image = image.to(device)
            label = label.to(device)
            logits = model(image)
            top1 = logits.topk(k=1).indices
            correct += (top1 == label.view_as(top1)).sum()

    return int(correct) / num_samples
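The score returned by eval_callback() is top-1 accuracy: the fraction of samples whose highest-scoring class matches the label. The following torch-free sketch shows the same arithmetic on a few hand-written logits. The function name top1_accuracy and the toy values are made up for illustration.

```python
def top1_accuracy(logits, labels):
    """Fraction of rows whose highest-scoring class equals the label."""
    correct = 0
    for row, label in zip(logits, labels):
        predicted = max(range(len(row)), key=row.__getitem__)  # argmax over classes
        correct += int(predicted == label)
    return correct / len(labels)


# Toy data: four samples, three classes.
logits = [[0.1, 2.0, -1.0], [1.5, 0.2, 0.3], [0.0, 0.1, 0.9], [0.4, 0.3, 0.2]]
labels = [1, 0, 1, 0]
print(top1_accuracy(logits, labels))  # 0.75 (three of four predictions match)
```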

2. Load a pretrained FP32 model

Load a pretrained resnet18 model from torchvision.

You can load any pretrained PyTorch model instead.

[ ]:
from torchvision.models import resnet18

model = resnet18(pretrained=True).eval()

if torch.cuda.is_available():
    model.to(torch.device('cuda'))

accuracy = eval_callback(model)
print(f'- FP32 accuracy: {accuracy}')

3. Run AutoQuant

3.1 Create an AutoQuant object.

The AutoQuant feature uses an unlabeled dataset to quantize the model. The UnlabeledDatasetWrapper class creates an unlabeled Dataset object from a labeled Dataset.

[ ]:
from aimet_torch.v1.auto_quant import AutoQuant

class UnlabeledDatasetWrapper(Dataset):
    def __init__(self, dataset):
        self._dataset = dataset

    def __len__(self):
        return len(self._dataset)

    def __getitem__(self, index):
        images, _ = self._dataset[index]
        return images


unlabeled_imagenet_dataset = UnlabeledDatasetWrapper(imagenet_dataset)
unlabeled_imagenet_data_loader = _create_sampled_data_loader(unlabeled_imagenet_dataset,
                                                             CALIBRATION_DATASET_SIZE)

dummy_input = torch.randn((1, 3, 224, 224)).to(get_device(model))

auto_quant = AutoQuant(model,
                       dummy_input=dummy_input,
                       data_loader=unlabeled_imagenet_data_loader,
                       eval_callback=eval_callback)

3.2 Run AutoQuant inference.

AutoQuant inference evaluates the quantized model with eval_callback before any PTQ technique is applied. The result is a baseline score to compare against after running AutoQuant optimization.

[ ]:
sim, initial_accuracy = auto_quant.run_inference()
print(f"- Quantized Accuracy (before optimization): {initial_accuracy}")

3.3 Set AdaRound parameters (optional).

AutoQuant uses predefined default parameters for AdaRound. These values were determined empirically and work well with common models.

If necessary, you can use custom parameters for AdaRound. This example uses small AdaRound parameter values for faster execution.

[ ]:
from aimet_torch.v1.adaround.adaround_weight import AdaroundParameters

ADAROUND_DATASET_SIZE = 200
adaround_data_loader = _create_sampled_data_loader(unlabeled_imagenet_dataset, ADAROUND_DATASET_SIZE)
adaround_params = AdaroundParameters(adaround_data_loader, num_batches=len(adaround_data_loader), default_num_iterations=2000)
auto_quant.set_adaround_params(adaround_params)
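The num_batches argument above is set to len(adaround_data_loader). With 200 samples and the batch size of 32 used by _create_sampled_data_loader(), that comes to seven batches: six full batches and one final batch of eight samples. The small helper below, whose name is made up for this sketch, reproduces that arithmetic, assuming the loader keeps the last partial batch (the PyTorch DataLoader default).

```python
import math


def num_batches(num_samples, batch_size, drop_last=False):
    """Number of batches a loader yields over `num_samples` items."""
    if drop_last:
        return num_samples // batch_size       # partial final batch discarded
    return math.ceil(num_samples / batch_size)  # partial final batch kept


print(num_batches(200, 32))                  # 7
print(num_batches(200, 32, drop_last=True))  # 6
```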

3.4 Run AutoQuant optimization.

This step runs AutoQuant optimization. AutoQuant returns the following:

  • The best possible quantized model

  • The corresponding evaluation score

  • The path to the encoding file

The allowed_accuracy_drop argument specifies the tolerable accuracy drop. AutoQuant applies a series of quantization features until the target accuracy (FP32 accuracy minus the allowed accuracy drop) is satisfied. When the target accuracy is reached, AutoQuant returns immediately without applying further PTQ techniques. See the AutoQuant User Guide and AutoQuant API documentation for details.

[ ]:
model, optimized_accuracy, encoding_path = auto_quant.optimize(allowed_accuracy_drop=0.01)
print(f"- Quantized Accuracy (after optimization):  {optimized_accuracy}")
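The stopping rule described in this step can be sketched in plain Python. This is not AutoQuant's implementation. The function name apply_until_target, the technique names, and the accuracy values are all placeholders chosen for illustration. The sketch only shows the control flow: try techniques in order, stop as soon as the target is met, and otherwise report the best score seen.

```python
def apply_until_target(fp32_accuracy, allowed_accuracy_drop, candidates):
    """Walk through (name, accuracy) pairs in order and stop at the target.

    `candidates` stands in for "accuracy measured after applying each
    technique cumulatively"; a real flow would evaluate the model instead.
    Returns (names_tried, best_accuracy_seen).
    """
    target = fp32_accuracy - allowed_accuracy_drop
    tried = []
    best_accuracy = float("-inf")
    for name, accuracy in candidates:
        tried.append(name)
        best_accuracy = max(best_accuracy, accuracy)
        if accuracy >= target:
            break  # target reached: skip the remaining techniques
    return tried, best_accuracy


# Toy numbers: FP32 accuracy 0.70 and allowed drop 0.01 give a target of about 0.69.
steps = [("technique_a", 0.62), ("technique_b", 0.695), ("technique_c", 0.70)]
tried, best = apply_until_target(0.70, 0.01, steps)
print(tried, best)  # ['technique_a', 'technique_b'] 0.695
```

In this toy run the third technique is never tried, because the second one already reaches the target.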

Next steps

The next step is to export this model for installation on the target.

Export the model and encodings.

  • Export the model with the updated weights but without the fake quant ops.

  • Export the encodings (scale and offset quantization parameters). AIMET QuantizationSimModel provides an export API for this purpose.

The following code performs these exports.

[ ]:
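os.makedirs('./output/', exist_ok=True)
# Export expects the dummy input on the CPU.
dummy_input = dummy_input.cpu()
# Note: `sim` is the QuantizationSimModel returned by auto_quant.run_inference() in step 3.2.
sim.export(path='./output/', filename_prefix='resnet18_after_cle_bc', dummy_input=dummy_input)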
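To give a feel for what the exported scale and offset parameters represent, the sketch below quantizes and dequantizes a single value on an affine integer grid. The formula and the sign convention for the offset are assumptions made for this illustration. Conventions vary between tools, so check the exported encodings file and the AIMET documentation before relying on a particular one.

```python
def fake_quantize(x, scale, offset, bitwidth=8):
    """Quantize-dequantize one value on an assumed affine integer grid.

    Assumed convention: q = clamp(round(x / scale) + offset, 0, 2**bitwidth - 1)
    and the reconstructed value is (q - offset) * scale.
    """
    q_min, q_max = 0, 2 ** bitwidth - 1
    q = round(x / scale) + offset
    q = max(q_min, min(q_max, q))  # clamp to the representable integer range
    return (q - offset) * scale


scale, offset = 0.5, 4  # toy values: the 8-bit grid then covers [-2.0, 125.5]
print(fake_quantize(0.3, scale, offset))   # 0.5  (snapped to the nearest grid point)
print(fake_quantize(-5.0, scale, offset))  # -2.0 (clamped to the grid minimum)
```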

For more information

See the AIMET API docs for details about the AIMET APIs and optional parameters.

See the other example notebooks to learn how to use other AIMET post-training quantization techniques.