AutoQuant¶
This notebook shows an example of how to use the AIMET AutoQuant feature.
AIMET offers a suite of neural network post-training quantization (PTQ) techniques that can be applied in succession. However, finding the right sequence of techniques to apply is time-consuming and can be challenging for non-expert users. We instead recommend AutoQuant to save time and effort.
AutoQuant is an API that analyzes the model and automatically applies various PTQ techniques based on best-practices heuristics. You specify a tolerable accuracy drop, and AutoQuant applies PTQ techniques cumulatively until the target accuracy is satisfied.
Overall flow¶
This example performs the following steps:
Define constants and helper functions
Load a pretrained FP32 model
Run AutoQuant
Note
This notebook does not show state-of-the-art results. For example, it uses a relatively quantization-friendly model (ResNet18). Also, some parameters, such as the dataset sizes and the AdaRound settings, are chosen to make the notebook run quickly rather than to maximize accuracy.
Dataset¶
This example performs image classification on the ImageNet dataset. If you already have a copy of the dataset, use that. Otherwise, download the dataset (for example, from https://image-net.org/challenges/LSVRC/2012/index).
Note
The dataloader provided in this example relies on these features of the ImageNet dataset:
Subfolders train for the training samples and val for the validation samples. See the PyTorch dataset description for more details.
One subdirectory per class, and one file per image sample.
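The layout described above can be checked before running the rest of the notebook. The following is a minimal sketch, not part of AIMET or torchvision; the function name `check_imagefolder_layout` is hypothetical and it only verifies that the `train` and `val` folders exist and contain class subdirectories.

```python
from pathlib import Path


def check_imagefolder_layout(dataset_dir):
    """Return a dict mapping each split to its number of class subdirectories.

    Raises FileNotFoundError if 'train' or 'val' is missing, and ValueError
    if a split contains no class subdirectories.
    """
    root = Path(dataset_dir)
    class_counts = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"Missing split directory: {split_dir}")
        class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
        if not class_dirs:
            raise ValueError(f"No class subdirectories in {split_dir}")
        class_counts[split] = len(class_dirs)
    return class_counts
```

Calling `check_imagefolder_layout('/path/to/dataset')` on a full ImageNet copy would be expected to report the same class count for both splits.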
Note
To speed up the execution of this notebook, you can use a reduced subset of the ImageNet dataset. For example, the full ILSVRC2012 dataset has 1000 classes, with about 1000 training samples per class and 50 validation samples per class. To run this notebook, you can reduce the dataset to, say, two samples per class.
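One way to build such a reduced subset is to copy a few files per class into a new folder with the same layout. This is a minimal sketch using only the standard library; the function name `make_reduced_split` and the destination path are assumptions for illustration.

```python
import shutil
from pathlib import Path


def make_reduced_split(src_split_dir, dst_split_dir, samples_per_class=2):
    """Copy the first `samples_per_class` files of every class into a new folder.

    Returns the total number of files copied.
    """
    src, dst = Path(src_split_dir), Path(dst_split_dir)
    copied = 0
    for class_dir in sorted(p for p in src.iterdir() if p.is_dir()):
        target_dir = dst / class_dir.name
        target_dir.mkdir(parents=True, exist_ok=True)
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        for image_path in files[:samples_per_class]:
            shutil.copy2(image_path, target_dir / image_path.name)
            copied += 1
    return copied
```

For example, `make_reduced_split('/path/to/dataset/val', '/path/to/small_dataset/val')` would keep two validation images per class. The reduced split must still contain at least as many images as the largest sample count requested later in the notebook.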
Edit the cell below to specify the directory where the downloaded ImageNet dataset is saved.
[ ]:
import os
from torchvision import transforms, datasets

DATASET_DIR = '/path/to/dataset'  # Replace this path with a real directory

val_transforms = transforms.Compose([
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
])

imagenet_dataset = datasets.ImageFolder(root=os.path.join(DATASET_DIR, 'val'), transform=val_transforms)
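The Normalize step in the cell above subtracts a per-channel mean and divides by a per-channel standard deviation. As a small worked example in plain Python (the helper `normalize_pixel` is hypothetical and operates on a single RGB pixel rather than a tensor):

```python
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Apply (value - mean) / std to each channel of one RGB pixel in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


# A pixel equal to the channel means maps to zero in every channel.
normalized = normalize_pixel((0.485, 0.456, 0.406))
print(normalized)  # (0.0, 0.0, 0.0)
```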
1. Define Constants and Helper functions¶
This section defines the following constants and helper functions:
EVAL_DATASET_SIZE: A typical value is 5000. In this example, the value is set to 500 for faster execution.
CALIBRATION_DATASET_SIZE: A typical value is 2000. In this example, the value is set to 200 for faster execution.
_create_sampled_data_loader() returns a DataLoader based on the dataset and the number of samples provided.
eval_callback() evaluates the model on the sampled data and returns its top-1 accuracy.
[ ]:
import random
from typing import Optional
from tqdm import tqdm
import torch
from torch.utils.data import Dataset, DataLoader, SubsetRandomSampler, Subset
from aimet_torch.utils import in_eval_mode, get_device

EVAL_DATASET_SIZE = 500
CALIBRATION_DATASET_SIZE = 200

# Cache of sampled subsets, keyed by (dataset identity, sample count), so that
# repeated calls reuse the same samples and different datasets never share an entry.
_datasets = {}

def _create_sampled_data_loader(dataset, num_samples):
    key = (id(dataset), num_samples)
    if key not in _datasets:
        indices = random.sample(range(len(dataset)), num_samples)
        _datasets[key] = Subset(dataset, indices)
    return DataLoader(_datasets[key], batch_size=32)

def eval_callback(model: torch.nn.Module, num_samples: Optional[int] = None) -> float:
    """Return the top-1 accuracy of the model on a random sample of the validation set."""
    if num_samples is None:
        num_samples = EVAL_DATASET_SIZE

    data_loader = _create_sampled_data_loader(imagenet_dataset, num_samples)
    device = get_device(model)

    correct = 0
    with in_eval_mode(model), torch.no_grad():
        for image, label in tqdm(data_loader):
            image = image.to(device)
            label = label.to(device)
            logits = model(image)
            top1 = logits.topk(k=1).indices
            correct += (top1 == label.view_as(top1)).sum()

    return int(correct) / num_samples
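The score computed by eval_callback() is top-1 accuracy: the fraction of samples whose highest-scoring class matches the label. The same calculation can be sketched on plain Python lists; the helper `top1_accuracy` and the toy logits below are illustrative only.

```python
def top1_accuracy(logits_batch, labels):
    """Fraction of samples whose highest-scoring class equals the label."""
    correct = 0
    for logits, label in zip(logits_batch, labels):
        predicted = max(range(len(logits)), key=logits.__getitem__)
        if predicted == label:
            correct += 1
    return correct / len(labels)


logits_batch = [
    [0.1, 2.0, -1.0],  # highest score at class 1
    [3.0, 0.2, 0.5],   # highest score at class 0
    [0.0, 0.1, 0.9],   # highest score at class 2
    [1.5, 1.0, 0.2],   # highest score at class 0
]
labels = [1, 0, 1, 0]
print(top1_accuracy(logits_batch, labels))  # 0.75 (three of four correct)
```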
2. Load a pretrained FP32 model¶
Load a pretrained resnet18 model from torchvision.
You can load any pretrained PyTorch model instead.
[ ]:
from torchvision.models import resnet18

model = resnet18(pretrained=True).eval()

if torch.cuda.is_available():
    model.to(torch.device('cuda'))

accuracy = eval_callback(model)
print(f'- FP32 accuracy: {accuracy}')
3. Run AutoQuant¶
3.1 Create an AutoQuant object.
The AutoQuant feature uses an unlabeled dataset to quantize the model. The UnlabeledDatasetWrapper class creates an unlabeled Dataset object from a labeled Dataset.
[ ]:
from aimet_torch.v1.auto_quant import AutoQuant

class UnlabeledDatasetWrapper(Dataset):
    def __init__(self, dataset):
        self._dataset = dataset

    def __len__(self):
        return len(self._dataset)

    def __getitem__(self, index):
        images, _ = self._dataset[index]
        return images

unlabeled_imagenet_dataset = UnlabeledDatasetWrapper(imagenet_dataset)
unlabeled_imagenet_data_loader = _create_sampled_data_loader(unlabeled_imagenet_dataset,
                                                             CALIBRATION_DATASET_SIZE)

dummy_input = torch.randn((1, 3, 224, 224)).to(get_device(model))

auto_quant = AutoQuant(model,
                       dummy_input=dummy_input,
                       data_loader=unlabeled_imagenet_data_loader,
                       eval_callback=eval_callback)
3.2 Run AutoQuant inference.
AutoQuant inference evaluates the quantized model with eval_callback before any PTQ technique is applied. This provides a baseline evaluation score before running AutoQuant optimization.
[ ]:
sim, initial_accuracy = auto_quant.run_inference()
print(f"- Quantized Accuracy (before optimization): {initial_accuracy}")
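The baseline score can differ from the FP32 score because quantization rounds every value to a finite grid. The sketch below illustrates this with a generic 8-bit quantize-then-dequantize step in plain Python. It is an illustration of the general idea only; the function `fake_quantize` is hypothetical, and AIMET's own quantizers choose their ranges and rounding according to their own conventions.

```python
def fake_quantize(x, x_min, x_max, bitwidth=8):
    """Quantize x to an unsigned integer grid over [x_min, x_max], then dequantize."""
    num_steps = 2 ** bitwidth - 1
    scale = (x_max - x_min) / num_steps
    q = round((x - x_min) / scale)
    q = max(0, min(num_steps, q))  # clamp to the integer grid
    return q * scale + x_min


original = 0.3
restored = fake_quantize(original, x_min=-1.0, x_max=1.0)
error = abs(original - restored)
print(restored, error)  # the error is at most half a grid step for in-range values
```

Values outside the range are clamped to its ends, which is one reason the choice of range matters for accuracy.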
3.3 Set AdaRound Parameters (optional).
AutoQuant uses predefined default parameters for AdaRound. These values were determined empirically and work well with common models.
If necessary, you can use custom parameters for AdaRound. This example uses very small AdaRound parameters for faster execution.
[ ]:
from aimet_torch.v1.adaround.adaround_weight import AdaroundParameters
ADAROUND_DATASET_SIZE = 200
adaround_data_loader = _create_sampled_data_loader(unlabeled_imagenet_dataset, ADAROUND_DATASET_SIZE)
adaround_params = AdaroundParameters(adaround_data_loader, num_batches=len(adaround_data_loader), default_num_iterations=2000)
auto_quant.set_adaround_params(adaround_params)
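In the cell above, num_batches is set to the length of the data loader. With 200 samples and the batch size of 32 used by _create_sampled_data_loader(), that length works out as follows (plain arithmetic, assuming the loader keeps the final partial batch, which is the DataLoader default):

```python
import math

ADAROUND_DATASET_SIZE = 200  # from the cell above
BATCH_SIZE = 32              # batch_size used by _create_sampled_data_loader

num_batches = math.ceil(ADAROUND_DATASET_SIZE / BATCH_SIZE)
last_batch_size = ADAROUND_DATASET_SIZE - (num_batches - 1) * BATCH_SIZE
print(num_batches, last_batch_size)  # 7 8
```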
3.4 Run AutoQuant Optimization.
This step runs AutoQuant optimization. AutoQuant returns the following:
The best possible quantized model
The corresponding evaluation score
The path to the encoding file
The allowed_accuracy_drop parameter sets the tolerable accuracy drop. AutoQuant applies a series of quantization features until the target accuracy (FP32 accuracy minus the allowed accuracy drop) is reached. When the target accuracy is reached, AutoQuant returns immediately without applying further PTQ techniques. See the AutoQuant User Guide and AutoQuant API documentation for details.
[ ]:
model, optimized_accuracy, encoding_path = auto_quant.optimize(allowed_accuracy_drop=0.01)
print(f"- Quantized Accuracy (after optimization): {optimized_accuracy}")
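The early-exit behavior described in this step can be sketched as a simple loop. This is a simplified illustration, not AutoQuant's implementation: the function `cumulative_search`, the technique names, and the accuracy values are all hypothetical, and each `evaluate` callable stands in for "apply this technique on top of the previous ones and evaluate".

```python
def cumulative_search(fp32_accuracy, allowed_accuracy_drop, baseline_accuracy, techniques):
    """Try techniques in order, keeping the best score, until the target is met.

    `techniques` is a list of (name, evaluate) pairs, where `evaluate()`
    returns the accuracy after applying that technique.
    """
    target = fp32_accuracy - allowed_accuracy_drop
    best_accuracy, tried = baseline_accuracy, []
    for name, evaluate in techniques:
        if best_accuracy >= target:
            break  # target met: skip the remaining techniques
        accuracy = evaluate()
        tried.append(name)
        best_accuracy = max(best_accuracy, accuracy)
    return best_accuracy, tried, best_accuracy >= target


techniques = [
    ("technique_a", lambda: 0.680),
    ("technique_b", lambda: 0.693),
    ("technique_c", lambda: 0.695),
]
best, tried, met = cumulative_search(0.70, 0.01, 0.65, techniques)
print(best, tried, met)  # 0.693 ['technique_a', 'technique_b'] True
```

In this toy run the third technique is never tried, because the second one already reaches the target of roughly 0.69.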
Next steps¶
The next step is to export this model for installation on the target.
Export the model and encodings.
Export the model with the updated weights but without the fake quant ops.
Export the encodings (scale and offset quantization parameters). AIMET QuantizationSimModel provides an export API for this purpose.
The following code performs these exports.
[ ]:
os.makedirs('./output/', exist_ok=True)
dummy_input = dummy_input.cpu()
sim.export(path='./output/', filename_prefix='resnet18_after_cle_bc', dummy_input=dummy_input)
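After the export, it can be useful to confirm which files were written. The helper below is a minimal sketch using only the standard library; the name `list_exported_files` is hypothetical, and the exact set of exported files depends on the AIMET version.

```python
from pathlib import Path


def list_exported_files(output_dir, filename_prefix):
    """Return the sorted names of files in output_dir that start with the prefix."""
    return sorted(
        p.name for p in Path(output_dir).iterdir()
        if p.is_file() and p.name.startswith(filename_prefix)
    )
```

For the cell above, `list_exported_files('./output/', 'resnet18_after_cle_bc')` would list the exported model and encoding files.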
For more information¶
See the AIMET API docs for details about the AIMET APIs and optional parameters.
See the other example notebooks to learn how to use other AIMET post-training quantization techniques.