AIMET PyTorch AutoQuant API
Examples Notebook Link
For an end-to-end notebook showing how to use PyTorch AutoQuant, please see here.
Top-level API
class aimet_torch.auto_quant_v2.AutoQuant(model, dummy_input, data_loader, eval_callback, param_bw=8, output_bw=8, quant_scheme=<QuantScheme.post_training_tf_enhanced: 2>, rounding_mode='nearest', config_file=None, results_dir='/tmp', cache_id=None, strict_validation=True)
Integrate and apply post-training quantization techniques.
AutoQuant includes 1) batchnorm folding, 2) cross-layer equalization, and 3) Adaround. These techniques will be applied in a best-effort manner until the model meets the evaluation goal given as allowed_accuracy_drop.
- Parameters
  - model (Module) – Model to be quantized. Assumes the model is on the correct device.
  - dummy_input (Union[Tensor, Tuple]) – Dummy input for the model. Assumes dummy_input is on the correct device.
  - data_loader (DataLoader[+T_co]) – A collection that iterates over an unlabeled dataset, used for computing encodings.
  - eval_callback (Callable[[Module], float]) – Function that calculates the evaluation score.
  - param_bw (int) – Parameter bitwidth.
  - output_bw (int) – Output bitwidth.
  - quant_scheme (QuantScheme) – Quantization scheme.
  - rounding_mode (str) – Rounding mode.
  - config_file (Optional[str]) – Path to configuration file for model quantizers.
  - results_dir (str) – Directory to save the results of PTQ techniques.
  - cache_id (Optional[str]) – ID associated with cache results.
  - strict_validation (bool) – Flag set to True by default. When False, AutoQuant will proceed with execution and handle errors internally if possible. This may produce suboptimal or unintuitive results.
run_inference()
Creates a quantization model and performs inference.
- Return type
  Tuple[QuantizationSimModel, float]
- Returns
  QuantizationSimModel, model accuracy as float
optimize(allowed_accuracy_drop=0.0)
Integrate and apply post-training quantization techniques.
- Parameters
  - allowed_accuracy_drop (float) – Maximum allowed accuracy drop.
- Return type
  Tuple[Module, float, str]
- Returns
  Tuple of (best model, eval score, encoding path)
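The best-effort behavior and the allowed_accuracy_drop goal can be illustrated with a small plain-Python sketch. This is not AIMET's implementation: `run_best_effort`, the technique names, and the toy scores are hypothetical, and the sketch assumes the goal is met once the score is within allowed_accuracy_drop of the FP32 baseline score.

```python
from typing import Callable, List, Tuple


def run_best_effort(
    fp32_score: float,
    techniques: List[Tuple[str, Callable[[], float]]],
    allowed_accuracy_drop: float = 0.0,
) -> Tuple[str, float, bool]:
    """Try techniques in order and stop at the first one that meets the goal.

    Each technique is a (name, evaluate) pair, where evaluate() returns the
    evaluation score obtained after applying that technique (hypothetical).
    Returns (best technique name, best score, whether the goal was met).
    """
    target = fp32_score - allowed_accuracy_drop
    best_name, best_score = "none", float("-inf")
    for name, evaluate in techniques:
        score = evaluate()
        if score > best_score:
            best_name, best_score = name, score
        if best_score >= target:
            return best_name, best_score, True
    # No technique reached the target; report the best result seen so far.
    return best_name, best_score, False


# Toy scores standing in for real evaluation results.
toy_techniques = [
    ("batchnorm_folding", lambda: 0.70),
    ("cross_layer_equalization", lambda: 0.745),
    ("adaround", lambda: 0.76),
]
result = run_best_effort(0.75, toy_techniques, allowed_accuracy_drop=0.01)
print(result)
```

With a larger allowed_accuracy_drop the loop stops earlier; with a target that no technique reaches, the sketch still returns the best score it observed, which mirrors the "best-effort" wording above.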
set_adaround_params(adaround_params)
Set Adaround parameters. If this method is not called explicitly by the user, AutoQuant will use data_loader (passed to __init__) for Adaround.
- Parameters
  - adaround_params (AdaroundParameters) – Adaround parameters.
- Return type
  None
set_export_params(onnx_export_args=-1, propagate_encodings=None)
Set parameters for QuantizationSimModel.export.
- Parameters
  - onnx_export_args (OnnxExportApiArgs) – Optional export argument with ONNX-specific overrides. If not provided, the model is exported via a TorchScript graph.
  - propagate_encodings (Optional[bool]) – If True, encoding entries for intermediate ops (when one PyTorch op results in multiple ONNX nodes) are filled with the same BW and data_type as the output tensor for that series of ops.
- Return type
  None
set_model_preparer_params(modules_to_exclude=None, concrete_args=None)
Set parameters for model preparer.
- Parameters
  - modules_to_exclude (Optional[List[Module]]) – List of modules to exclude when tracing.
  - concrete_args (Optional[Dict[str, Any]]) – Parameter for model preparer. Allows you to partially specialize your function, whether to remove control flow or data structures. If the model has control flow, torch.fx won't be able to trace the model. See the torch.fx.symbolic_trace API for details.
get_quant_scheme_candidates()
Return the candidates for quant scheme search. During optimize(), the candidate with the highest accuracy will be selected among them.
- Return type
  Tuple[_QuantSchemePair, …]
- Returns
  Candidates for quant scheme search
set_quant_scheme_candidates(candidates)
Set candidates for quant scheme search. During optimize(), the candidate with the highest accuracy will be selected among them.
- Parameters
  - candidates (Tuple[_QuantSchemePair, …]) – Candidates for quant scheme search
class aimet_torch.auto_quant.AutoQuant(allowed_accuracy_drop, unlabeled_dataset_iterable, eval_callback, default_param_bw=8, default_output_bw=8, default_quant_scheme=<QuantScheme.post_training_tf_enhanced: 2>, default_rounding_mode='nearest', default_config_file=None)
Warning
auto_quant.AutoQuant is deprecated and will be replaced with auto_quant_v2.AutoQuant in later versions.
Integrate and apply post-training quantization techniques.
AutoQuant includes 1) batchnorm folding, 2) cross-layer equalization, and 3) Adaround. These techniques will be applied in a best-effort manner until the model meets the evaluation goal given as allowed_accuracy_drop.
- Parameters
  - allowed_accuracy_drop (float) – Maximum allowed accuracy drop.
  - unlabeled_dataset_iterable (Union[DataLoader[+T_co], Collection[+T_co]]) – A collection (i.e. iterable with __len__) that iterates over an unlabeled dataset used for encoding computation. The values yielded by this iterable are expected to be able to be passed directly to the model. By default, this iterable will also be used for Adaround unless otherwise specified by self.set_adaround_params.
  - eval_callback (Callable[[Module, Optional[int]], float]) – A function that maps a model and the number of samples to the evaluation score. This callback is expected to return a scalar value representing the model performance evaluated against exactly N samples, where N is the number of samples passed as the second argument of this callback. NOTE: If N is None, the model is expected to be evaluated against the whole evaluation dataset.
  - default_param_bw (int) – Default bitwidth (4-31) to use for quantizing layer parameters.
  - default_output_bw (int) – Default bitwidth (4-31) to use for quantizing layer inputs and outputs.
  - default_quant_scheme (QuantScheme) – Quantization scheme. Supported values are QuantScheme.post_training_tf or QuantScheme.post_training_tf_enhanced.
  - default_rounding_mode (str) – Rounding mode. Supported options are 'nearest' or 'stochastic'.
  - default_config_file (Optional[str]) – Path to configuration file for model quantizers.
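The eval_callback contract above (evaluate against exactly N samples, or the whole dataset when N is None) can be sketched without any framework. In this illustration the "model" is a plain Python callable rather than a torch Module, and `make_eval_callback`, `toy_dataset`, and `toy_model` are hypothetical names introduced only for the example.

```python
import random
from typing import Callable, Optional, Sequence, Tuple

Sample = Tuple[float, int]  # (input value, label)


def make_eval_callback(dataset: Sequence[Sample], seed: int = 0):
    """Build a callback that follows the (model, num_samples) -> score contract."""

    def eval_callback(model: Callable[[float], int],
                      num_samples: Optional[int] = None) -> float:
        if num_samples is None:
            # None means: evaluate against the whole evaluation dataset.
            samples = list(dataset)
        else:
            # Otherwise evaluate against exactly num_samples samples.
            samples = random.Random(seed).sample(list(dataset), num_samples)
        correct = sum(1 for value, label in samples if model(value) == label)
        return correct / len(samples)

    return eval_callback


# Toy data: the label is 1 when the input is at least 0.5, else 0.
toy_dataset = [(x / 10, int(x >= 5)) for x in range(10)]
toy_model = lambda value: int(value >= 0.5)

callback = make_eval_callback(toy_dataset)
print(callback(toy_model))      # whole dataset
print(callback(toy_model, 4))   # exactly 4 samples
```

Fixing the random seed keeps the sampled subset stable between calls, so scores measured before and after a quantization step are compared on the same samples.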
apply(fp32_model, dummy_input_on_cpu, dummy_input_on_gpu=None, results_dir='/tmp', cache_id=None)
Apply post-training quantization techniques.
- Parameters
  - fp32_model (Module) – Model to apply PTQ techniques.
  - dummy_input_on_cpu (Union[Tensor, Tuple]) – Dummy input to the model in CPU memory.
  - dummy_input_on_gpu (Union[Tensor, Tuple, None]) – Dummy input to the model in GPU memory. This parameter is required if and only if the fp32_model is on GPU.
  - results_dir (str) – Directory to save the results.
  - cache_id (Optional[str]) – A string that composes a cache id in combination with results_dir. If specified, AutoQuant will load/save the PTQ results from/to the file system if previous PTQ results produced under the same results_dir and cache_id exist.
- Return type
  Tuple[Module, float, str]
- Returns
  Tuple of (best model, eval score, encoding path front).
- Raises
  ValueError if the model is on GPU and dummy_input_on_gpu is not specified.
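The idea of keying cached results on results_dir plus cache_id can be sketched with the standard library alone. This is not AIMET's on-disk layout or API: `cached_result`, the `.cache_sketch` directory name, and the JSON file format are assumptions made purely for illustration.

```python
import json
import os
import tempfile
from typing import Callable, Optional


def cached_result(results_dir: str, cache_id: Optional[str], name: str,
                  compute: Callable[[], float]) -> float:
    """Return a cached value for (results_dir, cache_id, name) if present.

    When cache_id is None, caching is disabled and compute() always runs.
    """
    if cache_id is None:
        return compute()
    cache_dir = os.path.join(results_dir, ".cache_sketch", cache_id)
    path = os.path.join(cache_dir, f"{name}.json")
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    result = compute()
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w") as f:
        json.dump(result, f)
    return result


calls = []


def expensive_step() -> float:
    # Stands in for a slow PTQ step; records how often it actually runs.
    calls.append(1)
    return 0.73


with tempfile.TemporaryDirectory() as results_dir:
    first = cached_result(results_dir, "run1", "adaround", expensive_step)
    second = cached_result(results_dir, "run1", "adaround", expensive_step)
    third = cached_result(results_dir, None, "adaround", expensive_step)

print(first, second, third, len(calls))
```

In this sketch the second call is served from disk, while the call without a cache_id recomputes the value.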
set_adaround_params(adaround_params)
Set Adaround parameters. If this method is not called explicitly by the user, AutoQuant will use unlabeled_dataset_iterable (passed to __init__) for Adaround.
- Parameters
  - adaround_params (AdaroundParameters) – Adaround parameters.
- Return type
  None
set_export_params(onnx_export_args=-1, propagate_encodings=None)
Set parameters for QuantizationSimModel.export.
- Parameters
  - onnx_export_args (OnnxExportApiArgs) – Optional export argument with ONNX-specific overrides. If not provided, the model is exported via a TorchScript graph.
  - propagate_encodings (Optional[bool]) – If True, encoding entries for intermediate ops (when one PyTorch op results in multiple ONNX nodes) are filled with the same BW and data_type as the output tensor for that series of ops.
- Return type
  None
Code Examples
import random
from typing import Optional

import torch
from torch.utils.data import Dataset, DataLoader, SubsetRandomSampler
from torchvision import models, datasets, transforms

from aimet_torch.adaround.adaround_weight import AdaroundParameters
from aimet_torch.auto_quant_v2 import AutoQuant

# Step 1. Define constants and helper functions
EVAL_DATASET_SIZE = 5000
CALIBRATION_DATASET_SIZE = 2000
BATCH_SIZE = 100

# Cache one sampler per sample count so repeated calls reuse the same subset.
_subset_samplers = {}

def _create_sampled_data_loader(dataset, num_samples):
    if num_samples not in _subset_samplers:
        indices = random.sample(range(len(dataset)), num_samples)
        _subset_samplers[num_samples] = SubsetRandomSampler(indices=indices)
    return DataLoader(dataset,
                      sampler=_subset_samplers[num_samples],
                      batch_size=BATCH_SIZE)

# Step 2. Prepare model and dataset
fp32_model = models.resnet18(pretrained=True).eval()

input_shape = (1, 3, 224, 224)
dummy_input = torch.randn(input_shape)

transform = transforms.Compose((
    transforms.ToTensor(),
))
# NOTE: In actual use cases, a real dataset should be provided by the users.
eval_dataset = datasets.FakeData(size=EVAL_DATASET_SIZE,
                                 image_size=input_shape[1:],
                                 num_classes=1000,
                                 transform=transform)

# Step 3. Prepare unlabeled dataset
# NOTE: In actual use cases, the users should implement this part to serve
# their own goals if necessary.
class UnlabeledDatasetWrapper(Dataset):
    def __init__(self, dataset):
        self._dataset = dataset

    def __len__(self):
        return len(self._dataset)

    def __getitem__(self, index):
        # Drop the label so that only the image is yielded.
        images, _ = self._dataset[index]
        return images

unlabeled_dataset = UnlabeledDatasetWrapper(eval_dataset)
unlabeled_data_loader = _create_sampled_data_loader(unlabeled_dataset, CALIBRATION_DATASET_SIZE)

# Step 4. Prepare eval callback
# NOTE: In actual use cases, the users should implement this part to serve
# their own goals if necessary.
def eval_callback(model: torch.nn.Module, num_samples: Optional[int] = None) -> float:
    # Evaluate on the whole evaluation dataset when num_samples is None.
    if num_samples is None:
        num_samples = len(eval_dataset)

    eval_data_loader = _create_sampled_data_loader(eval_dataset, num_samples)

    num_correct_predictions = 0
    for images, labels in eval_data_loader:
        predictions = torch.argmax(model(images.cuda()), dim=1)
        num_correct_predictions += torch.sum(predictions.cpu() == labels)

    return int(num_correct_predictions) / num_samples

# Step 5. Create AutoQuant object
auto_quant = AutoQuant(fp32_model.cuda(),
                       dummy_input.cuda(),
                       unlabeled_data_loader,
                       eval_callback)

# Step 6. (Optional) Set adaround params
ADAROUND_DATASET_SIZE = 2000
adaround_data_loader = _create_sampled_data_loader(unlabeled_dataset, ADAROUND_DATASET_SIZE)
adaround_params = AdaroundParameters(adaround_data_loader, num_batches=len(adaround_data_loader))
auto_quant.set_adaround_params(adaround_params)

# Step 7. Run AutoQuant
sim, initial_accuracy = auto_quant.run_inference()
model, optimized_accuracy, encoding_path = auto_quant.optimize(allowed_accuracy_drop=0.01)

print(f"- Quantized Accuracy (before optimization): {initial_accuracy:.4f}")
print(f"- Quantized Accuracy (after optimization): {optimized_accuracy:.4f}")
Note
To use the deprecated auto_quant.AutoQuant, import AutoQuant from aimet_torch.auto_quant and apply the following code changes to steps 5 and 7.
# Step 5. Create AutoQuant object
auto_quant = AutoQuant(allowed_accuracy_drop=0.01,
                       unlabeled_dataset_iterable=unlabeled_data_loader,
                       eval_callback=eval_callback)

# Step 6. (Optional) Set adaround params
ADAROUND_DATASET_SIZE = 2000
adaround_data_loader = _create_sampled_data_loader(unlabeled_dataset, ADAROUND_DATASET_SIZE)
adaround_params = AdaroundParameters(adaround_data_loader, num_batches=len(adaround_data_loader))
auto_quant.set_adaround_params(adaround_params)

# Step 7. Run AutoQuant
model, accuracy, encoding_path =\
    auto_quant.apply(fp32_model.cuda(),
                     dummy_input_on_cpu=dummy_input.cpu(),
                     dummy_input_on_gpu=dummy_input.cuda())

# apply() returns the score as `accuracy`; `optimized_accuracy` is not defined here.
print(f"- Quantized Accuracy (after optimization): {accuracy:.4f}")