AdaScale¶
Context¶
AdaScale is a post-training quantization (PTQ) technique that improves the accuracy of the quantized model by computing optimal quantization parameters for weights. AdaScale is based on FlexRound (https://arxiv.org/abs/2306.00317) and integrates Learnable Weight Clipping from OmniQuant (https://arxiv.org/abs/2308.13137).
AdaScale introduces trainable parameters (gamma, beta, s2, s3) in the weight quantizers of every supported module and performs BKD (Blockwise Knowledge Distillation) by comparing the quantized output of every supported block with its FP32 equivalent.
From the AdaScale perspective, a block is defined as a non-leaf module that takes one activation input tensor and produces one activation output tensor.
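Conceptually, the per-block optimization can be pictured as minimizing the reconstruction error between each quantized block and its FP32 counterpart while only the AdaScale parameters are trained. The following is a minimal, illustrative sketch of that idea, not AIMET's implementation; fp32_block, quantized_block, adascale_params, and block_inputs are hypothetical handles.
import torch

def blockwise_kd_sketch(fp32_block, quantized_block, adascale_params, block_inputs, num_epochs, lr=1e-3):
    # Illustrative only: optimize the trainable quantizer parameters (gamma, beta, s2, s3)
    # of one block so its quantized output matches the FP32 output on cached inputs
    optimizer = torch.optim.Adam(adascale_params, lr=lr)
    for _ in range(num_epochs):
        for x in block_inputs:
            with torch.no_grad():
                target = fp32_block(x)  # FP32 reference output of the block
            loss = torch.nn.functional.mse_loss(quantized_block(x), target)
            optimizer.zero_grad()
            loss.backward()   # gradients only reach the AdaScale parameters
            optimizer.step()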
Warning: This feature is currently experimental.
Workflow¶
Prerequisites¶
To use AdaScale, you must:
Use PyTorch. AdaScale does not support other frameworks yet.
Load a pre-trained model
Create a dataloader for the model
Procedure¶
Setup¶
# General setup that can be changed as needed
import torch
from torch.utils.data import DataLoader

# Load the model
device = "cuda:0" if torch.cuda.is_available() else "cpu"
model = ModelWithConsecutiveLinearBlocks().eval().to(device)

# Build a calibration dataloader
num_batches = 32
num_samples = 96
dummy_input = torch.rand(num_samples, 3, 32, 64).to(device)
data_set = CustomDataset(dummy_input)
data_loader = DataLoader(data_set, batch_size=num_samples // num_batches, shuffle=True)
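The snippet above assumes ModelWithConsecutiveLinearBlocks and CustomDataset are defined elsewhere; they are placeholders, not AIMET classes. A minimal sketch of compatible definitions for the shapes used above could look like this:
import torch
from torch.utils.data import Dataset

class ModelWithConsecutiveLinearBlocks(torch.nn.Module):
    # Toy stand-in: each inner Sequential is a non-leaf block with one input and one output
    def __init__(self):
        super().__init__()
        self.blocks = torch.nn.Sequential(
            torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()),
            torch.nn.Sequential(torch.nn.Linear(64, 64), torch.nn.ReLU()),
        )

    def forward(self, x):
        return self.blocks(x)

class CustomDataset(Dataset):
    # Wraps a pre-generated tensor so a DataLoader can serve individual samples
    def __init__(self, data):
        self.data = data

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)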
Step 1¶
Use AIMET’s quantization simulation to create a QuantizationSimModel object
from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

sim = QuantizationSimModel(model,
                           dummy_input=dummy_input,
                           quant_scheme=QuantScheme.training_range_learning_with_tf_init,
                           default_param_bw=4,
                           default_output_bw=16)
Step 2¶
Apply AdaScale to compute optimal quantization encodings for the parameters of supported layers
# Find and freeze optimal encoding candidates for weight parameters of supported layers
from aimet_torch.experimental.adascale import apply_adascale
from aimet_torch.v2.utils import default_forward_fn

apply_adascale(qsim=sim,
               data_loader=data_loader,
               forward_fn=default_forward_fn,
               num_epochs=10)
Step 3¶
Compute encodings for the activations and remaining parameters of the model
def forward_pass(model: torch.nn.Module, _):
    with torch.no_grad():
        for data in data_loader:
            model(data)

# Compute the quantization encodings
# (calibrates all activations and parameters of uninitialized layer(s)/operation(s))
sim.compute_encodings(forward_pass, None)
Step 4¶
Evaluate the quantized model
# Determine simulated quantized accuracy
...
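A minimal sketch of this step, assuming a labeled evaluation dataloader (eval_data_loader below is hypothetical; the calibration loader built in the setup has no labels) and a simple classification accuracy metric:
def evaluate(model: torch.nn.Module, eval_data_loader) -> float:
    # Placeholder evaluation loop; replace with the task's real metric and data
    correct = total = 0
    model.eval()
    with torch.no_grad():
        for inputs, labels in eval_data_loader:
            predictions = model(inputs.to(device)).argmax(dim=-1)
            correct += (predictions.cpu() == labels).sum().item()
            total += labels.numel()
    return correct / total

accuracy = evaluate(sim.model, eval_data_loader)  # eval_data_loader: hypothetical labeled loader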
Step 5¶
If the resulting quantized accuracy is satisfactory, export the model.
# Export the model for on-target inference
path = './'
filename = 'dummy_model'
sim.export(path=path, filename_prefix="quantized_" + filename, dummy_input=dummy_input.cpu())
API¶
Top level APIs
- aimet_torch.experimental.adascale.apply_adascale(qsim, data_loader, forward_fn=None, num_epochs=1)¶
  - Parameters:
    - qsim (QuantizationSimModel) – Quantization Sim model
    - data_loader (DataLoader) – DataLoader object to load the input data
    - forward_fn (Optional[Callable[[Module, Any], Any]]) – forward function to run the forward pass of the model
    - num_epochs (int) – Number of epochs to perform the AdaScale BKD
Note that the forward_fn should take exactly two arguments: 1) the model, and 2) the object returned from the dataloader, irrespective of whether it is a tensor, a tuple of tensors, a dict, etc.
The forward_fn should prepare the input sample as needed and call the forward pass at the very end. It should not run any evaluation, create a full dataloader inside the method, etc.
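For illustration, a custom forward_fn that satisfies this contract might look like the sketch below; the batch unpacking depends on what the dataloader yields, and device is assumed to be defined as in the setup above.
def forward_fn(model: torch.nn.Module, batch):
    # `batch` is one object yielded by the dataloader; prepare the input and
    # end with a single forward pass -- no evaluation, no extra data loading
    inputs = batch[0] if isinstance(batch, (tuple, list)) else batch
    return model(inputs.to(device))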
- Example usage:
>>> model = DummyModel()
>>> dummy_input = ...
>>> data_set = DataSet(dummy_input)
>>> data_loader = DataLoader(data_set, ...)
>>> sim = QuantizationSimModel(model, dummy_input)
>>> apply_adascale(sim, data_loader, forward_fn=forward_fn, num_epochs=10)
>>> sim.compute_encodings(...)
>>> sim.export(...)
apply_adascale modifies the model's weights in place.
compute_encodings should not be called before calling apply_adascale.
Activation quantizers remain uninitialized throughout this procedure, so the user needs to call compute_encodings afterwards. This ensures that activation encodings are computed with the updated weights taken into account.
Warning: This feature is currently considered experimental pending API changes