Sequential MSE

Context

Sequential MSE (SeqMSE) is a method that searches for optimal parameter quantization encodings per operation (that is, per layer) so that the difference between the original output activation and the corresponding quantization-aware output activation is minimized.
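To make the search concrete, here is a minimal pure-Python sketch of the per-layer step for a single linear layer. It assumes symmetric per-tensor weight quantization and candidate ranges spaced evenly from max|W|/num_candidates up to max|W|; both choices, and every function name below, are illustrative assumptions rather than AIMET's implementation.

```python
# Sketch of a SeqMSE-style grid search for ONE linear layer (y = W @ x).
# Assumptions, not taken from AIMET: symmetric per-tensor weight
# quantization, and candidate ranges spaced evenly up to max|W|.


def quantize_dequantize(values, max_abs, num_bits=8):
    """Symmetric fake-quantization: snap each value to the integer grid and back."""
    q_max = 2 ** (num_bits - 1) - 1           # 127 for 8 bits
    scale = max_abs / q_max
    out = []
    for v in values:
        q = round(v / scale)                  # Python rounds half to even
        q = max(-q_max - 1, min(q_max, q))    # clip to the signed integer range
        out.append(q * scale)
    return out


def linear(weights, x):
    """y = W @ x, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def mse(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)


def search_layer_encoding(weights, calib_inputs, num_candidates=20, num_bits=8):
    """Return (best_max_abs, best_loss) over a grid of candidate weight ranges."""
    w_max = max(abs(v) for row in weights for v in row)
    if w_max == 0:
        return 0.0, 0.0                       # all-zero weights: nothing to search
    fp_outputs = [linear(weights, x) for x in calib_inputs]
    best_range, best_loss = None, float("inf")
    for i in range(1, num_candidates + 1):
        candidate = w_max * i / num_candidates
        q_weights = [quantize_dequantize(row, candidate, num_bits) for row in weights]
        # Loss between the FP output and the output computed with quantized weights
        loss = sum(
            mse(fp_y, linear(q_weights, x)) for fp_y, x in zip(fp_outputs, calib_inputs)
        ) / len(calib_inputs)
        if loss < best_loss:
            best_range, best_loss = candidate, loss
    return best_range, best_loss
```

The sketch covers one layer only. As the name suggests, the full method visits layers one after another, so each layer's search would see inputs produced by the layers that were already quantized before it.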

Since SeqMSE is search-based rather than learning-based, it has several advantages:

It requires only a small amount of calibration data
It approximates the global minimum without getting trapped in local minima, since it evaluates every candidate in the search grid instead of following gradients
It is robust to overfitting

API

PyTorch

Top-level APIs

aimet_torch.seq_mse.apply_seq_mse(model, sim, data_loader, params, modules_to_exclude=None, checkpoints_config=None)

Sequentially minimizes the activation MSE loss layer by layer to determine optimal parameter quantization encodings.

The function proceeds in three steps:
1) Disable all input/output quantizers and the parameter quantizers of non-supported modules.
2) Find and freeze the optimal parameter encoding candidates for the remaining supported modules.
3) Re-enable the quantizers disabled in step 1.
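The three steps can be sketched with toy objects. Quantizer, Layer, run_seq_mse_flow, and the find_encoding callback are hypothetical stand-ins, not AIMET classes; the sketch reads step 3 as restoring only the quantizers that step 1 switched off, so any quantizer that was already disabled beforehand stays disabled.

```python
# Hypothetical sketch of the three-step flow. Quantizer and Layer are
# illustrative stand-ins, not AIMET classes.
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Quantizer:
    name: str
    enabled: bool = True
    frozen: bool = False
    encoding: Optional[float] = None


@dataclass
class Layer:
    name: str
    supported: bool
    input_q: Quantizer
    output_q: Quantizer
    param_q: Quantizer


def run_seq_mse_flow(layers: List[Layer], find_encoding: Callable[[Layer], float]) -> None:
    # Step 1: disable all input/output quantizers and the param quantizers of
    # non-supported layers, recording exactly which ones this step switched off.
    disabled = []
    for layer in layers:
        targets = [layer.input_q, layer.output_q]
        if not layer.supported:
            targets.append(layer.param_q)
        for q in targets:
            if q.enabled:
                q.enabled = False
                disabled.append(q)

    # Step 2: search for and freeze param encodings of the supported layers.
    for layer in layers:
        if layer.supported:
            layer.param_q.encoding = find_encoding(layer)
            layer.param_q.frozen = True

    # Step 3: re-enable only the quantizers disabled in step 1, so quantizers
    # that were already off beforehand stay off.
    for q in disabled:
        q.enabled = True
```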

Example user flow:

    model = Model().eval()
    sim = QuantizationSimModel(…)
    apply_seq_mse(…)
    sim.compute_encodings(…)  # computes encodings for all activations and for parameters of non-supported modules
    sim.export(…)

NOTE:
1) Module references passed to modules_to_exclude should come from the FP32 model.
2) Modules in modules_to_exclude are not quantized by Sequential MSE and are skipped when it is applied.
3) Apart from finding parameter encodings for supported modules, the config JSON file is respected and the final state of sim is unchanged.

Parameters:

model (Module) – Original FP32 model
sim (QuantizationSimModel) – QuantizationSimModel object corresponding to model
data_loader (DataLoader) – Data loader providing the calibration data
params (SeqMseParams) – Sequential MSE parameters
modules_to_exclude (Optional[List[Module]]) – List of modules of supported types to exclude when applying Sequential MSE
checkpoints_config (Optional[str]) – Path to a config file that splits the FP32 and quantized models at checkpoints to speed up activation sampling

Sequential MSE parameters

class aimet_torch.seq_mse.SeqMseParams(num_batches, num_candidates=20, inp_symmetry='symqt', loss_fn='mse', forward_fn=<function default_forward_fn>)

Sequential MSE parameters

Parameters:

num_batches (int) – Number of batches drawn from the data loader.
num_candidates (int) – Number of candidate encodings evaluated in the grid search. Default 20.
inp_symmetry (str) – Input symmetry. Available options are ‘asym’, ‘symfp’ and ‘symqt’. Default ‘symqt’.
loss_fn (str) – Loss function. Available options are ‘mse’, ‘l1’ and ‘sqnr’. Default ‘mse’.
forward_fn (Callable) – Optional adapter function that performs a forward pass given a model and the inputs yielded from the data loader. The function expects the model as its first argument and the model inputs as its second argument.
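The inp_symmetry options are listed above without definitions. One plausible reading, assumed here from the option names and worth checking against the AIMET source, is that they choose which captured activation feeds each side of the comparison: the input recorded from the FP32 model or the one recorded from the QuantSim model. The helper select_inputs is hypothetical.

```python
# Hypothetical helper: one reading of the three inp_symmetry options.
# The mapping below is an assumption based on the option names.


def select_inputs(inp_symmetry, fp_input, quant_input):
    """Return (input for the FP32 reference output, input for the quantized-weight output)."""
    if inp_symmetry == "asym":
        return fp_input, quant_input      # each path keeps its own input
    if inp_symmetry == "symfp":
        return fp_input, fp_input         # both paths use the FP32 model's input
    if inp_symmetry == "symqt":
        return quant_input, quant_input   # both paths use the quantized model's input
    raise ValueError(f"Unknown inp_symmetry: {inp_symmetry!r}")
```

Under this reading, ‘symqt’ (the default) compares both outputs on the same quantized input, so the loss isolates the error introduced by the weight encoding alone.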

forward_fn(model, inputs): Default forward function.

Parameters:

model – PyTorch model
inputs – Model inputs
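A custom forward_fn is useful when the data loader yields more than the model accepts. The two adapters below are hypothetical examples that each assume a particular loader format: (images, labels) tuples, or dicts of named inputs. Both follow the documented convention of taking the model first and the loader's inputs second.

```python
# Hypothetical forward_fn adapters. They assume particular data loader
# formats; match them to what your loader actually yields.


def forward_fn_image_label(model, inputs):
    """Loader yields (images, labels): feed only the images to the model."""
    images, _labels = inputs
    return model(images)


def forward_fn_kwargs(model, inputs):
    """Loader yields a dict of named tensors: pass them as keyword arguments."""
    return model(**inputs)
```

Either adapter would then be supplied through the parameters object, for example SeqMseParams(num_batches=4, forward_fn=forward_fn_image_label), where 4 is an arbitrary illustrative value.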

get_loss_fn()

Returns the loss function selected by loss_fn.

Return type: Callable
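A get_loss_fn-style lookup can be sketched as a dictionary from the three loss_fn strings to callables. The function bodies below are standard textbook definitions written over plain lists. SQNR is a "higher is better" quantity, so the sketch negates its decibel value to obtain something to minimize. The exact reductions and epsilon handling AIMET uses are not shown on this page and may differ.

```python
# Sketch of a get_loss_fn-style lookup for the three loss_fn options,
# written over plain lists. The exact reductions AIMET uses may differ.
import math


def mse_loss(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def l1_loss(pred, target):
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def neg_sqnr_loss(pred, target, eps=1e-10):
    # SQNR is "higher is better", so its negated dB value serves as a loss.
    noise = sum((t - p) ** 2 for p, t in zip(pred, target)) / len(target)
    signal = sum(t ** 2 for t in target) / len(target)
    return -10.0 * math.log10((signal + eps) / (noise + eps))


_LOSS_FNS = {"mse": mse_loss, "l1": l1_loss, "sqnr": neg_sqnr_loss}


def get_loss_fn(loss_fn="mse"):
    """Map a loss_fn string to a callable(pred, target) -> float."""
    if loss_fn not in _LOSS_FNS:
        raise ValueError(f"Unknown loss_fn {loss_fn!r}; expected one of {sorted(_LOSS_FNS)}")
    return _LOSS_FNS[loss_fn]
```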

TensorFlow

Not supported.

ONNX

Top-level APIs

aimet_onnx.apply_seq_mse(sim, inputs, num_candidates=20)

Sequentially optimizes the QuantizationSimModel’s weight encodings to reduce MSE loss at layer outputs.

Parameters:

sim (QuantizationSimModel) – Calibrated QuantizationSimModel instance to optimize
inputs (Collection[Dict[str, np.ndarray]]) – The set of input samples to use during optimization
num_candidates (int) – Number of encoding candidates to sweep for each weight. Decreasing this can reduce runtime but may lead to lower accuracy.
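The inputs argument is a collection of dictionaries, one per sample, each mapping an ONNX input name to a NumPy array. The helper below is a hypothetical sketch that builds that structure from per-input sample lists; the input names it receives and the float32 dtype are assumptions that must match the actual model.

```python
# Hypothetical helper that builds the `inputs` argument of
# aimet_onnx.apply_seq_mse: a list of {input_name: np.ndarray} dicts,
# one dict per sample. Input names and the float32 dtype are assumptions.
import numpy as np


def to_seq_mse_inputs(named_samples):
    """named_samples maps each ONNX input name to a sequence of per-sample arrays."""
    names = list(named_samples)
    columns = [list(named_samples[n]) for n in names]
    if len({len(c) for c in columns}) > 1:
        raise ValueError("every input name needs the same number of samples")
    return [
        {name: np.asarray(sample, dtype=np.float32) for name, sample in zip(names, row)}
        for row in zip(*columns)
    ]
```

With real calibration samples, the result can be passed as aimet_onnx.apply_seq_mse(sim, inputs, num_candidates=20). The helper returns a list rather than a generator because the parameter is typed as a Collection, which a one-shot generator is not.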
