Sequential MSE¶

Context¶

Sequential MSE (SeqMSE) is a quantization technique that optimizes the parameter encodings of each layer of a model individually to minimize the difference between the layer’s original and quantized outputs. Rather than relying on training, SeqMSE uses a search-based approach, offering several benefits:

It requires only a small amount of unlabeled data
It approximates the global minimum without getting trapped in local minima
It is robust to overfitting
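
The search-based idea described above can be illustrated without any framework. The following is a minimal sketch, not AIMET's implementation: it assumes a single bias-free linear layer, symmetric per-tensor quantization, and candidates formed by shrinking the min-max clipping range in equal steps. The helper names (`quantize`, `layer_output`, `output_mse`, `search_scale`) and the 4-bit width are hypothetical choices made for illustration.

```python
import random


def quantize(w, scale, bits=4):
    """Symmetric fake-quantization: snap to the integer grid, clamp, rescale."""
    qmax = 2 ** (bits - 1) - 1
    q = max(-qmax - 1, min(qmax, round(w / scale)))
    return q * scale


def layer_output(weights, x):
    """Output of a bias-free linear layer: one dot product per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def output_mse(weights, q_weights, samples):
    """Mean squared error between original and quantized layer outputs."""
    total, count = 0.0, 0
    for x in samples:
        ref = layer_output(weights, x)
        out = layer_output(q_weights, x)
        total += sum((a - b) ** 2 for a, b in zip(ref, out))
        count += len(ref)
    return total / count


def search_scale(weights, samples, num_candidates=20, bits=4):
    """Try num_candidates clipping ranges and keep the one with the lowest output MSE."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for row in weights for w in row)
    best_scale, best_loss = None, float("inf")
    for i in range(1, num_candidates + 1):
        # Candidate i clips the weights at i/num_candidates of the min-max range.
        scale = (max_abs * i / num_candidates) / qmax
        q_weights = [[quantize(w, scale, bits) for w in row] for row in weights]
        loss = output_mse(weights, q_weights, samples)
        if loss < best_loss:
            best_scale, best_loss = scale, loss
    return best_scale, best_loss


# Small unlabeled calibration set: random inputs stand in for real data here.
random.seed(0)
weights = [[random.gauss(0, 1) for _ in range(8)] for _ in range(4)]
samples = [[random.gauss(0, 1) for _ in range(8)] for _ in range(16)]
best_scale, best_loss = search_scale(weights, samples)
```

Because the last candidate is the plain min-max range, the selected encoding in this sketch is never worse (in output MSE on the calibration samples) than min-max quantization. No labels or gradients are involved, only forward computations on the sampled inputs.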

API¶

PyTorch

Top level APIs

aimet_torch.seq_mse.apply_seq_mse(*args, **kwargs)[source]¶

Sequentially minimizes the activation MSE loss, layer by layer, to determine the optimal parameter quantization encodings.

1. Disable all input/output quantizers and the param quantizers of non-supported modules
2. Find and freeze the optimal parameter encoding candidate for the remaining supported modules
3. Re-enable the quantizers disabled in step 1

Example user flow:

model = Model().eval()
sim = QuantizationSimModel(…)
apply_seq_mse(…)
sim.compute_encodings(…)  # compute encodings for all activations and parameters of non-supported modules
sim.export(…)

NOTE: Modules in modules_to_exclude won’t be quantized and will be skipped when applying Sequential MSE.

Parameters:

sim – QuantizationSimModel object
data_loader – Data loader
num_candidates – Number of candidate encodings to evaluate for each layer
forward_fn – Callback function that performs a forward pass; it accepts the model and the inputs as arguments
modules_to_exclude – List of supported type module(s) to exclude when applying Sequential MSE
checkpoints_config – Config file used to split the fp32/quantized model at checkpoints to speed up activation sampling

Sequential MSE parameters

class aimet_torch.seq_mse.SeqMseParams(num_batches, num_candidates=20, inp_symmetry='symqt', loss_fn='mse', forward_fn=<function default_forward_fn>)[source]¶

Sequential MSE parameters

Parameters:

num_batches (Optional[int]) – Number of batches.
num_candidates (int) – Number of candidates to evaluate in the grid search. Default 20.
inp_symmetry (str) – Input symmetry. Available options are ‘asym’, ‘symfp’ and ‘symqt’. Default ‘symqt’.
loss_fn (str) – Loss function. Available options are ‘mse’, ‘l1’ and ‘sqnr’. Default ‘mse’.
forward_fn (Callable) – Optional adapter function that performs forward pass given a model and inputs yielded from the data loader. The function expects model as first argument and inputs to model as second argument.

forward_fn(inputs)¶

Default forward function.

Parameters:

model – PyTorch model
inputs – Model inputs
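
A custom forward_fn is mainly useful when the data loader yields something other than what the model takes directly. The sketch below shows two such adapters that follow the documented calling convention (model first, inputs second). The adapter names and the stand-in models are hypothetical; plain Python callables are used in place of PyTorch modules so the example runs on its own.

```python
def tuple_batch_forward_fn(model, batch):
    """Adapter for loaders that yield (inputs, labels); labels are ignored."""
    inputs, _labels = batch
    return model(inputs)


def dict_batch_forward_fn(model, batch):
    """Adapter for loaders that yield a dict of keyword arguments."""
    return model(**batch)


# Stand-in "models": any callable works for demonstrating the calling convention.
def double_model(x):
    return [2 * v for v in x]


def add_model(a, b):
    return a + b


tuple_out = tuple_batch_forward_fn(double_model, ([1, 2], 0))
dict_out = dict_batch_forward_fn(add_model, {"a": 1, "b": 2})
```

With a real model, an adapter of this shape would be passed as the forward_fn argument so that each batch from the data loader is unpacked before the forward pass.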

get_loss_fn()[source]¶

Returns loss function

Return type: Callable
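
To make the three loss_fn options concrete, here is a framework-free sketch of what 'mse', 'l1' and 'sqnr' commonly denote when comparing a layer's original output with its quantized output. These definitions are assumptions for illustration and may differ from AIMET's exact formulas (for example in reduction or in how SQNR is turned into a loss). The names `mse_loss`, `l1_loss`, `neg_sqnr_loss` and `pick_loss_fn` are hypothetical.

```python
import math


def mse_loss(reference, quantized):
    """Mean of squared differences."""
    return sum((r - q) ** 2 for r, q in zip(reference, quantized)) / len(reference)


def l1_loss(reference, quantized):
    """Mean of absolute differences."""
    return sum(abs(r - q) for r, q in zip(reference, quantized)) / len(reference)


def neg_sqnr_loss(reference, quantized, eps=1e-10):
    """Negated signal-to-quantization-noise ratio in dB, so that lower is better."""
    signal = sum(r ** 2 for r in reference) / len(reference) + eps
    noise = mse_loss(reference, quantized) + eps
    return -10.0 * math.log10(signal / noise)


# Assumed mapping from the documented option strings to loss callables.
LOSS_FNS = {"mse": mse_loss, "l1": l1_loss, "sqnr": neg_sqnr_loss}


def pick_loss_fn(name):
    """Return the loss callable registered under the given option string."""
    if name not in LOSS_FNS:
        raise ValueError(f"Unknown loss_fn {name!r}; expected one of {sorted(LOSS_FNS)}")
    return LOSS_FNS[name]


reference = [1.0, 2.0]
quantized = [1.0, 4.0]
mse_value = pick_loss_fn("mse")(reference, quantized)
l1_value = pick_loss_fn("l1")(reference, quantized)
```

Negating SQNR keeps all three options consistent: in each case the candidate with the smallest loss value is the preferred one.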

ONNX

Top level APIs

aimet_onnx.apply_seq_mse(sim, inputs, num_candidates=20, nodes_to_exclude=None)[source]¶

Sequentially optimizes the QuantizationSimModel’s weight encodings to reduce MSE loss at layer outputs.

Parameters:

sim (QuantizationSimModel) – QuantizationSimModel instance to optimize
inputs (Collection[Dict[str, np.ndarray]]) – The set of input samples to use during optimization
num_candidates (int) – Number of encoding candidates to sweep for each weight. Decreasing this can reduce runtime but may lead to lower accuracy.
nodes_to_exclude (Optional[List[str]]) – List of supported node name(s) to exclude from sequential MSE optimization
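
The inputs argument is a collection of dictionaries that map input names to NumPy arrays. The sketch below builds such a collection and wraps the documented call. The input name "input", the shape, and the helper names `make_calibration_inputs` and `run_seq_mse` are hypothetical, and random data is only a placeholder: in practice the samples should come from representative, unlabeled data.

```python
import numpy as np


def make_calibration_inputs(input_name, shape, num_samples, seed=0):
    """Build a list of {input_name: array} dicts, one dict per sample."""
    rng = np.random.default_rng(seed)
    return [
        {input_name: rng.standard_normal(shape).astype(np.float32)}
        for _ in range(num_samples)
    ]


def run_seq_mse(sim, inputs, num_candidates=20, nodes_to_exclude=None):
    """Thin wrapper around the documented call; sim is an existing QuantizationSimModel."""
    import aimet_onnx  # imported lazily so the helper above runs without AIMET installed

    aimet_onnx.apply_seq_mse(
        sim,
        inputs,
        num_candidates=num_candidates,
        nodes_to_exclude=nodes_to_exclude,
    )


# Placeholder samples; the name and shape must match the real model's input.
inputs = make_calibration_inputs("input", (1, 3, 32, 32), num_samples=4)
```

Lowering num_candidates in the wrapper shortens the sweep for each weight, at the possible cost of accuracy noted in the parameter description.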
