Manual mixed precision

Context

To use mixed precision effectively, you must identify the correct quantizers to run at higher precision. Doing this by hand requires complex, error-prone graph traversals. The AIMET manual mixed precision (MMP) configurator hides this complexity behind easy-to-use APIs that configure the model in mixed precision. You can change the precision of a layer by directly specifying the layer and the intended precision. The MMP configurator also analyzes and reports how the mixed precision request was realized.

MMP configurator enables you to change the precision of the following within a model:

  • A leaf layer

  • A non-leaf layer (a layer composed of multiple leaf layers)

  • All layers of a certain type

  • Model input tensors or a subset of input tensors

  • Model output tensors or a subset of output tensors

Workflow

Prerequisites

Manual mixed precision is supported only on PyTorch models.

Setup

import torch
from torchvision.models import mobilenet_v2
from aimet_common.quantsim_config.utils import get_path_for_per_channel_config
from aimet_torch.quantsim import QuantizationSimModel
from aimet_torch.v2.mixed_precision import MixedPrecisionConfigurator

device = "cuda:0" if torch.cuda.is_available() else "cpu"

input_shape = (1, 3, 224, 224)
dummy_input = torch.randn(input_shape).to(device)

model = mobilenet_v2(pretrained=True).eval().to(device)

# create the sim object. Feel free to change the default settings as you wish
quant_sim = QuantizationSimModel(model,
                                 dummy_input=dummy_input,
                                 default_param_bw=8,
                                 default_output_bw=8,
                                 config_file=get_path_for_per_channel_config())

# create the MMP configurator object
mp_configurator = MixedPrecisionConfigurator(quant_sim)


Step 1: Applying MMP API options

Note

All requests are processed using the leaf layers in the model.

MMP provides the following APIs to change layers’ precision. The APIs can be called in any order. In case of conflicts, the latest request overrides an older request. For example:

  • If one of the following APIs is called multiple times but with a different precision for the same layer, only the latest call is serviced.

  • The last request takes precedence even if the requests come from two different APIs. For example, say you set a non-leaf layer L1 to precision P1 and then set a leaf layer L2, inside L1, to precision P2. All the layers in L1 are set to precision P1, except layer L2, which is set to P2 (see the sketch after this list).
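
For instance, continuing with the quant_sim created in the setup above, the following hypothetical sequence first sets a whole non-leaf block to int16 and then overrides a single leaf layer inside it to int8; because the leaf-level call comes last, it wins for that layer:

# Set every leaf layer in this non-leaf block to int16
mp_configurator.set_precision(quant_sim.model.features[3].conv[1], activation='int16', param={'weight': 'int16'})

# Later request wins: override one leaf layer inside the same block to int8
mp_configurator.set_precision(quant_sim.model.features[3].conv[1][0], activation='int8', param={'weight': 'int8'})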

Set precision of a leaf layer

mp_configurator.set_precision(quant_sim.model.features[1].conv[0][0], activation='int16', param={'weight': 'int16'})


Set precision of a non-leaf layer

mp_configurator.set_precision(quant_sim.model.features[3].conv[1], activation='int16', param={'weight': 'int16'})


Set precision based on layer type

mp_configurator.set_precision(torch.nn.AvgPool2d, activation='int16')


Set model input precision

mp_configurator.set_model_input_precision(activation='int16')


If a model has more than one input tensor (for example, the structure is [In1, In2]), you can set just one of them (say In2) to a new precision (say P1) by setting activation=[None, P1] in the above API.
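
For instance, for a hypothetical two-input model (the MobileNetV2 model above has only one input, so this is a sketch), the following call leaves In1 at its default precision and sets only In2 to int16:

mp_configurator.set_model_input_precision(activation=[None, 'int16'])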

Set model output precision

mp_configurator.set_model_output_precision(activation='int16')


If a model has more than one output tensor (for example, the structure is [Out1, Out2, Out3]), you can set just one of them (say Out2) to a new precision (say P1) by setting activation=[None, P1, None] in the above API.
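
Similarly, for a hypothetical three-output model (again a sketch), the following call changes only Out2 to int16 and leaves Out1 and Out3 untouched:

mp_configurator.set_model_output_precision(activation=[None, 'int16', None])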

Step 2: Applying the profile

All of the set_precision/set_model_input_precision/set_model_output_precision calls from step 1 are processed at once when the following apply(...) API is called.

mp_configurator.apply()


The apply call generates a report detailing how the request was inferred, propagated to other layers, and eventually realized.
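
The destination of this report and the error-handling behavior can be controlled through the documented log_file and strict parameters, for example:

# Write the report to a custom path; with strict=False, conflicting or
# unrealizable requests are handled best-effort instead of raising an error
mp_configurator.apply(log_file='./my_mmp_log.txt', strict=False)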

API

Top-level API for Manual mixed precision

class aimet_torch.v2.mixed_precision.MixedPrecisionConfigurator(sim)[source]

Mixed Precision Configurator helps set up a mixed precision profile in the QuantSim object. The user is expected to follow the steps below to set the sim in mixed precision (an end-to-end sketch follows below):

  1. Create QuantSim object

  2. Create the MixedPrecisionConfigurator object by passing in the QuantSim object

  3. Make a series of set_precision/set_model_input_precision/set_model_output_precision calls

  4. Call the apply() method, optionally passing in the log file and strict flag

  5. Run compute_encodings on the above QuantSim object

  6. Export the encodings/onnx artifacts

Parameters:

sim (QuantizationSimModel) – QuantSim object
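
Putting the steps above together, a minimal end-to-end sketch might look as follows. The calibration callback is a placeholder you must replace with a forward pass over representative data, and the exact compute_encodings/export signatures can vary across AIMET releases:

import os

def calibration_callback(model):
    # Placeholder: run representative calibration data through the sim model
    with torch.no_grad():
        model(dummy_input)

mp_configurator = MixedPrecisionConfigurator(quant_sim)                # step 2
mp_configurator.set_precision(torch.nn.AvgPool2d, activation='int16')  # step 3
mp_configurator.apply()                                                # step 4
quant_sim.compute_encodings(calibration_callback)                      # step 5

os.makedirs('./output', exist_ok=True)
quant_sim.export(path='./output', filename_prefix='mobilenet_v2_mmp',
                 dummy_input=dummy_input.cpu())                        # step 6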

set_precision(arg, activation, param=None)[source]
Parameters:
  • arg (Union[Module, Type[Module]]) – A torch.nn.Module instance, or a module type (for example, torch.nn.Conv2d)

  • activation (Union[List[Literal['int16', 'int8', 'int4', 'fp16']], Literal['int16', 'int8', 'int4', 'fp16']]) – A string representing the activation dtype of the module input(s)

  • param (Optional[Dict[str, Literal['int16', 'int8', 'int4', 'fp16']]]) – Dict with name of the param as key and its dtype as value

  • If ‘arg’ is a leaf module (a module that is not composed of other torch.nn.Module objects), the specified settings are applied to that module.

  • If ‘arg’ is a non-leaf module (a module composed of other torch.nn.Module objects), the specified settings are applied to all the leaf modules within it.

  • If ‘arg’ is a module type, the specified activation and param settings are applied to all modules of that type in the model.

  • If the same module is specified through multiple set_precision(…) calls, the latest one is applied.

Examples:
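
Recapping the workflow calls above, the two forms of the first argument look like this:

# A specific leaf layer instance
mp_configurator.set_precision(quant_sim.model.features[1].conv[0][0], activation='int16', param={'weight': 'int16'})

# All layers of a given type
mp_configurator.set_precision(torch.nn.AvgPool2d, activation='int16')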

set_model_input_precision(activation)[source]

Sets the activation precision of the model inputs.

Parameters:

activation (Union[List[Optional[Literal['int16', 'int8', 'int4', 'fp16']]], Tuple[Optional[Literal['int16', 'int8', 'int4', 'fp16']]], Literal['int16', 'int8', 'int4', 'fp16']]) – Activation dtypes for inputs of the model

set_model_output_precision(activation)[source]

Sets the activation precision of the model outputs.

Parameters:

activation (Union[List[Optional[Literal['int16', 'int8', 'int4', 'fp16']]], Tuple[Optional[Literal['int16', 'int8', 'int4', 'fp16']]], Literal['int16', 'int8', 'int4', 'fp16']]) – Activation dtypes for outputs of the model

apply(log_file='./mmp_log.txt', strict=True)[source]

Apply the mp settings specified through the set_precision/set_model_input_precision/set_model_output_precision calls to the QuantSim object

Parameters:
  • log_file (Union[IO, str, None]) – Destination for the generated logs: either a string representing the path or an IO object to write the logs into.

  • strict (bool) – Boolean flag to indicate whether to fail (strict=True) on incorrect/conflicting inputs made by the user or (strict=False) take a best-effort approach to realize the MP settings
