aimet_torch.mixed_precision

Top-level API for Manual mixed precision

class aimet_torch.v2.mixed_precision.MixedPrecisionConfigurator(sim)[source]

Mixed Precision Configurator helps set up a mixed-precision profile in the QuantSim object. Follow the steps below to configure the sim in mixed precision; a short sketch of the full workflow follows the parameter description.

  1. Create QuantSim object

  2. Create the MixedPrecisionConfigurator object by passing in the QuantSim object

  3. Make a series of set_precision/set_model_input_precision/set_model_output_precision calls

  4. Call the apply() method, optionally passing in a log file and the strict flag

  5. Run compute_encodings on the above QuantSim object

  6. Export the encodings/onnx artifacts

Parameters:

sim (QuantizationSimModel) – QuantSim object
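
A minimal sketch of the workflow above. model, dummy_input, and calibration_fn are assumed to exist; the precision choices and paths are illustrative, and compute_encodings/export are called with their typical AIMET arguments:

    import torch
    from aimet_torch.quantsim import QuantizationSimModel
    from aimet_torch.v2.mixed_precision import MixedPrecisionConfigurator

    sim = QuantizationSimModel(model, dummy_input)              # 1. create QuantSim
    mp_configurator = MixedPrecisionConfigurator(sim)           # 2. create the configurator
    mp_configurator.set_precision(torch.nn.Linear,              # 3. request precisions
                                  activation='int16',
                                  param={'weight': 'int16'})
    mp_configurator.apply()                                     # 4. realize the requests on sim
    sim.compute_encodings(calibration_fn, None)                 # 5. calibrate
    sim.export('./artifacts', 'model_mp', dummy_input)          # 6. export encodings/onnx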

set_precision(arg, activation, param=None)[source]
Parameters:
  • arg (Union[Module, Type[Module]]) – A module instance (torch.nn.Module) or a module type (e.g. torch.nn.Linear).

  • activation (Union[List[Literal['int16', 'int8', 'int4', 'fp16']], Literal['int16', 'int8', 'int4', 'fp16']]) – A string representing the activation dtype of the module input(s)

  • param (Optional[Dict[str, Literal['int16', 'int8', 'int4', 'fp16']]]) – Dict with the parameter name (e.g. 'weight') as key and its dtype as value

  • If the ‘module’ is a leaf module (one that is not composed of other torch.nn.Modules), the specified settings are applied to that module.

  • If the ‘module’ is a non-leaf module (one composed of other torch.nn.Modules), the specified settings are applied to all leaf modules within it.

  • If the ‘module’ is a module type, the specified activation and param settings are applied to all modules of that type in the model.

  • If the same ‘module’ is targeted by multiple set_precision(…) calls, the latest call takes effect.

Examples:
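
A few illustrative calls (model.classifier is a hypothetical submodule; mp_configurator is the configurator created in the sketch above):

    # Instance: set one leaf module's input activations to int8
    mp_configurator.set_precision(model.classifier, activation='int8')

    # Type: set all Conv2d modules to int4 activations and int4 weights
    mp_configurator.set_precision(torch.nn.Conv2d, activation='int4',
                                  param={'weight': 'int4'})

    # The latest call wins: model.classifier ends up at fp16 activations
    mp_configurator.set_precision(model.classifier, activation='fp16')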

set_model_input_precision(activation)[source]

Activation precision to apply to the model inputs

Parameters:

activation (Union[List[Optional[Literal['int16', 'int8', 'int4', 'fp16']]], Tuple[Optional[Literal['int16', 'int8', 'int4', 'fp16']]], Literal['int16', 'int8', 'int4', 'fp16']]) – Activation dtypes for inputs of the model
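
For example (a sketch assuming a model with two inputs; per the type above, None is accepted for an entry, presumably leaving that input's precision unchanged):

    # One dtype for every model input
    mp_configurator.set_model_input_precision('int8')

    # Per-input dtypes for a two-input model
    mp_configurator.set_model_input_precision(['int16', None])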

set_model_output_precision(activation)[source]

Activation precision to apply to the model outputs

Parameters:

activation (Union[List[Optional[Literal['int16', 'int8', 'int4', 'fp16']]], Tuple[Optional[Literal['int16', 'int8', 'int4', 'fp16']]], Literal['int16', 'int8', 'int4', 'fp16']]) – Activation dtypes for outputs of the model
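
The output-side call mirrors the input-side one, e.g.:

    # Set all model outputs to fp16
    mp_configurator.set_model_output_precision('fp16')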

apply(log_file='./mmp_log.txt', strict=True)[source]

Apply the mixed-precision settings specified through the set_precision/set_model_input_precision/set_model_output_precision calls to the QuantSim object

Parameters:
  • log_file (Union[IO, str, None]) – Log file to store the logs; either a string path or an IO object to write the logs into.

  • strict (bool) – Boolean flag indicating whether to fail (strict=True) on incorrect or conflicting user inputs, or (strict=False) to take a best-effort approach to realizing the mixed-precision settings
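
For instance (the path is illustrative):

    # Log the resolution details to a custom file and take a
    # best-effort approach to conflicting requests instead of failing
    mp_configurator.apply(log_file='./mp_resolution.txt', strict=False)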

Top-level API for Automatic mixed precision

aimet_torch.mixed_precision.choose_mixed_precision(sim, *args, **kwargs)[source]

Note

To enable phase 3, set the attribute GreedyMixedPrecisionAlgo.ENABLE_CONVERT_OP_REDUCTION = True.

Currently only two candidates are supported: ((8, int), (8, int)) and ((16, int), (8, int)).
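
A sketch of a typical invocation. The signature above is shown as (sim, *args, **kwargs), so the parameter names used here (dummy_input, candidates, the phase-1/phase-2 eval callbacks, allowed_accuracy_drop, results_dir, clean_start, forward_pass_callback) are assumptions based on common AIMET AMP usage, not guaranteed by this page:

    from aimet_common.defs import QuantizationDataType, CallbackFunc
    from aimet_torch.mixed_precision import choose_mixed_precision

    # Candidates are ((act_bw, act_dtype), (param_bw, param_dtype)) pairs;
    # only the two combinations below are currently supported
    candidates = [((16, QuantizationDataType.int), (8, QuantizationDataType.int)),
                  ((8, QuantizationDataType.int), (8, QuantizationDataType.int))]

    pareto_front = choose_mixed_precision(
        sim, dummy_input, candidates,
        eval_callback_for_phase1=CallbackFunc(eval_fn, eval_args),    # eval_fn/eval_args assumed
        eval_callback_for_phase2=CallbackFunc(eval_fn, eval_args),
        allowed_accuracy_drop=0.01,
        results_dir='./amp_results',
        clean_start=True,
        forward_pass_callback=CallbackFunc(calibration_fn, None))    # calibration_fn assumed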

Quantizer Groups definition

class aimet_torch.amp.quantizer_groups.QuantizerGroup(input_quantizers=<factory>, output_quantizers=<factory>, parameter_quantizers=<factory>, supported_kernel_ops=<factory>)[source]

Group of modules and quantizers

get_active_quantizers(name_to_quantizer_dict)[source]

Find all active tensor quantizers associated with this quantizer group

get_candidate(name_to_quantizer_dict)[source]

Gets activation & parameter bitwidth.

Parameters:

name_to_quantizer_dict (Dict) – Dict mapping module names to modules

Return type:

Tuple[Tuple[int, QuantizationDataType], Tuple[int, QuantizationDataType]]

Returns:

Tuple of activation and parameter bitwidths and data types

get_input_quantizer_modules()[source]

Helper method to get the module names corresponding to input_quantizers

set_quantizers_to_candidate(name_to_quantizer_dict, candidate)[source]

Sets a quantizer group to a given candidate bitwidth.

Parameters:
  • name_to_quantizer_dict (Dict) – Dict mapping module names to modules

  • candidate (Tuple[Tuple[int, QuantizationDataType], Tuple[int, QuantizationDataType]]) – Candidate with activation and parameter bitwidths and data types

Return type:

None

to_list()[source]

Converts quantizer group to a list.

Return type:

List[Tuple[str, str]]

Returns:

List containing input/output quantizers & weight quantizers

CallbackFunc Definition

class aimet_common.defs.CallbackFunc(func, func_callback_args=None)[source]

Class encapsulating a callback function and its arguments

Parameters:
  • func (Callable) – Callable function

  • func_callback_args – Arguments passed to the callable function
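
For example (a sketch; eval_model, evaluate_accuracy, and val_loader are assumed names, and the (model, args) signature reflects how AMP is expected to invoke func with func_callback_args):

    from aimet_common.defs import CallbackFunc

    # Hypothetical evaluation function with the expected (model, args) signature
    def eval_model(model, num_batches):
        model.eval()
        return evaluate_accuracy(model, val_loader, num_batches)

    eval_callback = CallbackFunc(eval_model, func_callback_args=50)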

class aimet_torch.amp.mixed_precision_algo.EvalCallbackFactory(data_loader, forward_fn=None)[source]

Factory class for various built-in eval callbacks

Parameters:
  • data_loader (DataLoader) – Data loader to be used for evaluation

  • forward_fn (Optional[Callable[[Module, Any], Tensor]]) – Function that runs forward pass and returns the output tensor. This function is expected to take 1) a model and 2) a single batch yielded from the data loader, and return a single torch.Tensor object which represents the output of the model. The default forward function is roughly equivalent to lambda model, batch: model(batch)

sqnr(num_samples=128)[source]

Returns SQNR eval callback.

Parameters:

num_samples (int) – Number of samples used for evaluation

Return type:

CallbackFunc

Returns:

A callback function that evaluates the input model’s SQNR between fp32 outputs and fake-quantized outputs
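
Putting the factory and sqnr() together (a sketch; val_loader is an assumed torch DataLoader, and the result is a CallbackFunc usable as an eval callback):

    from aimet_torch.amp.mixed_precision_algo import EvalCallbackFactory

    factory = EvalCallbackFactory(val_loader)    # default forward_fn: model(batch)
    sqnr_eval = factory.sqnr(num_samples=128)    # compares fp32 vs fake-quantized outputs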