aimet_onnx.mixed_precision
Top-level API
- aimet_onnx.mixed_precision.choose_mixed_precision(sim, candidates, eval_callback_for_phase1, eval_callback_for_phase2, allowed_accuracy_drop, results_dir, clean_start, forward_pass_callback, use_all_amp_candidates=False, phase1_optimize=True, amp_search_algo=AMPSearchAlgo.Binary)
High-level API to perform in-place mixed precision evaluation on the given sim model. A Pareto list is created, and a curve of accuracy vs. BitOps is saved under the results directory.
- Parameters:
  - sim (QuantizationSimModel) – Quantized sim model
  - candidates (List[Tuple[Tuple[int, QuantizationDataType], Tuple[int, QuantizationDataType]]]) – List of tuples of all possible bitwidth and data type values for activations and parameters. For example, suppose the possible combinations are:
    - activation bitwidth 8, activation data type int; parameter bitwidth 16, parameter data type int
    - activation bitwidth 16, activation data type float; parameter bitwidth 16, parameter data type float
    Then candidates will be [((8, QuantizationDataType.int), (16, QuantizationDataType.int)), ((16, QuantizationDataType.float), (16, QuantizationDataType.float))].
  - eval_callback_for_phase1 (CallbackFunc) – A CallbackFunc object that wraps an evaluation function (callable) and its parameters. This callback is used to measure the sensitivity of each quantizer group during phase 1, which computes the accuracy list (sensitivity) of each module. Therefore, a user may want to run phase 1 with a smaller dataset.
  - eval_callback_for_phase2 (CallbackFunc) – A CallbackFunc object that wraps an evaluation function (callable) and its parameters. This callback is used to get the accuracy of the quantized model for phase 2 calculations. Phase 2 finds the Pareto front curve.
  - allowed_accuracy_drop (Optional[float]) – Maximum allowed drop in accuracy from the FP32 baseline. The Pareto front curve is plotted only up to the point where the allowed accuracy drop is met. To get a complete plot for picking points on the curve, set allowed_accuracy_drop to None.
  - results_dir (str) – Path to save results and cache intermediate results
  - clean_start (bool) – If True, any cached information from previous runs is deleted before the mixed precision analysis starts. If False, prior cached information is used if applicable. Note that it is the user's responsibility to set this flag to True if anything in the model or quantization parameters has changed since the previous run.
  - forward_pass_callback (CallbackFunc) – A CallbackFunc object that wraps a forward pass function (callable) and its parameters. This callback is used to compute quantization encodings.
  - use_all_amp_candidates (bool) – The "supported_kernels" field in the config file (under the defaults and op_type sections) can specify a list of supported candidates, so not all AMP candidates passed through candidates may be supported. When use_all_amp_candidates is set to True, the AMP algorithm ignores "supported_kernels" in the config file and uses all candidates.
  - amp_search_algo (AMPSearchAlgo) – A valid value from the AMPSearchAlgo enum. Defines the search algorithm used for phase 2 of AMP.
- phase1_optimize:
If set to False, the default phase 1 logic is executed; otherwise, the optimized phase 1 logic is executed.
- Return type:
Optional[List[Tuple[int, float, QuantizerGroup, int]]]
- Returns:
Pareto front list containing information including BitOps, QuantizerGroup candidates, and the corresponding eval scores. The Pareto front list can be used to plot a Pareto front curve that shows how BitOps vary with accuracy. If the allowed accuracy drop is set to 100%, the user can use the Pareto front curve to pick points and re-run. Returns None if the mixed precision algorithm exits early.
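To make the Pareto front and accuracy-drop semantics more concrete, here is a small self-contained sketch. It is a conceptual illustration, not AIMET's implementation: each point is a hypothetical (bitops, accuracy) pair, a point is kept only if no other point has fewer-or-equal BitOps and higher-or-equal accuracy (strictly better in at least one), and the front is then cut where the drop from the FP32 baseline exceeds allowed_accuracy_drop, with None keeping the whole curve. The function name pareto_front and its inputs are assumptions for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, float]  # hypothetical (bitops, accuracy) pair


def pareto_front(points: List[Point],
                 baseline_accuracy: float,
                 allowed_accuracy_drop: Optional[float]) -> List[Point]:
    """Conceptual sketch (not AIMET's code): non-dominated points ordered by
    descending BitOps, truncated once the accuracy drop exceeds the allowed drop."""
    # Most expensive configuration first; ties broken by higher accuracy.
    ordered = sorted(points, key=lambda p: (-p[0], -p[1]))
    front: List[Point] = []
    for bitops, acc in ordered:
        # A point is dominated if another is no worse on both axes and better on one.
        dominated = any(
            ob <= bitops and oa >= acc and (ob < bitops or oa > acc)
            for ob, oa in points
        )
        if not dominated:
            front.append((bitops, acc))
    if allowed_accuracy_drop is None:
        return front  # complete curve
    kept: List[Point] = []
    for bitops, acc in front:
        # Along the front, accuracy falls as BitOps fall, so stop at the first violation.
        if baseline_accuracy - acc > allowed_accuracy_drop:
            break
        kept.append((bitops, acc))
    return kept


pts = [(100, 0.76), (80, 0.75), (60, 0.70), (90, 0.74), (50, 0.60)]
```

With these points, (90, 0.74) is dropped because (80, 0.75) is cheaper and more accurate, and an allowed drop of 0.05 from a 0.76 baseline keeps only the first two points of the front.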
Note
It is recommended to use onnx-simplifier before applying mixed-precision.
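The candidates argument and the use_all_amp_candidates flag can be illustrated without AIMET installed. The sketch below uses a stand-in enum in place of QuantizationDataType (an assumption: only the int and float members from the candidates example are modeled), builds the candidate list from the parameter description, and filters it against a hypothetical set of supported kernels, mirroring the described effect of use_all_amp_candidates. filter_candidates is a hypothetical helper, not part of the AIMET API.

```python
from enum import Enum
from typing import Iterable, List, Set, Tuple


class QuantizationDataType(Enum):
    """Stand-in for AIMET's QuantizationDataType (only int and float are modeled)."""
    int = 1
    float = 2


# ((activation bitwidth, activation dtype), (parameter bitwidth, parameter dtype))
Candidate = Tuple[Tuple[int, QuantizationDataType], Tuple[int, QuantizationDataType]]

# The two combinations from the `candidates` parameter description.
candidates: List[Candidate] = [
    ((8, QuantizationDataType.int), (16, QuantizationDataType.int)),
    ((16, QuantizationDataType.float), (16, QuantizationDataType.float)),
]


def filter_candidates(candidates: Iterable[Candidate],
                      supported: Set[Candidate],
                      use_all_amp_candidates: bool) -> List[Candidate]:
    """Hypothetical helper: drop unsupported candidates unless use_all_amp_candidates is True."""
    if use_all_amp_candidates:
        return list(candidates)
    return [c for c in candidates if c in supported]


# Hypothetical "supported kernels": only the int8 activation / int16 parameter combination.
supported: Set[Candidate] = {((8, QuantizationDataType.int), (16, QuantizationDataType.int))}
```

The config can declare supported kernels both under defaults and per op_type, so a single global set like this is a simplification of what the AMP algorithm checks.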
Quantizer Groups definition
- class aimet_onnx.amp.quantizer_groups.QuantizerGroup(parameter_quantizers=<factory>, activation_quantizers=<factory>)
Group of modules and quantizers.
- get_activation_quantizers(name_to_quantizer_dict)
Gets the activation quantizers of this group.
- Parameters:
name_to_quantizer_dict – Dictionary that maps names to quantizers
- Returns:
List of activation quantizers
- get_active_quantizers(name_to_quantizer_dict)
Finds all active tensor quantizers associated with this quantizer group.
- Parameters:
name_to_quantizer_dict – Dictionary that maps names to quantizers
- Return type:
List[QcQuantizeOp]
- Returns:
List of active quantizers
- get_candidate(name_to_quantizer_dict)
Gets the activation and parameter bitwidths and data types.
- Parameters:
name_to_quantizer_dict (Dict) – Dictionary that maps names to quantizers
- Return type:
Tuple[Tuple[int, QuantizationDataType], Tuple[int, QuantizationDataType]]
- Returns:
Tuple of activation and parameter bitwidths and data types
- get_param_quantizers(name_to_quantizer_dict)
Gets the parameter quantizers of this group.
- Parameters:
name_to_quantizer_dict – Dictionary that maps names to quantizers
- Returns:
List of parameter quantizers
- set_quantizers_to_candidate(name_to_quantizer_dict, candidate)
Sets the quantizers of this group to the given candidate bitwidths and data types.
- Parameters:
  - name_to_quantizer_dict (Dict) – Dictionary that maps names to quantizers
  - candidate (Tuple[Tuple[int, QuantizationDataType], Tuple[int, QuantizationDataType]]) – Candidate with activation and parameter bitwidths and data types
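All QuantizerGroup methods above look up quantizers by name in name_to_quantizer_dict. The following minimal sketch models that pattern; it is not AIMET's implementation. The <factory> defaults in the signature suggest dataclass fields with a default_factory, but the field types used here (tuples of names) are an assumption, and FakeQuantizer, with its bitwidth, data_type, and enabled attributes, is a hypothetical stand-in for a real quantizer. The QuantizationDataType enum is likewise a stand-in.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Tuple


class QuantizationDataType(Enum):
    """Stand-in for AIMET's QuantizationDataType (only int and float are modeled)."""
    int = 1
    float = 2


@dataclass
class FakeQuantizer:
    """Hypothetical quantizer exposing only the attributes this sketch needs."""
    bitwidth: int
    data_type: QuantizationDataType
    enabled: bool = True


Candidate = Tuple[Tuple[int, QuantizationDataType], Tuple[int, QuantizationDataType]]
QuantizerDict = Dict[str, FakeQuantizer]


@dataclass(frozen=True)
class QuantizerGroupSketch:
    """Simplified quantizer group holding the names of its quantizers.

    frozen=True keeps instances hashable, so groups can serve as dict keys.
    """
    parameter_quantizers: Tuple[str, ...] = field(default_factory=tuple)
    activation_quantizers: Tuple[str, ...] = field(default_factory=tuple)

    def get_activation_quantizers(self, name_to_quantizer_dict: QuantizerDict) -> List[FakeQuantizer]:
        # Look up each activation quantizer by name.
        return [name_to_quantizer_dict[name] for name in self.activation_quantizers]

    def get_param_quantizers(self, name_to_quantizer_dict: QuantizerDict) -> List[FakeQuantizer]:
        # Look up each parameter quantizer by name.
        return [name_to_quantizer_dict[name] for name in self.parameter_quantizers]

    def get_active_quantizers(self, name_to_quantizer_dict: QuantizerDict) -> List[FakeQuantizer]:
        # Only enabled quantizers count as active.
        all_quantizers = (self.get_activation_quantizers(name_to_quantizer_dict)
                          + self.get_param_quantizers(name_to_quantizer_dict))
        return [q for q in all_quantizers if q.enabled]

    def get_candidate(self, name_to_quantizer_dict: QuantizerDict) -> Candidate:
        # Assumes at least one quantizer of each kind, all sharing one setting per kind,
        # so the first quantizer of each kind is representative.
        act = self.get_activation_quantizers(name_to_quantizer_dict)[0]
        param = self.get_param_quantizers(name_to_quantizer_dict)[0]
        return (act.bitwidth, act.data_type), (param.bitwidth, param.data_type)

    def set_quantizers_to_candidate(self, name_to_quantizer_dict: QuantizerDict,
                                    candidate: Candidate) -> None:
        # Activation setting goes to activation quantizers, parameter setting to parameter quantizers.
        (act_bw, act_dt), (param_bw, param_dt) = candidate
        for q in self.get_activation_quantizers(name_to_quantizer_dict):
            q.bitwidth, q.data_type = act_bw, act_dt
        for q in self.get_param_quantizers(name_to_quantizer_dict):
            q.bitwidth, q.data_type = param_bw, param_dt


quantizers: QuantizerDict = {
    "conv.weight": FakeQuantizer(8, QuantizationDataType.int),
    "conv.output": FakeQuantizer(8, QuantizationDataType.int),
}
group = QuantizerGroupSketch(parameter_quantizers=("conv.weight",),
                             activation_quantizers=("conv.output",))
```

Storing names rather than quantizer objects keeps the group independent of any one sim instance; the dictionary passed at call time supplies the live quantizers.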
CallbackFunc Definition
- class aimet_common.defs.CallbackFunc(func, func_callback_args=None)
Class encapsulating a callback function and its arguments.
- Parameters:
  - func (Callable) – Callable function
  - func_callback_args – Arguments passed to the callable function
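CallbackFunc pairs a callable with the arguments it should receive, which is how choose_mixed_precision accepts its phase 1, phase 2, and forward pass callbacks. The sketch below uses a stand-in class; the attribute names func and args and the convention that the algorithm calls func(model, args) are assumptions for illustration, so check the AIMET source for the exact invocation. parity_model and accuracy_eval are hypothetical toy functions.

```python
from typing import Any, Callable, List, Optional, Tuple


class CallbackFuncSketch:
    """Stand-in for aimet_common.defs.CallbackFunc: pairs a callable with its arguments."""

    def __init__(self, func: Callable[[Any, Any], Any], func_callback_args: Optional[Any] = None):
        self.func = func
        self.args = func_callback_args  # attribute name is an assumption


def parity_model(x: int) -> int:
    """Hypothetical toy 'model' that predicts x % 2."""
    return x % 2


def accuracy_eval(model: Callable[[int], int], samples: List[Tuple[int, int]]) -> float:
    """Hypothetical eval function: fraction of (input, label) samples the model gets right."""
    correct = sum(1 for x, label in samples if model(x) == label)
    return correct / len(samples)


# A small labeled subset for the sensitivity phase, a larger one for phase 2.
# The last phase 2 label deliberately disagrees with the model.
phase1_callback = CallbackFuncSketch(accuracy_eval, [(0, 0), (1, 1)])
phase2_callback = CallbackFuncSketch(accuracy_eval, [(0, 0), (1, 1), (2, 0), (3, 0)])

# Assumed invocation convention: func(model, args).
phase1_score = phase1_callback.func(parity_model, phase1_callback.args)
phase2_score = phase2_callback.func(parity_model, phase2_callback.args)
```

Wrapping the same eval function with different arguments is one way to give phase 1 a smaller dataset than phase 2, as the eval_callback_for_phase1 description suggests.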
- class aimet_onnx.amp.mixed_precision_algo.EvalCallbackFactory(data_loader, forward_fn=None)
Factory class for various built-in eval callbacks.
- Parameters:
  - data_loader (DataLoader) – Data loader to be used for evaluation
  - forward_fn (Optional[Callable]) – Function that runs a forward pass and returns the output tensor. This function is expected to take 1) a model, 2) a list of starting op names, 3) a list of output op names, and 4) a batch yielded from the dataset, and to return a single tf.Tensor (or np.ndarray) object that represents the output of the model.
- sqnr(sim, num_samples=128)
Returns an SQNR eval callback. NOTE: The sim object is required to enable/disable the quantizer_info objects associated with quant ops.
- Parameters:
  - sim (QuantizationSimModel) – Quantized sim model
  - num_samples (int) – Number of samples used for evaluation
- Returns:
A callback function that evaluates the model SQNR between fp32_outputs and the quantized outputs.
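For reference, SQNR is conventionally the ratio of signal power to quantization-noise power in decibels, SQNR = 10·log10(Σ x² / Σ (x − x̂)²), where x is the FP32 output and x̂ the quantized output. The pure-Python sketch below computes that standard definition over flattened outputs. How the built-in callback aggregates across num_samples or batches is not documented here, so treat this as an illustration of the metric rather than the exact value the callback returns; sqnr_db and its eps parameter are hypothetical.

```python
import math
from typing import Sequence


def sqnr_db(fp32_outputs: Sequence[float], quantized_outputs: Sequence[float],
            eps: float = 1e-10) -> float:
    """Signal-to-quantization-noise ratio, in dB, between flattened FP32 and quantized outputs."""
    if len(fp32_outputs) != len(quantized_outputs):
        raise ValueError("fp32_outputs and quantized_outputs must have the same length")
    signal_power = sum(x * x for x in fp32_outputs)
    noise_power = sum((x - q) ** 2 for x, q in zip(fp32_outputs, quantized_outputs))
    # eps keeps the ratio finite when the quantized output matches FP32 exactly.
    return 10.0 * math.log10(signal_power / (noise_power + eps))


fp32 = [1.0, 2.0, 3.0, 4.0]
coarse = [1.0, 2.0, 3.0, 3.0]  # noise power 1.0
fine = [1.0, 2.0, 3.0, 3.5]    # noise power 0.25
```

Higher SQNR means the quantized outputs track the FP32 outputs more closely, which is why it can serve as a label-free proxy for accuracy when ranking quantizer groups.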