aimet_tensorflow.mixed_precision
Top-level API for Regular AMP
- aimet_tensorflow.keras.mixed_precision.choose_mixed_precision(sim, candidates, eval_callback_for_phase1, eval_callback_for_phase2, allowed_accuracy_drop, results_dir, clean_start, forward_pass_callback, amp_search_algo=AMPSearchAlgo.Binary, phase1_optimize=True)
High-level API to perform mixed precision evaluation in place on the given sim model. A pareto list is created, and a curve of accuracy vs. bit ops is saved under the results directory.
- Parameters:
  - sim (QuantizationSimModel) – Quantized sim model
  - candidates (List[Tuple[Tuple[int, QuantizationDataType], Tuple[int, QuantizationDataType]]]) – List of tuples of all possible bitwidth and data type candidates for activations and parameters. For example, if the possible combinations are ((activation bitwidth 8, activation data type int), (parameter bitwidth 16, parameter data type int)) and ((activation bitwidth 16, activation data type float), (parameter bitwidth 16, parameter data type float)), then candidates will be [((8, QuantizationDataType.int), (16, QuantizationDataType.int)), ((16, QuantizationDataType.float), (16, QuantizationDataType.float))]
  - eval_callback_for_phase1 (CallbackFunc) – An object of the CallbackFunc class that takes an eval function (callable) and the eval function's parameters. This evaluation callback is used to measure the sensitivity of each quantizer group during phase 1, which finds the accuracy list/sensitivity of each module. A user may therefore want to run phase 1 with a smaller dataset.
  - eval_callback_for_phase2 (CallbackFunc) – An object of the CallbackFunc class that takes an eval function (callable) and the eval function's parameters. This evaluation callback is used to get the accuracy of the quantized model for the phase 2 calculations, which find the pareto front curve.
  - allowed_accuracy_drop (Optional[float]) – Maximum allowed drop in accuracy from the FP32 baseline. The pareto front curve is plotted only up to the point where the allowed accuracy drop is met. To get a complete plot for picking points on the curve, set the allowed accuracy drop to None.
  - results_dir (str) – Path for saving results and caching intermediate results
  - clean_start (bool) – If True, any information cached from previous runs is deleted before starting the mixed-precision analysis. If False, prior cached information is used where applicable. Note that it is the user's responsibility to set this flag to True if anything in the model or quantization parameters has changed since the previous run.
  - forward_pass_callback (CallbackFunc) – An object of the CallbackFunc class that takes a forward-pass function (callable) and its function parameters. This forward-pass callback is used to compute quantization encodings.
  - amp_search_algo (AMPSearchAlgo) – A value from the AMPSearchAlgo enum that defines the search algorithm to be used for phase 2 of AMP. Defaults to AMPSearchAlgo.Binary.
  - phase1_optimize (bool) – If True (the default), the optimized phase 1 logic is executed; if False, the default phase 1 logic is executed.
- Return type:
  Optional[List[Tuple[int, float, QuantizerGroup, int]]]
- Returns:
  Pareto front list, where each entry contains (relative bit ops w.r.t. the baseline candidate, eval score, quantizer group, and the candidate used in that step). The pareto front list can be used to plot a pareto front curve, which shows how bit ops vary with accuracy. If the allowed accuracy drop is set to 100%, a user can use the pareto front curve to pick points and re-run. Returns None if the mixed-precision algorithm exits early.
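Below is a minimal usage sketch, not a definitive recipe. It assumes sim is an existing QuantizationSimModel for your Keras model; evaluate(model, args) is a hypothetical user-defined eval function, and small_eval_args, full_eval_args, and calibration_args are placeholders for its arguments. The import path of CallbackFunc and QuantizationDataType (aimet_common.defs) may vary across AIMET versions.

from aimet_common.defs import CallbackFunc, QuantizationDataType
from aimet_tensorflow.keras.mixed_precision import choose_mixed_precision

# Candidate (activation, parameter) bitwidth/data-type pairs, as in the example above
candidates = [((8, QuantizationDataType.int), (16, QuantizationDataType.int)),
              ((16, QuantizationDataType.float), (16, QuantizationDataType.float))]

# Phase 1 only measures per-quantizer-group sensitivity, so a smaller dataset suffices;
# phase 2 builds the pareto front and should use the full evaluation set
eval_cb_phase1 = CallbackFunc(evaluate, func_callback_args=small_eval_args)    # hypothetical eval fn/args
eval_cb_phase2 = CallbackFunc(evaluate, func_callback_args=full_eval_args)
forward_pass_cb = CallbackFunc(evaluate, func_callback_args=calibration_args)  # used to compute encodings

pareto_front = choose_mixed_precision(
    sim, candidates,
    eval_callback_for_phase1=eval_cb_phase1,
    eval_callback_for_phase2=eval_cb_phase2,
    allowed_accuracy_drop=0.01,          # tolerate at most a 0.01 drop from the FP32 baseline
    results_dir='./amp_results',
    clean_start=True,                    # ignore any cache from previous runs
    forward_pass_callback=forward_pass_cb)

if pareto_front is not None:
    # Each entry: (relative bit ops, eval score, quantizer group, candidate)
    for rel_bit_ops, eval_score, quantizer_group, candidate in pareto_front:
        print(rel_bit_ops, eval_score, quantizer_group, candidate)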
Top-level API for Fast AMP (AMP 2.0)
- aimet_tensorflow.keras.mixed_precision.choose_fast_mixed_precision(sim, candidates, data_loader_wrapper, eval_callback_for_phase2, allowed_accuracy_drop, results_dir, clean_start, forward_pass_callback, forward_pass_callback_2=None, amp_search_algo=AMPSearchAlgo.Binary, phase1_optimize=True)
High-level API to perform mixed precision evaluation in place on the given sim model. A pareto list is created, and a curve of accuracy vs. bit ops is saved under the results directory.
- Parameters:
  - sim (QuantizationSimModel) – Quantized sim model
  - candidates (List[Tuple[Tuple[int, QuantizationDataType], Tuple[int, QuantizationDataType]]]) – List of tuples of all possible bitwidth and data type candidates for activations and parameters. For example, if the possible combinations are ((activation bitwidth 8, activation data type int), (parameter bitwidth 16, parameter data type int)) and ((activation bitwidth 16, activation data type float), (parameter bitwidth 16, parameter data type float)), then candidates will be [((8, QuantizationDataType.int), (16, QuantizationDataType.int)), ((16, QuantizationDataType.float), (16, QuantizationDataType.float))]
  - data_loader_wrapper (Callable) – A callable that, when called, returns a data loader to be used for the phase 1 forward pass.
  - eval_callback_for_phase2 (CallbackFunc) – An object of the CallbackFunc class that takes an eval function (callable) and the eval function's parameters. This evaluation callback is used to get the accuracy of the quantized model for the phase 2 calculations, which find the pareto front curve.
  - allowed_accuracy_drop (Optional[float]) – Maximum allowed drop in accuracy from the FP32 baseline. The pareto front curve is plotted only up to the point where the allowed accuracy drop is met. To get a complete plot for picking points on the curve, set the allowed accuracy drop to None.
  - results_dir (str) – Path for saving results and caching intermediate results
  - clean_start (bool) – If True, any information cached from previous runs is deleted before starting the mixed-precision analysis. If False, prior cached information is used where applicable. Note that it is the user's responsibility to set this flag to True if anything in the model or quantization parameters has changed since the previous run.
  - forward_pass_callback (CallbackFunc) – An object of the CallbackFunc class that takes a forward-pass function (callable) and its function parameters. This forward-pass callback is used to compute quantization encodings.
  - forward_pass_callback_2 (Optional[Callable]) – Forward-pass callback function that takes a model and inputs, performs a forward pass, and returns the output numpy ndarray of the last layer. Can be left as None if the model works with the standard model.predict() forward pass.
  - amp_search_algo (AMPSearchAlgo) – A value from the AMPSearchAlgo enum that defines the search algorithm to be used for phase 2 of AMP. Defaults to AMPSearchAlgo.Binary.
  - phase1_optimize (bool) – If True (the default), the optimized phase 1 logic is executed; if False, the default phase 1 logic is executed.
- Return type:
  Optional[List[Tuple[int, float, QuantizerGroup, int]]]
- Returns:
  Pareto front list, where each entry contains (relative bit ops w.r.t. the baseline candidate, eval score, quantizer group, and the candidate used in that step). The pareto front list can be used to plot a pareto front curve, which shows how bit ops vary with accuracy. If the allowed accuracy drop is set to 100%, a user can use the pareto front curve to pick points and re-run. Returns None if the mixed-precision algorithm exits early.
Note
To enable phase 3, set the attribute GreedyMixedPrecisionAlgo.ENABLE_CONVERT_OP_REDUCTION = True.
Currently only two candidates are supported: ((8, int), (8, int)) and ((16, int), (8, int)).
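A minimal sketch for choose_fast_mixed_precision follows, under the same assumptions as the sketch above; additionally, get_data_loader is a hypothetical zero-argument callable that returns the data loader for the phase 1 forward pass.

from aimet_common.defs import CallbackFunc, QuantizationDataType
from aimet_tensorflow.keras.mixed_precision import choose_fast_mixed_precision

# Per the note above, fast AMP currently supports only these two candidates
candidates = [((8, QuantizationDataType.int), (8, QuantizationDataType.int)),
              ((16, QuantizationDataType.int), (8, QuantizationDataType.int))]

eval_cb_phase2 = CallbackFunc(evaluate, func_callback_args=full_eval_args)     # hypothetical eval fn/args
forward_pass_cb = CallbackFunc(evaluate, func_callback_args=calibration_args)

# To enable phase 3 (see the note above), set
# GreedyMixedPrecisionAlgo.ENABLE_CONVERT_OP_REDUCTION = True before this call;
# the import path of GreedyMixedPrecisionAlgo depends on the AIMET version.

pareto_front = choose_fast_mixed_precision(
    sim, candidates,
    data_loader_wrapper=get_data_loader,   # called to obtain the phase 1 data loader
    eval_callback_for_phase2=eval_cb_phase2,
    allowed_accuracy_drop=None,            # None: trace the complete pareto curve
    results_dir='./amp_results',
    clean_start=True,
    forward_pass_callback=forward_pass_cb)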
Quantizer Groups definition
- class aimet_tensorflow.keras.amp.quantizer_groups.QuantizerGroup(input_quantizers=<factory>, output_quantizers=<factory>, parameter_quantizers=<factory>)
Group of modules and quantizers
- get_active_param_quantizers(name_to_quantizer_dict)
  Finds all active parameter tensor quantizers associated with this quantizer group.
  - Parameters:
    name_to_quantizer_dict (Dict) – Contains mapping of module name to sim.quantizer_config object
  - Return type:
    List[TensorQuantizer]
- get_active_quantizers(name_to_quantizer_dict)
  Finds all active tensor quantizers associated with this quantizer group.
  - Parameters:
    name_to_quantizer_dict (Dict) – Contains mapping of module name to sim.quantizer_config object
  - Return type:
    List[TensorQuantizer]
- get_candidate(name_to_quantizer_dict)
  Gets the activation and parameter bitwidths and data types.
  - Parameters:
    name_to_quantizer_dict (Dict) – Gets module from module name
  - Return type:
    Tuple[Tuple[int, QuantizationDataType], Tuple[int, QuantizationDataType]]
  - Returns:
    Tuple of activation and parameter bitwidths and data types
- static lookup_quantizer(quantizer_name, name_to_quantizer_dict)
  Returns the quantizer layer corresponding to the given name.
  - Parameters:
    - quantizer_name – Name of the quantizer
    - name_to_quantizer_dict – Dictionary mapping quantizer names to quantizer layers
  - Return type:
    Layer
- set_quantizers_to_candidate(name_to_quantizer_dict, candidate)
  Sets the quantizer group to a given candidate bitwidth.
  - Parameters:
    - name_to_quantizer_dict (Dict) – Gets module from module name
    - candidate (Tuple[Tuple[int, QuantizationDataType], Tuple[int, QuantizationDataType]]) – Candidate with activation and parameter bitwidths and data types
  - Return type:
    None
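For illustration, here is a short sketch of working with a QuantizerGroup directly. It assumes quantizer_group is a QuantizerGroup obtained from an AMP run and name_to_quantizer_dict maps module names to sim.quantizer_config objects, as described above.

from aimet_common.defs import QuantizationDataType

# Inspect the group's current (activation, parameter) bitwidths and data types
current = quantizer_group.get_candidate(name_to_quantizer_dict)
# e.g. ((8, QuantizationDataType.int), (8, QuantizationDataType.int))

# List the active tensor quantizers this group controls
active = quantizer_group.get_active_quantizers(name_to_quantizer_dict)

# Move the whole group to a 16-bit-activation / 8-bit-parameter candidate
new_candidate = ((16, QuantizationDataType.int), (8, QuantizationDataType.int))
quantizer_group.set_quantizers_to_candidate(name_to_quantizer_dict, new_candidate)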
CallbackFunc Definition