aimet_torch.auto_quant

Top-level API

class aimet_torch.auto_quant.AutoQuantWithAutoMixedPrecision(model, dummy_input, data_loader, eval_callback, param_bw=8, output_bw=8, quant_scheme=QuantScheme.post_training_tf_enhanced, rounding_mode='nearest', config_file=None, results_dir='/tmp', cache_id=None, strict_validation=True, model_prepare_required=True)[source]

Integrate and apply post-training quantization techniques.

AutoQuant includes 1) batchnorm folding, 2) cross-layer equalization, 3) Adaround, and 4) Automatic Mixed Precision (if enabled). These techniques will be applied in a best-effort manner until the model meets the evaluation goal given as allowed_accuracy_drop.

Parameters:
  • model (Module) – Model to be quantized. Assumes model is on the correct device

  • dummy_input (Union[Tensor, Tuple]) – Dummy input for the model. Assumes that dummy_input is on the correct device

  • data_loader (DataLoader) – A collection that iterates over an unlabeled dataset, used for computing encodings

  • eval_callback (Callable[[Module], float]) – Function that calculates the evaluation score

  • param_bw (int) – Parameter bitwidth

  • output_bw (int) – Output bitwidth

  • quant_scheme (QuantScheme) – Quantization scheme

  • rounding_mode (str) – Rounding mode

  • config_file (Optional[str]) – Path to configuration file for model quantizers

  • results_dir (str) – Directory to save the results of PTQ techniques

  • cache_id (Optional[str]) – ID associated with cache results

  • strict_validation (bool) – Flag set to True by default. When False, AutoQuant will proceed with execution and handle errors internally if possible. This may produce suboptimal or unintuitive results.

  • model_prepare_required (bool) – Flag set to True by default. If False, AutoQuant will skip the model preparer step in the pipeline.
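
For example, a minimal construction sketch; model, dummy_input, data_loader, and eval_fn are user-supplied placeholders, not part of this API:

    from aimet_common.defs import QuantScheme
    from aimet_torch.auto_quant import AutoQuantWithAutoMixedPrecision

    # `model`, `dummy_input`, `data_loader`, and `eval_fn` are assumed to be
    # defined by the user. `eval_fn(model)` must return a float score.
    auto_quant = AutoQuantWithAutoMixedPrecision(
        model,                      # torch.nn.Module on the target device
        dummy_input=dummy_input,    # tensor(s) on the same device
        data_loader=data_loader,    # unlabeled DataLoader used for encodings
        eval_callback=eval_fn,
        param_bw=8,
        output_bw=8,
        quant_scheme=QuantScheme.post_training_tf_enhanced,
        results_dir="/tmp/auto_quant",
    )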

run_inference()[source]

Creates a quantization simulation model and performs inference.

Return type:

Tuple[QuantizationSimModel, float]

Returns:

QuantizationSimModel, model accuracy as float
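
For example, continuing the construction sketch above:

    # Returns the quantization simulation model and its evaluation score.
    sim, accuracy = auto_quant.run_inference()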

optimize(allowed_accuracy_drop=0.0)[source]

Integrate and apply post-training quantization techniques.

Parameters:

allowed_accuracy_drop (float) – Maximum allowed accuracy drop

Return type:

Tuple[Module, float, str, List[Tuple[int, float, QuantizerGroup, Tuple]]]

Returns:

Tuple of (best model, eval score, encoding path, pareto front). Pareto front is None if AMP is not enabled or AutoQuant exits without performing AMP.
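
For example, continuing the sketch above and allowing at most a 1% accuracy drop:

    model, accuracy, encoding_path, pareto_front = auto_quant.optimize(
        allowed_accuracy_drop=0.01)
    # pareto_front is None unless AMP was enabled via
    # set_mixed_precision_params() and actually executed.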

set_adaround_params(adaround_params)[source]

Set Adaround parameters. If this method is not called explicitly by the user, AutoQuant will use data_loader (passed to __init__) for Adaround.

Parameters:

adaround_params (AdaroundParameters) – Adaround parameters.

Return type:

None
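
For example, a sketch in which the data loader and batch count are illustrative:

    from aimet_torch.adaround.adaround_weight import AdaroundParameters

    # Use an unlabeled data loader for Adaround; num_batches bounds how
    # much data Adaround consumes.
    adaround_params = AdaroundParameters(data_loader, num_batches=4)
    auto_quant.set_adaround_params(adaround_params)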

set_export_params(onnx_export_args=-1, propagate_encodings=None)[source]

Set parameters for QuantizationSimModel.export.

Parameters:
  • onnx_export_args (OnnxExportApiArgs) – Optional export argument with ONNX-specific overrides. If not provided, the model is exported via the TorchScript graph.

  • propagate_encodings (Optional[bool]) – If True, encoding entries for intermediate ops (when one PyTorch op results in multiple ONNX nodes) are filled with the same bitwidth and data type as the output tensor for that series of ops.

Return type:

None
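
For example, a sketch in which the opset version and tensor names are illustrative:

    from aimet_torch.onnx_utils import OnnxExportApiArgs

    export_args = OnnxExportApiArgs(opset_version=11,
                                    input_names=["input"],
                                    output_names=["output"])
    auto_quant.set_export_params(onnx_export_args=export_args,
                                 propagate_encodings=False)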

set_mixed_precision_params(candidates, num_samples_for_phase_1=128, forward_fn=<function _default_forward_fn>, num_samples_for_phase_2=None)[source]

Set mixed precision parameters. NOTE: Automatic mixed precision will NOT be enabled unless this method is explicitly called by the user.

Parameters:
  • candidates (List[Tuple[Tuple[int, QuantizationDataType], Tuple[int, QuantizationDataType]]]) – List of tuples of candidate bitwidths and datatypes.

  • num_samples_for_phase_1 (Optional[int]) – Number of samples to be used for performance evaluation in AMP phase 1.

  • forward_fn (Callable) – Function that runs a forward pass and returns the output tensor, which will be used for SQNR computation in phase 1. This function is expected to take 1) a model and 2) a single batch yielded from the data loader, and to return a single torch.Tensor object representing the output of the model. The default forward function is roughly equivalent to lambda model, batch: model(batch)

  • num_samples_for_phase_2 (Optional[int]) – Number of samples to be used for performance evaluation in AMP phase 2.

Return type:

None
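
For example, a sketch searching over three integer candidates, where each candidate is a tuple of ((output_bw, output_dtype), (param_bw, param_dtype)):

    from aimet_common.defs import QuantizationDataType

    W16A16 = ((16, QuantizationDataType.int), (16, QuantizationDataType.int))
    W8A16  = ((16, QuantizationDataType.int), (8, QuantizationDataType.int))
    W8A8   = ((8, QuantizationDataType.int),  (8, QuantizationDataType.int))

    # AMP is enabled only once this call is made.
    auto_quant.set_mixed_precision_params(candidates=[W16A16, W8A16, W8A8])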

set_model_preparer_params(modules_to_exclude=None, concrete_args=None)[source]

Set parameters for model preparer.

Parameters:
  • modules_to_exclude (Optional[List[Module]]) – List of modules to exclude when tracing.

  • concrete_args (Optional[Dict[str, Any]]) – Parameter for the model preparer. Allows you to partially specialize your function, for example to remove control flow or data structures. If the model has control flow, torch.fx won't be able to trace it. See the torch.fx.symbolic_trace API for details.
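
For example, a sketch in which custom_block and use_cache are hypothetical names standing in for a submodule and a control-flow keyword argument in your model:

    auto_quant.set_model_preparer_params(
        modules_to_exclude=[model.custom_block],  # hypothetical submodule
        concrete_args={"use_cache": False},       # hypothetical forward() kwarg
    )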

get_quant_scheme_candidates()[source]

Return the candidates for quant scheme search. During optimize(), the candidate with the highest accuracy will be selected among them.

Return type:

Tuple[_QuantSchemePair, ...]

Returns:

Candidates for quant scheme search

set_quant_scheme_candidates(candidates)[source]

Set candidates for quant scheme search. During optimize(), the candidate with the highest accuracy will be selected among them.

Parameters:

candidates (Tuple[_QuantSchemePair, ...]) – Candidates for quant scheme search
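
For example, a sketch that narrows the search to a subset of the default candidates:

    candidates = auto_quant.get_quant_scheme_candidates()
    # Keep only the first two default candidates (an illustrative subset);
    # optimize() will then pick the best-performing one among these.
    auto_quant.set_quant_scheme_candidates(candidates[:2])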