aimet_torch.auto_quant
Top-level API
- class aimet_torch.auto_quant.AutoQuantWithAutoMixedPrecision(model, dummy_input, data_loader, eval_callback, param_bw=8, output_bw=8, quant_scheme=QuantScheme.post_training_tf_enhanced, rounding_mode='nearest', config_file=None, results_dir='/tmp', cache_id=None, strict_validation=True, model_prepare_required=True)[source]
Integrate and apply post-training quantization techniques.
AutoQuant includes 1) batchnorm folding, 2) cross-layer equalization, 3) Adaround, and 4) Automatic Mixed Precision (if enabled). These techniques will be applied in a best-effort manner until the model meets the evaluation goal given as allowed_accuracy_drop.
- Parameters:
  - model (Module) – Model to be quantized. Assumes the model is on the correct device.
  - dummy_input (Union[Tensor, Tuple]) – Dummy input for the model. Assumes that dummy_input is on the correct device.
  - data_loader (DataLoader) – A collection that iterates over an unlabeled dataset, used for computing encodings.
  - eval_callback (Callable[[Module], float]) – Function that calculates the evaluation score.
  - param_bw (int) – Parameter bitwidth.
  - output_bw (int) – Output bitwidth.
  - quant_scheme (QuantScheme) – Quantization scheme.
  - rounding_mode (str) – Rounding mode.
  - config_file (Optional[str]) – Path to configuration file for model quantizers.
  - results_dir (str) – Directory to save the results of PTQ techniques.
  - cache_id (Optional[str]) – ID associated with cache results.
  - strict_validation (bool) – Flag set to True by default. When False, AutoQuant will proceed with execution and handle errors internally if possible. This may produce unideal or unintuitive results.
  - model_prepare_required (bool) – Flag set to True by default. If False, AutoQuant will skip the model preparer block in the pipeline.
- run_inference()[source]
Creates a quantized model and performs inference.
- Return type: Tuple[QuantizationSimModel, float]
- Returns: QuantizationSimModel, model accuracy as float
- optimize(allowed_accuracy_drop=0.0)[source]
Integrate and apply post-training quantization techniques.
- Parameters:
  - allowed_accuracy_drop (float) – Maximum allowed accuracy drop.
- Return type: Tuple[Module, float, str, List[Tuple[int, float, QuantizerGroup, Tuple]]]
- Returns:
Tuple of (best model, eval score, encoding path, pareto front). Pareto front is None if AMP is not enabled or AutoQuant exits without performing AMP.
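Each pareto-front entry follows the documented element type Tuple[int, float, QuantizerGroup, Tuple], with the eval score in the second position. A small sketch of inspecting such a list (the entries below are hypothetical, and plain strings stand in for QuantizerGroup objects):

```python
# Hypothetical pareto-front entries shaped like the documented return type
# Tuple[int, float, QuantizerGroup, Tuple]; strings stand in for
# QuantizerGroup, and the trailing tuple is a (bitwidth, dtype) candidate pair.
pareto_front = [
    (100, 0.71, "group_a", ((8, "int"), (8, "int"))),
    (80, 0.74, "group_b", ((16, "int"), (8, "int"))),
    (60, 0.69, "group_c", ((16, "int"), (16, "int"))),
]

# Pick the entry with the best eval score (second element of each tuple).
best = max(pareto_front, key=lambda entry: entry[1])
```

Remember that the pareto front is None when AMP is disabled or AutoQuant exits before the AMP stage, so guard for that before iterating.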
- set_adaround_params(adaround_params)[source]
Set Adaround parameters. If this method is not called explicitly by the user, AutoQuant will use data_loader (passed to __init__) for Adaround.
- Parameters:
  - adaround_params (AdaroundParameters) – Adaround parameters.
- Return type: None
- set_export_params(onnx_export_args=-1, propagate_encodings=None)[source]
Set parameters for QuantizationSimModel.export.
- Parameters:
  - onnx_export_args (OnnxExportApiArgs) – Optional export arguments with ONNX-specific overrides. If not provided, the model is exported via the TorchScript graph.
  - propagate_encodings (Optional[bool]) – If True, encoding entries for intermediate ops (when one PyTorch op results in multiple ONNX nodes) are filled with the same bitwidth and data_type as the output tensor for that series of ops.
- Return type: None
- set_mixed_precision_params(candidates, num_samples_for_phase_1=128, forward_fn=<function _default_forward_fn>, num_samples_for_phase_2=None)[source]
Set mixed precision parameters. NOTE: Automatic mixed precision will NOT be enabled unless this method is explicitly called by the user.
- Parameters:
  - candidates (List[Tuple[Tuple[int, QuantizationDataType], Tuple[int, QuantizationDataType]]]) – List of tuples of candidate bitwidths and datatypes.
  - num_samples_for_phase_1 (Optional[int]) – Number of samples to be used for performance evaluation in AMP phase 1.
  - forward_fn (Callable) – Function that runs a forward pass and returns the output tensor, which will be used for SQNR computation in phase 1. This function is expected to take 1) a model and 2) a single batch yielded from the data loader, and return a single torch.Tensor object representing the output of the model. The default forward function is roughly equivalent to lambda model, batch: model(batch).
  - num_samples_for_phase_2 (Optional[int]) – Number of samples to be used for performance evaluation in AMP phase 2.
- Return type: None
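The candidates argument is a list of nested bitwidth/datatype pairs. The sketch below shows that shape and the documented default forward_fn behavior; strings stand in for QuantizationDataType members, the pair ordering is an assumption for illustration, and a plain callable stands in for a torch model:

```python
# Shape of the `candidates` argument: a list of nested
# ((bitwidth, dtype), (bitwidth, dtype)) pairs. Strings stand in for
# QuantizationDataType members, and the output-first/param-second ordering
# shown here is an illustrative assumption.
candidates = [
    ((16, "int"), (16, "int")),
    ((16, "int"), (8, "int")),
    ((8, "int"), (8, "int")),
]

# Per the docs, the default forward_fn is roughly equivalent to:
default_forward_fn = lambda model, batch: model(batch)

# With a trivial callable standing in for a torch model and an int for a batch:
out = default_forward_fn(lambda batch: batch * 2, 3)
```

Supply a custom forward_fn when a batch is not directly consumable by the model, e.g. when the data loader yields (inputs, labels) tuples and only the inputs should be forwarded.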
- set_model_preparer_params(modules_to_exclude=None, concrete_args=None)[source]
Set parameters for model preparer.
- Parameters:
  - modules_to_exclude (Optional[List[Module]]) – List of modules to exclude when tracing.
  - concrete_args (Optional[Dict[str, Any]]) – Parameter for the model preparer. Allows you to partially specialize your function, whether to remove control flow or data structures. If the model has control flow, torch.fx won't be able to trace the model. See the torch.fx.symbolic_trace API for details.
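To build intuition for why fixing an argument helps tracing, here is a stdlib-only analogy (not torch.fx itself): functools.partial plays the role of concrete_args by pinning a flag so the branch is no longer data-dependent. The function and flag names are hypothetical:

```python
from functools import partial

# Analogy for concrete_args: a tracer cannot follow data-dependent control
# flow, but once the flag is fixed at trace time the branch disappears.
# Here partial(...) plays the role of concrete_args={"use_residual": True}.
def forward(x, use_residual):
    if use_residual:  # branch becomes constant once the flag is pinned
        return x + x
    return x

specialized = partial(forward, use_residual=True)
result = specialized(5)
```

With torch.fx, passing the same mapping as concrete_args to symbolic_trace has the analogous effect of specializing the traced graph on those values.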
- get_quant_scheme_candidates()[source]
Return the candidates for quant scheme search. During optimize(), the candidate with the highest accuracy will be selected among them.
- Return type: Tuple[_QuantSchemePair, ...]
- Returns: Candidates for quant scheme search
- set_quant_scheme_candidates(candidates)[source]
Set candidates for quant scheme search. During optimize(), the candidate with the highest accuracy will be selected among them.
- Parameters:
  - candidates (Tuple[_QuantSchemePair, ...]) – Candidates for quant scheme search.