aimet_onnx.auto_quant_v2

Top-level API

class aimet_onnx.auto_quant_v2.AutoQuantWithAutoMixedPrecision(model, dummy_input, data_loader, eval_callback, param_bw=8, output_bw=8, quant_scheme=QuantScheme.post_training_tf_enhanced, rounding_mode='nearest', use_cuda=True, device=0, config_file=None, results_dir='/tmp', cache_id=None, strict_validation=True)[source]

Integrate and apply post-training quantization techniques.

AutoQuant includes 1) batchnorm folding, 2) cross-layer equalization, 3) Adaround, and 4) Automatic Mixed Precision (if enabled). These techniques are applied in a best-effort manner until the model meets the evaluation goal given by the allowed_accuracy_drop argument of optimize().
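The "best-effort until the goal is met" behavior described above can be pictured with a small control-flow sketch. This is not AutoQuant's implementation: the function name apply_best_effort, the technique list, and the use of a separate baseline_score are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

# Each technique maps a model to a new model; the names are placeholders.
Technique = Tuple[str, Callable[[object], object]]


def apply_best_effort(
    model: object,
    techniques: List[Technique],
    evaluate: Callable[[object], float],
    baseline_score: float,
    allowed_accuracy_drop: float,
) -> Tuple[object, float, List[str]]:
    """Apply techniques in order, stopping once the accuracy goal is met."""
    best_model, best_score = model, evaluate(model)
    kept: List[str] = []
    for name, technique in techniques:
        if baseline_score - best_score <= allowed_accuracy_drop:
            break  # goal met; the remaining techniques are skipped
        try:
            candidate = technique(best_model)
        except Exception:
            continue  # best effort: a failing technique is skipped
        score = evaluate(candidate)
        if score > best_score:
            # Keep a technique only if it improved the evaluation score.
            best_model, best_score = candidate, score
            kept.append(name)
    return best_model, best_score, kept
```

In this sketch a technique that raises is simply skipped, and the loop exits early as soon as the drop from the baseline is within the allowed limit.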

Parameters:
  • model (ONNXModel) – Model to be quantized.

  • dummy_input (Dict[str, ndarray]) – Dummy input dict for the model.

  • data_loader (DataLoader) – A collection that iterates over an unlabeled dataset, used for computing encodings.

  • eval_callback (Callable[[InferenceSession, int], float]) – Function that calculates the evaluation score given the model session.

  • param_bw (int) – Parameter bitwidth

  • output_bw (int) – Output bitwidth

  • quant_scheme (QuantScheme) – Quantization scheme

  • rounding_mode (str) – Rounding mode

  • use_cuda (bool) – True if using CUDA to run quantization op. False otherwise.

  • config_file (Optional[str]) – Path to configuration file for model quantizers

  • results_dir (str) – Directory to save the results of PTQ techniques

  • cache_id (Optional[str]) – ID associated with cache results

  • strict_validation (bool) – Flag set to True by default. When False, AutoQuant will proceed with execution and handle errors internally if possible. This may produce suboptimal or unintuitive results.
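As an illustration of the eval_callback parameter, the sketch below builds a top-1 accuracy callback with the Callable[[InferenceSession, int], float] shape. It assumes the int argument is a sample count (the description above does not say what it means), that the session exposes the onnxruntime-style run(output_names, input_feed) method, and that each sample holds a single input whose first output is a row of class scores. The helper name make_eval_callback and the sample layout are hypothetical.

```python
from typing import Any, Callable, Dict, Sequence, Tuple

# Hypothetical labeled evaluation set: (input feed dict, integer label) pairs.
Sample = Tuple[Dict[str, Any], int]


def make_eval_callback(samples: Sequence[Sample]) -> Callable[[Any, int], float]:
    """Build a callback with the (session, num_samples) -> float shape."""

    def eval_callback(session: Any, num_samples: int) -> float:
        # Assumption: the int argument limits how many samples are evaluated.
        subset = samples[:num_samples] if num_samples else samples
        correct = 0
        for feed, label in subset:
            # run(None, feed) returns a list of outputs; take the first
            # output and the first (only) row of the batch.
            scores = session.run(None, feed)[0][0]
            predicted = max(range(len(scores)), key=lambda i: scores[i])
            correct += int(predicted == label)
        return correct / max(len(subset), 1)

    return eval_callback
```

The returned function can be passed as eval_callback; any object with a compatible run method works as the session, which makes the callback easy to check in isolation.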

run_inference()[source]

Creates a quantization simulation model and performs inference.

Return type:

Tuple[QuantizationSimModel, float]

Returns:

QuantizationSimModel, model accuracy as float

optimize(allowed_accuracy_drop=0.0)[source]

Integrate and apply post-training quantization techniques.

Parameters:

allowed_accuracy_drop (float) – Maximum allowed accuracy drop

Return type:

Tuple[ONNXModel, float, str, List[Tuple[int, float, QuantizerGroup, Tuple]]]

Returns:

Tuple of (best model, eval score, encoding path, pareto front). Pareto front is None if AMP is not enabled or AutoQuant exits without performing AMP.
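A minimal sketch of how run_inference() and optimize() might be driven together, using only the signatures documented on this page. The class is passed in as auto_quant_cls so the sketch does not import the library; in practice that argument would be AutoQuantWithAutoMixedPrecision. The helper name run_auto_quant and the keys of the returned dictionary are assumptions for illustration.

```python
from typing import Any, Callable, Dict


def run_auto_quant(
    auto_quant_cls: Callable[..., Any],
    model: Any,
    dummy_input: Dict[str, Any],
    data_loader: Any,
    eval_callback: Callable[[Any, int], float],
    allowed_accuracy_drop: float = 0.01,
) -> Dict[str, Any]:
    """Call run_inference() and optimize() and collect their results."""
    auto_quant = auto_quant_cls(model, dummy_input, data_loader, eval_callback)

    # run_inference() returns (QuantizationSimModel, accuracy).
    _sim, initial_score = auto_quant.run_inference()

    # optimize() returns (best model, eval score, encoding path, pareto front).
    best_model, score, encoding_path, pareto_front = auto_quant.optimize(
        allowed_accuracy_drop=allowed_accuracy_drop
    )
    return {
        "initial_score": initial_score,
        "best_model": best_model,
        "score": score,
        "encoding_path": encoding_path,
        # The Pareto front is None when AMP was not enabled or did not run.
        "amp_ran": pareto_front is not None,
    }
```

Checking the Pareto front for None before using it avoids errors when set_mixed_precision_params() was never called.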

set_adaround_params(adaround_params)[source]

Set Adaround parameters. If this method is not called explicitly by the user, AutoQuant will use data_loader (passed to __init__) for Adaround.

Parameters:

adaround_params (AdaroundParameters) – Adaround parameters.

Return type:

None

set_mixed_precision_params(candidates, num_samples_for_phase_1=128, forward_fn=<function _default_forward_fn>, num_samples_for_phase_2=None)[source]

Set mixed precision parameters. NOTE: Automatic mixed precision will NOT be enabled unless this method is explicitly called by the user.

Parameters:
  • candidates (List[Tuple[Tuple[int, QuantizationDataType], Tuple[int, QuantizationDataType]]]) – List of tuples of candidate bitwidths and datatypes.

  • num_samples_for_phase_1 (Optional[int]) – Number of samples to be used for performance evaluation in AMP phase 1.

  • forward_fn (Callable) – Function that runs a forward pass and returns the output tensor, which will be used for SQNR computation in phase 1. This function is expected to take 1) a model and 2) a single batch yielded from the data loader, and return a single np.ndarray object which represents the output of the model.

  • num_samples_for_phase_2 (Optional[int]) – Number of samples to be used for performance evaluation in AMP phase 2.

Return type:

None
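The shapes of candidates and forward_fn can be sketched as follows. The QuantizationDataType enum here is a local stand-in so the example runs without the library, and its member names are assumed. The ordering of each candidate as ((activation bitwidth, data type), (parameter bitwidth, data type)) is also an assumption, since the parameter description above does not state which tuple is which. The forward function assumes each batch is a feed dictionary accepted by an onnxruntime-style session.

```python
from enum import Enum
from typing import Any, Dict, List, Tuple


class QuantizationDataType(Enum):
    """Stand-in for the library enum so the sketch runs on its own."""
    int = 1
    float = 2


Candidate = Tuple[Tuple[int, QuantizationDataType], Tuple[int, QuantizationDataType]]

# Assumed ordering: ((activation bw, dtype), (parameter bw, dtype)).
candidates: List[Candidate] = [
    ((16, QuantizationDataType.int), (16, QuantizationDataType.int)),
    ((16, QuantizationDataType.int), (8, QuantizationDataType.int)),
    ((8, QuantizationDataType.int), (8, QuantizationDataType.int)),
]


def forward_fn(session: Any, batch: Dict[str, Any]) -> Any:
    """Run one batch and return the first model output."""
    # run(None, feed) returns a list of all outputs; only the first is used.
    return session.run(None, batch)[0]
```

With the real library, these two objects would be passed as set_mixed_precision_params(candidates, forward_fn=forward_fn), using the library's own enum in place of the stand-in.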

get_quant_scheme_candidates()[source]

Return the candidates for quant scheme search. During optimize(), the candidate with the highest accuracy will be selected among them.

Return type:

Tuple[_QuantSchemePair, ...]

Returns:

Candidates for quant scheme search

set_quant_scheme_candidates(candidates)[source]

Set candidates for quant scheme search. During optimize(), the candidate with the highest accuracy will be selected among them.

Parameters:

candidates (Tuple[_QuantSchemePair, ...]) – Candidates for quant scheme search
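The selection rule stated above (the candidate with the highest accuracy wins) amounts to an argmax over evaluation scores. The sketch below shows that rule in isolation; it is not the library's search code, and pick_best_candidate and the evaluate function are hypothetical names. Candidates are treated as opaque values because the fields of _QuantSchemePair are not described here.

```python
from typing import Callable, Sequence, Tuple, TypeVar

C = TypeVar("C")


def pick_best_candidate(
    candidates: Sequence[C], evaluate: Callable[[C], float]
) -> Tuple[C, float]:
    """Return the candidate with the highest evaluation score."""
    if not candidates:
        raise ValueError("at least one candidate is required")
    scored = [(candidate, evaluate(candidate)) for candidate in candidates]
    # max() keeps the earliest candidate when scores tie.
    return max(scored, key=lambda pair: pair[1])
```

Because every candidate is evaluated once, a longer candidate list makes the search proportionally more expensive.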