Encoding Analyzers

class aimet_torch.v2.quantization.encoding_analyzer.EncodingAnalyzer(observer)[source]

Base class that gathers statistics of input data and computes encodings

compute_encodings(num_steps, is_symmetric)[source]

Computes encodings based on the input data and calibration scheme, and returns the encoding minimum and maximum values

Parameters:
  • num_steps (int) – Number of steps used in quantization.

  • is_symmetric (bool) – True if encodings are symmetric

Returns:

Encoding min and max as a tuple

reset_stats()[source]

Resets the internal stats

update_stats(input_tensor)[source]

Updates the internal statistics given the input data

Parameters:

input_tensor (torch.Tensor) – Input data
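
A typical calibration flow combines the three methods above: update_stats() is called repeatedly on calibration data, compute_encodings() is called once to obtain the range, and reset_stats() discards the accumulated statistics before recalibrating. The following is a minimal sketch (not taken from the original docstrings), using the MinMaxEncodingAnalyzer variant described below and placeholder random data:

>>> import torch
>>> from aimet_torch.v2.quantization.encoding_analyzer import MinMaxEncodingAnalyzer
>>> analyzer = MinMaxEncodingAnalyzer(shape=(1,))
>>> calibration_batches = [torch.randn(100) for _ in range(4)]  # placeholder calibration data
>>> for batch in calibration_batches:
...     analyzer.update_stats(batch)  # accumulate statistics across batches
>>> encoding_min, encoding_max = analyzer.compute_encodings(num_steps=2 ** 8, is_symmetric=False)
>>> analyzer.reset_stats()  # start over before a fresh calibration pass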

Variants

class aimet_torch.v2.quantization.encoding_analyzer.MinMaxEncodingAnalyzer(shape)[source]

EncodingAnalyzer subclass which uses min-max calibration. This involves tracking the minimum and maximum observed values and computing the min-max range as \([\min(\text{input}), \max(\text{input})]\)

Parameters:

shape (tuple) – Shape of calculated encoding

Example

>>> import torch
>>> from aimet_torch.v2.quantization.encoding_analyzer import MinMaxEncodingAnalyzer
>>> encoding_analyzer = MinMaxEncodingAnalyzer(shape=(1,))
>>> encoding_analyzer.update_stats(torch.randn(100))
>>> encoding_analyzer.compute_encodings(num_steps=2 ** 8, is_symmetric=False)
(tensor([-2.0991]), tensor([2.3696]))
>>> encoding_analyzer.reset_stats()
>>> encoding_analyzer.update_stats(torch.randn(100))
_MinMaxRange(min=tensor([-2.1721]), max=tensor([2.2592]))
>>> encoding_analyzer.compute_encodings(num_steps=2 ** 8, is_symmetric=False)
(tensor([-2.1721]), tensor([2.2592]))
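
The shape parameter can also describe per-channel encodings. The sketch below is an assumption based on the broadcast semantics used by AIMET's per-channel quantizer parameters (one encoding per row of a 2-D tensor); the exact reduction rules are determined by the underlying observer:

>>> import torch
>>> from aimet_torch.v2.quantization.encoding_analyzer import MinMaxEncodingAnalyzer
>>> # Assumed per-channel usage: shape=(4, 1) broadcasts against a (4, 16) tensor,
>>> # so each of the 4 rows is tracked with its own min/max.
>>> per_channel_analyzer = MinMaxEncodingAnalyzer(shape=(4, 1))
>>> per_channel_analyzer.update_stats(torch.randn(4, 16))
>>> encoding_min, encoding_max = per_channel_analyzer.compute_encodings(num_steps=2 ** 8, is_symmetric=False)
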
class aimet_torch.v2.quantization.encoding_analyzer.SqnrEncodingAnalyzer(shape, num_bins=2048, *, asymmetric_delta_candidates=17, symmetric_delta_candidates=101, offset_candidates=21, max_parallelism=64, gamma=3.0)[source]

EncodingAnalyzer subclass which uses SQNR calibration. This involves recording values in a histogram and computing the min-max range that yields the lowest expected quantization noise (equivalently, the highest signal-to-quantization-noise ratio, SQNR).
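
Conceptually, the search works as follows (a simplified sketch inferred from the parameter descriptions below, not the exact expression used in the implementation): for each candidate scale \(\delta\) and offset \(o\) drawn from the candidate grids, the expected quantization noise is estimated from the histogram as roughly

\[
\text{noise}(\delta, o) \;\approx\; \sum_{\text{unclipped } x} \big(x - Q_{\delta, o}(x)\big)^2 \;+\; \gamma \sum_{\text{clipped } x} \big(x - \mathrm{clip}_{\delta, o}(x)\big)^2,
\]

and the min-max range corresponding to the candidate with the lowest estimated noise is returned.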

Parameters:
  • shape (tuple) – Shape of calculated encoding

  • num_bins (int) – Number of bins used to create the histogram

  • asymmetric_delta_candidates (int) – Number of delta values to search over in asymmetric mode

  • symmetric_delta_candidates (int) – Number of delta values to search over in symmetric mode

  • offset_candidates (int) – Number of offset values to search over in asymmetric mode

  • max_parallelism (int) – Maximum number of encodings to process in parallel (higher number results in higher memory usage but faster computation)

  • gamma (float) – Weighting factor on clipping noise (higher value results in less clipping noise)

Example

>>> import torch
>>> from aimet_torch.v2.quantization.encoding_analyzer import SqnrEncodingAnalyzer
>>> encoding_analyzer = SqnrEncodingAnalyzer(shape=(1,), num_bins=10, gamma=1)
>>> encoding_analyzer.update_stats(torch.randn(100))
>>> encoding_analyzer.compute_encodings(num_steps=2 ** 8, is_symmetric=False)
(tensor([-2.3612]), tensor([2.8497]))
>>> encoding_analyzer.reset_stats()
>>> encoding_analyzer.update_stats(torch.randn(100))
[_Histogram(histogram=tensor([ 2.,  0.,  8.,  8., 16., 22., 23., 12.,  6.,  3.]), bin_edges=tensor([-2.8907, -2.3625, -1.8343, -1.3061, -0.7779, -0.2497,  0.2784,  0.8066, 1.3348,  1.8630,  2.3912]), min=tensor(-2.8907), max=tensor(2.3912))]
>>> encoding_analyzer.compute_encodings(num_steps=2 ** 8, is_symmetric=False)
(tensor([-2.7080]), tensor([2.2438]))

class aimet_torch.v2.quantization.encoding_analyzer.PercentileEncodingAnalyzer(shape, num_bins=2048, percentile=100)[source]

EncodingAnalyzer subclass which uses percentile calibration. This involves recording values in a histogram and computing the min-max range for a given percentile value \(p\): the range is computed after clipping the largest \((100 - p)\%\) and the smallest \((100 - p)\%\) of the observed values.
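
For example (an illustrative calculation, not taken from the docstring): with \(p = 99\) and 10,000 observed values, roughly the 100 largest and the 100 smallest values would fall outside the computed min-max range.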

Parameters:
  • shape (tuple) – Shape of calculated encoding

  • num_bins (int) – Number of bins used to create the histogram

  • percentile (float) – Percentile value which is used to clip values

Example

>>> import torch
>>> from aimet_torch.v2.quantization.encoding_analyzer import PercentileEncodingAnalyzer
>>> encoding_analyzer = PercentileEncodingAnalyzer(shape=(1,), num_bins=10, percentile=80)
>>> encoding_analyzer.update_stats(torch.randn(100))
>>> encoding_analyzer.compute_encodings(num_steps=2 ** 8, is_symmetric=False)
(tensor([-1.1188]), tensor([0.3368]))
>>> encoding_analyzer.reset_stats()
>>> encoding_analyzer.update_stats(torch.randn(100))
[_Histogram(histogram=tensor([ 1.,  1.,  8., 13., 19., 27., 16., 10.,  3.,  2.]), bin_edges=tensor([-2.5710, -2.0989, -1.6269, -1.1548, -0.6827, -0.2106,  0.2614,  0.7335, 1.2056,  1.6776,  2.1497]), min=tensor(-2.5710), max=tensor(2.1497))]
>>> encoding_analyzer.compute_encodings(num_steps=2 ** 8, is_symmetric=False)
(tensor([-1.1548]), tensor([0.2614]))