Quantizers
Top-level API
- class aimet_torch.v2.quantization.affine.quantizer.QuantizerBase
Quantizer base class
- abstract compute_encodings()
Observe inputs and update quantization parameters based on the input statistics.
- abstract get_encoding()
Return the quantizer’s encodings as an EncodingBase object.
- Return type:
Optional[EncodingBase]
- abstract get_legacy_encodings()
Return a list of encodings, each represented as a list of dicts.
- Return type:
Optional[List[Dict]]
- class aimet_torch.v2.quantization.affine.quantizer.QuantizeDequantize(shape, *args, **kwargs)
Applies fake-quantization by quantizing and dequantizing the input.
Precisely,
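\[out = (\overline{input} + offset) * scale\]
where
\[\overline{input} = clamp\left(\left\lceil\frac{input}{scale}\right\rfloor - offset, qmin, qmax\right)\]
Here \(\left\lceil\cdot\right\rfloor\) denotes rounding to the nearest integer, \(qmin\) and \(qmax\) are the bounds of the integer grid, and \(scale\) and \(offset\) are derived from learnable parameters \(\theta_{min}\) and \(\theta_{max}\).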
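To make the per-tensor formula concrete, here is a minimal pure-Python sketch of it. It is an illustration of the equations, not AIMET's implementation: the function name is hypothetical, scale and offset are passed in directly (this page does not spell out how they are derived from \(\theta_{min}\) and \(\theta_{max}\)), and the 8-bit grid \(qmin = 0\), \(qmax = 255\) is an assumption chosen to match the 0 to 255 range of the 8-bit asymmetric Quantize example outputs on this page.

```python
def quantize_dequantize(values, scale, offset, qmin, qmax):
    """Fake-quantize a flat list of floats with a single scale/offset pair."""
    out = []
    for x in values:
        # round() rounds half to even, like torch.round (assumed rounding mode)
        q = round(x / scale) - offset
        # Clamp onto the integer grid [qmin, qmax]
        q = min(max(q, qmin), qmax)
        # Map the grid integer back to the real line
        out.append((q + offset) * scale)
    return out


# Hypothetical 8-bit asymmetric setup: grid [0, 255], hand-picked scale and offset
print(quantize_dequantize([0.2, 1.3, 100.0, -100.0], scale=0.5, offset=-128, qmin=0, qmax=255))
# [0.0, 1.5, 63.5, -64.0]
```

In-range values are rounded to the nearest multiple of scale, while values outside the representable range (here roughly -64 to 63.5) are clamped to its ends.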
If a block size \(B = \begin{pmatrix} B_0 & B_1 & \cdots & B_{D-1} \end{pmatrix}\) is specified, the equations generalize to
\[
\begin{aligned}
out_{j_0 \cdots j_{D-1}} &= (\overline{input}_{j_0 \cdots j_{D-1}} + offset_{i_0 \cdots i_{D-1}}) * scale_{i_0 \cdots i_{D-1}} \\
\overline{input}_{j_0 \cdots j_{D-1}} &= clamp\left( \left\lceil\frac{input_{j_0 \cdots j_{D-1}}}{scale_{i_0 \cdots i_{D-1}}}\right\rfloor - offset_{i_0 \cdots i_{D-1}}, qmin, qmax\right)
\end{aligned}
\]
where \(i_d = \left\lfloor \frac{j_d}{B_d} \right\rfloor\) for all \(0 \leq d < D\), so every input element in the same block shares one scale and one offset.
- Parameters:
shape (tuple) – Shape of the quantization parameters
bitwidth (int) – Quantization bitwidth
symmetric (bool) – If True, performs symmetric quantization; otherwise, performs asymmetric quantization
encoding_analyzer (EncodingAnalyzer, optional) – Encoding analyzer for calibrating quantization encodings (default: absolute min-max encoding analyzer)
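block_size (Tuple[int, ...], optional) – Block size \(B\) along each input dimension; input elements that fall in the same block share one scale and one offset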
- Variables:
min (Tensor) – \(\theta_{min}\) from which scale and offset will be derived.
max (Tensor) – \(\theta_{max}\) from which scale and offset will be derived.
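The blockwise equations can be illustrated with a small pure-Python sketch for the two-dimensional case (\(D = 2\)). This is a reference illustration of the formula, not AIMET's implementation; the function name is hypothetical, and scale and offset are supplied directly as nested lists with one entry per block.

```python
def blockwise_quantize_dequantize(x, scale, offset, block_size, qmin, qmax):
    """Fake-quantize a 2-D nested list; scale/offset hold one entry per block."""
    b0, b1 = block_size
    out = []
    for j0, row in enumerate(x):
        out_row = []
        for j1, v in enumerate(row):
            # Element (j0, j1) uses the parameters of block (i0, i1) = (j0 // b0, j1 // b1)
            i0, i1 = j0 // b0, j1 // b1
            s, o = scale[i0][i1], offset[i0][i1]
            # Round (half to even), shift by the block's offset, clamp to the grid
            q = min(max(round(v / s) - o, qmin), qmax)
            out_row.append((q + o) * s)
        out.append(out_row)
    return out


# One row of four elements split into two blocks of two (block_size = (1, 2))
x = [[0.3, 100.0, 100.0, 3.2]]
result = blockwise_quantize_dequantize(
    x, scale=[[0.5, 2.0]], offset=[[-128, -128]], block_size=(1, 2), qmin=0, qmax=255
)
print(result)  # [[0.5, 63.5, 100.0, 4.0]]
```

The same input value, 100.0, is clamped in the first block but represented exactly in the second, because each block has its own scale.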
Note
QuantizeDequantize cannot run forward() until min and max are properly initialized, which can be done based on input statistics using compute_encodings() or by manually assigning new values to min and max. See the examples below.
Examples
>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)  # random input; printed values differ between runs
>>> qdq = Q.affine.QuantizeDequantize(shape=(5, 2), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> qdq.is_initialized()
False
>>> with qdq.compute_encodings():
...     _ = qdq(input)
...
>>> qdq.is_initialized()
True
>>> qdq(input)
DequantizedTensor([[-0.2771,  0.3038,  1.0819,  0.9700,  0.9487, -0.1307, -1.7894, -0.1709, -0.2212,  0.7741],
                   [-1.0295, -1.2265, -1.0295,  1.0564,  0.6177, -1.0386, -0.0176, -2.6054,  1.8836, -0.1232],
                   [-0.8229,  0.5540,  0.3992, -0.2363,  1.2546, -1.0036,  0.2355,  0.1741,  1.6079,  0.6247],
                   [-1.0115,  1.2458,  0.9157, -1.4694, -0.0639, -0.2568,  0.0680,  1.6695,  0.7932, -0.1889],
                   [ 0.0158,  0.5695,  0.5220,  0.1977, -1.4475, -0.0424, -1.1128, -0.8796, -0.1060,  1.5897]],
                  grad_fn=<AliasBackward0>)
>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)  # random input; printed values differ between runs
>>> qdq = Q.affine.QuantizeDequantize(shape=(5, 2), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> qdq.is_initialized()
False
>>> qdq.min = torch.nn.Parameter(-torch.ones_like(qdq.min))
>>> qdq.max = torch.nn.Parameter(torch.ones_like(qdq.max))
>>> qdq.is_initialized()
True
>>> qdq(input)
DequantizedTensor([[-0.6196, -0.9961,  0.0549, -0.6431,  1.0039, -0.8706,  1.0039,  0.4706, -0.2353,  0.8078],
                   [ 0.3451, -0.1176, -0.9961, -0.4549, -0.0549, -0.0471, -0.5255, -0.2353,  1.0039, -0.9961],
                   [-0.4157,  0.0784,  0.5333,  0.1647, -0.9961, -0.9961, -0.2118, -0.2196,  0.9176,  0.9490],
                   [ 1.0039, -0.7765,  0.4784, -0.8706,  1.0039,  0.6039, -0.4157, -0.2118, -0.9961,  0.3137],
                   [ 1.0039,  0.3216, -0.2353, -0.7765, -0.9961,  0.8000,  1.0039,  0.4157,  0.4392,  0.4863]],
                  grad_fn=<AliasBackward0>)
- class aimet_torch.v2.quantization.affine.quantizer.Quantize(shape, *args, **kwargs)
Applies quantization to the input.
Precisely,
\[out = clamp\left(\left\lceil\frac{input}{scale}\right\rfloor - offset, qmin, qmax\right)\]
where \(scale\) and \(offset\) are derived from learnable parameters \(\theta_{min}\) and \(\theta_{max}\).
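Quantize stops after the clamp, so its outputs are integers on the grid rather than real values. The sketch below assumes that a \(b\)-bit asymmetric grid is \([0, 2^b - 1]\), which is consistent with the 0 to 255 outputs of the 8-bit examples on this page; quantize and dequantize are hypothetical helpers, not AIMET functions. Applying the QuantizeDequantize output step, \((q + offset) * scale\), to these integers recovers the fake-quantized values.

```python
def quantize(values, scale, offset, bitwidth):
    """Map floats to integers on the grid [0, 2**bitwidth - 1] (assumed unsigned grid)."""
    qmin, qmax = 0, 2 ** bitwidth - 1
    # Round (half to even), shift by the offset, then clamp to the grid
    return [min(max(round(x / scale) - offset, qmin), qmax) for x in values]


def dequantize(q_values, scale, offset):
    """Map grid integers back to floats with (q + offset) * scale."""
    return [(q + offset) * scale for q in q_values]


q = quantize([0.2, 1.3, 100.0, -100.0], scale=0.5, offset=-128, bitwidth=8)
print(q)                                      # [128, 131, 255, 0]
print(dequantize(q, scale=0.5, offset=-128))  # [0.0, 1.5, 63.5, -64.0]
```

The examples on this page print grid values as floats (e.g. 129.), whereas this sketch returns Python ints; the values themselves are the same kind of integer codes.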
If a block size \(B = \begin{pmatrix} B_0 & B_1 & \cdots & B_{D-1} \end{pmatrix}\) is specified, the equation generalizes to
\[
out_{j_0 \cdots j_{D-1}} = clamp\left( \left\lceil\frac{input_{j_0 \cdots j_{D-1}}}{scale_{i_0 \cdots i_{D-1}}}\right\rfloor - offset_{i_0 \cdots i_{D-1}}, qmin, qmax\right)
\]
where \(i_d = \left\lfloor \frac{j_d}{B_d} \right\rfloor\) for all \(0 \leq d < D\), so every input element in the same block shares one scale and one offset.
- Parameters:
shape (tuple) – Shape of the quantization parameters
bitwidth (int) – Quantization bitwidth
symmetric (bool) – If True, performs symmetric quantization; otherwise, performs asymmetric quantization
encoding_analyzer (EncodingAnalyzer, optional) – Encoding analyzer for calibrating quantization encodings (default: absolute min-max encoding analyzer)
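block_size (Tuple[int, ...], optional) – Block size \(B\) along each input dimension; input elements that fall in the same block share one scale and one offset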
- Variables:
min (Tensor) – \(\theta_{min}\) from which scale and offset will be derived.
max (Tensor) – \(\theta_{max}\) from which scale and offset will be derived.
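Because \(i_d = \left\lfloor j_d / B_d \right\rfloor\), the quantization parameters need one entry per block, so shape and block_size are not independent of the input shape. The sketch below shows that bookkeeping; block_param_shape is a hypothetical helper, and it assumes each input dimension is an exact multiple of its block size, as in this page's examples (the page does not say how other cases are handled).

```python
def block_param_shape(input_shape, block_size):
    """Return the number of blocks along each dimension of the input."""
    if len(input_shape) != len(block_size):
        raise ValueError("block_size needs one entry per input dimension")
    shape = []
    for n, b in zip(input_shape, block_size):
        # Assumption: every dimension is an exact multiple of its block size
        if n % b != 0:
            raise ValueError(f"dimension of size {n} is not divisible by block size {b}")
        # Indices j = 0..n-1 map to i = j // b, giving n // b distinct blocks
        shape.append(n // b)
    return tuple(shape)


print(block_param_shape((5, 10), (1, 5)))  # (5, 2)
```

For the (5, 10) inputs with block_size=(1, 5) used in the examples, each row splits into two blocks of five, which gives a parameter shape of (5, 2).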
Note
Quantize cannot run forward() until min and max are properly initialized, which can be done based on input statistics using compute_encodings() or by manually assigning new values to min and max. See the examples below.
Examples
>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)  # random input; printed values differ between runs
>>> q = Q.affine.Quantize(shape=(5, 2), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> q.is_initialized()
False
>>> with q.compute_encodings():
...     _ = q(input)
...
>>> q.is_initialized()
True
>>> q(input)
QuantizedTensor([[129.,  64., 255., 122.,   0., 192., 106.,  94., 255.,   0.],
                 [  0., 145., 181., 255., 144., 255., 194.,   0.,  74.,  86.],
                 [122.,   0., 255., 150.,  33., 103., 103.,   0.,  37., 255.],
                 [255., 111., 237., 218.,   0.,  49., 155., 255.,   0., 179.],
                 [  0.,  66., 255.,  89., 110.,  17.,  36.,  83., 255.,   0.]],
                grad_fn=<AliasBackward0>)
>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)  # random input; printed values differ between runs
>>> q = Q.affine.Quantize(shape=(5, 2), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> q.is_initialized()
False
>>> q.min = torch.nn.Parameter(-torch.ones_like(q.min))
>>> q.max = torch.nn.Parameter(torch.ones_like(q.max))
>>> q.is_initialized()
True
>>> q(input)
QuantizedTensor([[187., 186., 131.,   0., 203.,  64.,  80.,   0., 143., 152.],
                 [ 16.,   0., 255.,   0.,   0., 150.,   0., 255.,  32., 255.],
                 [255., 226.,   0., 255.,  55., 172.,   0., 255., 145., 255.],
                 [207., 146., 216., 238.,   0.,   0., 141., 178., 255., 188.],
                 [ 63.,  59.,  19., 162.,  30., 255., 109., 255.,   0., 255.]],
                grad_fn=<AliasBackward0>)