QuantizeDequantize
- class aimet_torch.v2.quantization.affine.QuantizeDequantize(shape, bitwidth, symmetric, encoding_analyzer=None, block_size=None)
Applies fake-quantization by quantizing and dequantizing the input.
Precisely,
\[out = (\overline{input} + offset) * scale\]
where
\[\overline{input} = clamp\left(\left\lceil\frac{input}{scale}\right\rfloor - offset, qmin, qmax\right)\]
and \(scale\) and \(offset\) are derived from learnable parameters \(\theta_{min}\) and \(\theta_{max}\). Here \(\lceil\cdot\rfloor\) denotes rounding to the nearest integer.
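The two equations above can be traced by hand with a scalar sketch in plain Python. This is not the library's implementation: the function name `fake_quantize` is hypothetical, and the values of `scale`, `offset`, `qmin` and `qmax` are assumptions chosen for illustration, since this page does not state how they are derived from \(\theta_{min}\) and \(\theta_{max}\).

```python
def fake_quantize(x, scale, offset, qmin, qmax):
    """Scalar sketch of out = (x_bar + offset) * scale (hypothetical helper)."""
    # Round onto the integer grid, then shift by the offset.
    # Python's round() rounds halves to even; the rounding mode used by
    # the real implementation is not stated on this page (assumption).
    x_bar = round(x / scale) - offset
    # Clamp to the representable integer range [qmin, qmax].
    x_bar = max(qmin, min(qmax, x_bar))
    # Dequantize back to the floating-point domain.
    return (x_bar + offset) * scale


# Assumed 8-bit asymmetric grid roughly covering [-1, 1] (illustrative values).
scale, offset, qmin, qmax = 2 / 255, -127, 0, 255

for x in (0.5, 2.0, -2.0):
    print(x, "->", round(fake_quantize(x, scale, offset, qmin, qmax), 4))
```

With these assumed values, an in-range input such as 0.5 snaps to the nearest grid point (about 0.502), while out-of-range inputs saturate at about 1.0039 and -0.9961. Those limits happen to coincide with the clipped values visible in the manual-initialization example on this page.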
If block size \(B = \begin{pmatrix} B_0 & B_1 & \cdots & B_{D-1} \end{pmatrix}\) is specified, this equation will be further generalized as
\[
\begin{aligned}
out_{j_0 \cdots j_{D-1}} &= (\overline{input}_{j_0 \cdots j_{D-1}} + offset_{i_0 \cdots i_{D-1}}) * scale_{i_0 \cdots i_{D-1}}\\
\overline{input}_{j_0 \cdots j_{D-1}} &= clamp\left( \left\lceil\frac{input_{j_0 \cdots j_{D-1}}}{scale_{i_0 \cdots i_{D-1}}}\right\rfloor - offset_{i_0 \cdots i_{D-1}}, qmin, qmax\right)\\
&\text{where} \quad \forall_{0 \leq d < D} \quad i_d = \left\lfloor \frac{j_d}{B_d} \right\rfloor
\end{aligned}
\]
- Parameters
shape (tuple) – Shape of the quantization parameters
bitwidth (int) – Quantization bitwidth
symmetric (bool) – If True, performs symmetric quantization; otherwise, performs asymmetric quantization
encoding_analyzer (EncodingAnalyzer, optional) – Encoding analyzer for calibrating quantization encodings (default: absolute min-max encoding analyzer)
block_size (Tuple[int, ...], optional) – Block size \(B\), with one entry per dimension; if omitted, the non-blockwise equation applies
- Variables
min (Tensor) – \(\theta_{min}\) from which scale and offset will be derived.
max (Tensor) – \(\theta_{max}\) from which scale and offset will be derived.
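The blockwise form of the equation maps each input index \(j\) to an encoding index \(i\) by integer division with the block size. The sketch below illustrates that mapping for a 2-D input using nested lists in plain Python. The helper names `block_index` and `blockwise_fake_quantize` are hypothetical, the scale and offset values are made up for illustration, and the rounding mode is an assumption.

```python
def block_index(j, block_size):
    """Map an element index j to the index i of its block: i_d = j_d // B_d."""
    return tuple(j_d // b_d for j_d, b_d in zip(j, block_size))


def blockwise_fake_quantize(x, scale, offset, block_size, qmin, qmax):
    """2-D sketch; x, scale and offset are nested lists (hypothetical helper)."""
    out = []
    for r, row in enumerate(x):
        out_row = []
        for c, value in enumerate(row):
            # Look up the scale/offset of the block this element falls in.
            i0, i1 = block_index((r, c), block_size)
            s, o = scale[i0][i1], offset[i0][i1]
            # Same quantize, clamp, dequantize steps as the per-tensor form.
            x_bar = max(qmin, min(qmax, round(value / s) - o))
            out_row.append((x_bar + o) * s)
        out.append(out_row)
    return out


# A 1 x 4 input split into two blocks of size (1, 2), so the parameters
# have shape 1 x 2: one scale/offset pair per block (illustrative values).
x = [[0.26, 1.0, 0.26, 1.0]]
result = blockwise_fake_quantize(
    x, scale=[[0.5, 0.1]], offset=[[0, 0]], block_size=(1, 2), qmin=-8, qmax=7
)
print(result)
```

The same inputs land on different grids depending on their block: the first two elements use a step of 0.5 and the last two a step of 0.1. This mirrors the examples on this page, where a (5, 10) input with block_size=(1, 5) uses parameters of shape (5, 2).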
Note
QuantizeDequantize cannot run forward() until min and max are properly initialized, which can be done based on input statistics using compute_encodings() or by manually assigning a new value to min and max. See the examples below.
Examples
>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)
>>> qdq = Q.affine.QuantizeDequantize(shape=(5, 2), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> qdq.is_initialized()
False
>>> with qdq.compute_encodings():
...     _ = qdq(input)
...
>>> qdq.is_initialized()
True
>>> qdq(input)
DequantizedTensor([[-0.2771,  0.3038,  1.0819,  0.9700,  0.9487, -0.1307, -1.7894, -0.1709, -0.2212,  0.7741],
                   [-1.0295, -1.2265, -1.0295,  1.0564,  0.6177, -1.0386, -0.0176, -2.6054,  1.8836, -0.1232],
                   [-0.8229,  0.5540,  0.3992, -0.2363,  1.2546, -1.0036,  0.2355,  0.1741,  1.6079,  0.6247],
                   [-1.0115,  1.2458,  0.9157, -1.4694, -0.0639, -0.2568,  0.0680,  1.6695,  0.7932, -0.1889],
                   [ 0.0158,  0.5695,  0.5220,  0.1977, -1.4475, -0.0424, -1.1128, -0.8796, -0.1060,  1.5897]],
                  grad_fn=<AliasBackward0>)
>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)
>>> qdq = Q.affine.QuantizeDequantize(shape=(5, 2), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> qdq.is_initialized()
False
>>> qdq.min = torch.nn.Parameter(-torch.ones_like(qdq.min))
>>> qdq.max = torch.nn.Parameter(torch.ones_like(qdq.max))
>>> qdq.is_initialized()
True
>>> qdq(input)
DequantizedTensor([[-0.6196, -0.9961,  0.0549, -0.6431,  1.0039, -0.8706,  1.0039,  0.4706, -0.2353,  0.8078],
                   [ 0.3451, -0.1176, -0.9961, -0.4549, -0.0549, -0.0471, -0.5255, -0.2353,  1.0039, -0.9961],
                   [-0.4157,  0.0784,  0.5333,  0.1647, -0.9961, -0.9961, -0.2118, -0.2196,  0.9176,  0.9490],
                   [ 1.0039, -0.7765,  0.4784, -0.8706,  1.0039,  0.6039, -0.4157, -0.2118, -0.9961,  0.3137],
                   [ 1.0039,  0.3216, -0.2353, -0.7765, -0.9961,  0.8000,  1.0039,  0.4157,  0.4392,  0.4863]],
                  grad_fn=<AliasBackward0>)