Quantize
- class aimet_torch.quantization.affine.Quantize(shape, *args, **kwargs)[source]
Applies quantization to the input.
Precisely,
\[out = clamp\left(\left\lceil\frac{input}{scale}\right\rfloor - offset, qmin, qmax\right)\]
where \(scale\) and \(offset\) are derived from learnable parameters \(\theta_{min}\) and \(\theta_{max}\).
If block size \(B = \begin{pmatrix} B_0 & B_1 & \cdots & B_{D-1} \end{pmatrix}\) is specified, this equation will be further generalized as
\[\begin{aligned}
out_{j_0 \cdots j_{D-1}} & = clamp\left(\left\lceil\frac{input_{j_0 \cdots j_{D-1}}}{scale_{i_0 \cdots i_{D-1}}}\right\rfloor - offset_{i_0 \cdots i_{D-1}}, qmin, qmax\right)\\
\text{where} \quad \forall_{0 \leq d < D} \quad i_d & = \left\lfloor \frac{j_d}{B_d} \right\rfloor
\end{aligned}\]
- Parameters:
shape (tuple) – Shape of the quantization parameters
bitwidth (int) – Quantization bitwidth
symmetric (bool) – If True, performs symmetric quantization; otherwise, performs asymmetric quantization
encoding_analyzer (EncodingAnalyzer, optional) – Encoding analyzer for calibrating quantization encodings (default: absolute min-max encoding analyzer)
block_size (Tuple[int, ...], optional) – Block size
- Variables:
min (Tensor) – \(\theta_{min}\) from which scale and offset will be derived.
max (Tensor) – \(\theta_{max}\) from which scale and offset will be derived.
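The per-tensor formula above can be sketched in a few lines of plain PyTorch. The derivation of \(scale\) and \(offset\) from \(\theta_{min}\) and \(\theta_{max}\) shown here is a common asymmetric scheme and is only illustrative; `affine_quantize` is a hypothetical helper, not part of the aimet_torch API, and the exact derivation inside AIMET may differ.

```python
import torch

def affine_quantize(x, theta_min, theta_max, qmin, qmax):
    # Illustrative asymmetric derivation: spread [theta_min, theta_max]
    # over the integer grid [qmin, qmax]. (Assumption, not the AIMET API.)
    scale = (theta_max - theta_min) / (qmax - qmin)
    offset = torch.round(theta_min / scale)
    # The formula above: out = clamp(round(x / scale) - offset, qmin, qmax)
    return torch.clamp(torch.round(x / scale) - offset, qmin, qmax)

x = torch.tensor([-0.2, 0.0, 0.2, 1.0])
out = affine_quantize(x, torch.tensor(0.0), torch.tensor(1.0), qmin=0, qmax=255)
# out is [0., 0., 51., 255.]: -0.2 falls below theta_min and clamps to qmin,
# while 1.0 maps to qmax.
```

Note that the output stays in floating point; it holds integer values on the quantized grid rather than casting to an integer dtype.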
Note
Quantize cannot run forward() until min and max are properly initialized, which can be done based on input statistics using compute_encodings() or by manually assigning new values to min and max. See the examples below.
Examples
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)
>>> q = Q.affine.Quantize(shape=(5, 1), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> q.is_initialized()
False
>>> with q.compute_encodings():
...     _ = q(input)
...
>>> q.is_initialized()
True
>>> q(input)
QuantizedTensor([[129.,  64., 255., 122.,   0., 192., 106.,  94., 255.,   0.],
                 [  0., 145., 181., 255., 144., 255., 194.,   0.,  74.,  86.],
                 [122.,   0., 255., 150.,  33., 103., 103.,   0.,  37., 255.],
                 [255., 111., 237., 218.,   0.,  49., 155., 255.,   0., 179.],
                 [  0.,  66., 255.,  89., 110.,  17.,  36.,  83., 255.,   0.]],
                grad_fn=<AliasBackward0>)
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)
>>> q = Q.affine.Quantize(shape=(5, 1), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> q.is_initialized()
False
>>> q.min = torch.nn.Parameter(-torch.ones_like(q.min))
>>> q.max = torch.nn.Parameter(torch.ones_like(q.max))
>>> q.is_initialized()
True
>>> q(input)
QuantizedTensor([[187., 186., 131.,   0., 203.,  64.,  80.,   0., 143., 152.],
                 [ 16.,   0., 255.,   0.,   0., 150.,   0., 255.,  32., 255.],
                 [255., 226.,   0., 255.,  55., 172.,   0., 255., 145., 255.],
                 [207., 146., 216., 238.,   0.,   0., 141., 178., 255., 188.],
                 [ 63.,  59.,  19., 162.,  30., 255., 109., 255.,   0., 255.]],
                grad_fn=<AliasBackward0>)
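The blockwise index mapping \(i_d = \lfloor j_d / B_d \rfloor\) used above can be made concrete with a small standalone sketch. The shapes and the `per_element_scale` tensor here are purely illustrative and do not use the aimet_torch API:

```python
import torch

# Illustrative shapes: a (5, 2) parameter grid with block_size (1, 5)
# covers a (5, 10) input; each parameter entry is shared by one 1x5 block.
shape, block_size = (5, 2), (1, 5)
input_shape = tuple(s * b for s, b in zip(shape, block_size))  # (5, 10)

# Give every block a distinct scale so the mapping is easy to see.
scale = torch.arange(1, 11, dtype=torch.float32).reshape(shape)

# Apply i_d = j_d // B_d to expand the per-block scale to a per-element scale.
j0, j1 = torch.meshgrid(
    torch.arange(input_shape[0]), torch.arange(input_shape[1]), indexing="ij"
)
per_element_scale = scale[j0 // block_size[0], j1 // block_size[1]]
```

For example, input element (2, 7) maps to block index (2 // 1, 7 // 5) = (2, 1), so it is quantized with `scale[2, 1]`.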