Quantize

class aimet_torch.v2.quantization.affine.Quantize(shape, bitwidth, symmetric, encoding_analyzer=None, block_size=None)[source]

Applies quantization to the input.

Precisely,

\[out = clamp\left(\left\lceil\frac{input}{scale}\right\rfloor - offset, qmin, qmax\right)\]

where \(scale\) and \(offset\) are derived from learnable parameters \(\theta_{min}\) and \(\theta_{max}\).
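As a sketch of the relationship, one common way to derive \(scale\) and \(offset\) from \(\theta_{min}\) and \(\theta_{max}\) for an unsigned asymmetric grid is shown below; the exact derivation AIMET uses internally (e.g. offset rounding, symmetric grids) may differ:

```python
# Hedged sketch: one common derivation of scale/offset from learned
# theta_min / theta_max for an asymmetric b-bit grid. AIMET's internal
# derivation may differ in details.
def derive_encoding(theta_min, theta_max, bitwidth=8):
    qmin, qmax = 0, 2 ** bitwidth - 1
    scale = (theta_max - theta_min) / (qmax - qmin)
    offset = round(theta_min / scale)  # integer zero-point shift
    return scale, offset, qmin, qmax

def quantize(x, scale, offset, qmin, qmax):
    # out = clamp(round(x / scale) - offset, qmin, qmax)
    return max(qmin, min(qmax, round(x / scale) - offset))

scale, offset, qmin, qmax = derive_encoding(-1.0, 1.0)
print(quantize(0.0, scale, offset, qmin, qmax))   # mid-range input
print(quantize(-2.0, scale, offset, qmin, qmax))  # clamped to qmin
print(quantize(2.0, scale, offset, qmin, qmax))   # clamped to qmax
```

With \(\theta_{min} = -1\) and \(\theta_{max} = 1\), inputs outside \([-1, 1]\) saturate at the grid edges, mirroring the clamp in the equation above.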

If block size \(B = \begin{pmatrix} B_0 & B_1 & \cdots & B_{D-1} \end{pmatrix}\) is specified, this equation generalizes to

\[out_{j_0 \cdots j_{D-1}} = clamp\left(\left\lceil\frac{input_{j_0 \cdots j_{D-1}}}{scale_{i_0 \cdots i_{D-1}}}\right\rfloor - offset_{i_0 \cdots i_{D-1}}, qmin, qmax\right)\]

\[\text{where} \quad \forall_{0 \leq d < D} \quad i_d = \left\lfloor \frac{j_d}{B_d} \right\rfloor\]
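The index mapping \(i_d = \lfloor j_d / B_d \rfloor\) can be sketched in plain Python; the shapes below are illustrative and not tied to a specific AIMET configuration:

```python
# Hedged sketch of the blockwise index mapping: input element
# (j_0, ..., j_{D-1}) uses the quantization parameter at index
# (j_0 // B_0, ..., j_{D-1} // B_{D-1}).
def param_index(j, block_size):
    return tuple(jd // bd for jd, bd in zip(j, block_size))

block_size = (1, 5)  # B = (B_0, B_1)

# A (2, 10) input with block size (1, 5) maps onto a (2, 2) parameter grid:
# every run of 5 consecutive columns in a row shares one scale/offset pair.
print(param_index((0, 3), block_size))  # -> (0, 0)
print(param_index((0, 7), block_size))  # -> (0, 1)
print(param_index((1, 9), block_size))  # -> (1, 1)
```

Each block of \(B_0 \times B_1\) input elements thus shares a single \((scale, offset)\) pair.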
Parameters
  • shape (tuple) – Shape of the quantization parameters

  • bitwidth (int) – Quantization bitwidth

  • symmetric (bool) – If True, performs symmetric quantization; otherwise, performs asymmetric quantization

  • encoding_analyzer (EncodingAnalyzer, optional) – Encoding analyzer for calibrating quantization encodings (default: absolute min-max encoding analyzer)

  • block_size (Tuple[int, ...], optional) – Block size

Variables
  • min (Tensor) – \(\theta_{min}\) from which scale and offset will be derived.

  • max (Tensor) – \(\theta_{max}\) from which scale and offset will be derived.

Note

Quantize cannot run forward() until min and max are properly initialized, which can be done based on input statistics using compute_encodings() or by manually assigning new values to min and max. See the examples below.

Examples

>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)
>>> q = Q.affine.Quantize(shape=(5, 1), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> q.is_initialized()
False
>>> with q.compute_encodings():
...     _ = q(input)
...
>>> q.is_initialized()
True
>>> q(input)
QuantizedTensor([[129.,  64., 255., 122.,   0., 192., 106.,  94., 255.,   0.],
                 [  0., 145., 181., 255., 144., 255., 194.,   0.,  74.,  86.],
                 [122.,   0., 255., 150.,  33., 103., 103.,   0.,  37., 255.],
                 [255., 111., 237., 218.,   0.,  49., 155., 255.,   0., 179.],
                 [  0.,  66., 255.,  89., 110.,  17.,  36.,  83., 255.,   0.]],
                grad_fn=<AliasBackward0>)
>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)
>>> q = Q.affine.Quantize(shape=(5, 1), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> q.is_initialized()
False
>>> q.min = torch.nn.Parameter(-torch.ones_like(q.min))
>>> q.max = torch.nn.Parameter(torch.ones_like(q.max))
>>> q.is_initialized()
True
>>> q(input)
QuantizedTensor([[187., 186., 131.,   0., 203.,  64.,  80.,   0., 143., 152.],
                 [ 16.,   0., 255.,   0.,   0., 150.,   0., 255.,  32., 255.],
                 [255., 226.,   0., 255.,  55., 172.,   0., 255., 145., 255.],
                 [207., 146., 216., 238.,   0.,   0., 141., 178., 255., 188.],
                 [ 63.,  59.,  19., 162.,  30., 255., 109., 255.,   0., 255.]],
                grad_fn=<AliasBackward0>)
forward(input)[source]

Quantizes the input tensor.

Parameters

input (torch.Tensor) – Input to quantize

Returns

Quantized output

Return type

QuantizedTensor