quantization.affine

Classes

class aimet_torch.v2.quantization.affine.Quantize(shape, *args, **kwargs)

Applies quantization to the input.

Precisely,

\[out = clamp\left(\left\lceil\frac{input}{scale}\right\rfloor - offset, qmin, qmax\right)\]

where \(scale\) and \(offset\) are derived from learnable parameters \(\theta_{min}\) and \(\theta_{max}\).
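
The formula can be restated element by element in plain Python. This is a sketch under stated assumptions, not AIMET's implementation. The helper name quantize_ref is hypothetical, and qmin and qmax are passed explicitly. It also assumes that ⌈·⌋ denotes round-to-nearest with ties to even. Python's round(), like torch.round, behaves this way, but AIMET's kernels may treat ties differently.

```python
def quantize_ref(values, scale, offset, qmin, qmax):
    """Hypothetical reference: clamp(round(x / scale) - offset, qmin, qmax) per element."""
    out = []
    for x in values:
        q = round(x / scale) - offset        # round() breaks ties toward the even integer
        out.append(min(max(q, qmin), qmax))  # clamp into [qmin, qmax]
    return out


# 8-bit unsigned range; a negative offset shifts the integer zero point up to 10
print(quantize_ref([-10.0, 0.0, 1.25, 200.0], scale=0.5, offset=-10, qmin=0, qmax=255))
# [0, 10, 12, 255]
```

Here 1.25 / 0.5 = 2.5 is a tie and rounds to 2. The values -10.0 and 200.0 fall outside the representable range and are clamped to qmin and qmax.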

If a block size \(B = \begin{pmatrix} B_0 & B_1 & \cdots & B_{D-1} \end{pmatrix}\) is specified, the equation generalizes to

\[\begin{aligned} out_{j_0 \cdots j_{D-1}} &= clamp\left(\left\lceil\frac{input_{j_0 \cdots j_{D-1}}}{scale_{i_0 \cdots i_{D-1}}}\right\rfloor - offset_{i_0 \cdots i_{D-1}}, qmin, qmax\right)\\ &\text{where } i_d = \left\lfloor \frac{j_d}{B_d} \right\rfloor \text{ for all } 0 \leq d < D \end{aligned}\]

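
For the 2-D case (D = 2), the index mapping can be sketched in plain Python, with nested lists standing in for tensors. The helper name is hypothetical and the code only restates the equation. Because i_d = ⌊j_d / B_d⌋ has to reach every parameter entry exactly, the sketch assumes that each input dimension equals the corresponding parameter dimension times the block size. For example, a (5, 10) input with block_size=(1, 5) pairs with parameters of shape (5, 2). AIMET may accept further block-size conventions that this sketch does not model.

```python
def quantize_blockwise_ref(x, scale, offset, block_size, qmin, qmax):
    """Hypothetical block-wise quantization of a 2-D nested list.

    Element (j0, j1) uses scale[i0][i1] and offset[i0][i1] with i_d = j_d // B_d.
    """
    b0, b1 = block_size
    # Assumption: input dim d == parameter dim d * block size d
    if len(x) != len(scale) * b0 or len(x[0]) != len(scale[0]) * b1:
        raise ValueError("input shape must equal parameter shape times block size")
    out = []
    for j0, row in enumerate(x):
        out_row = []
        for j1, v in enumerate(row):
            i0, i1 = j0 // b0, j1 // b1  # block containing element (j0, j1)
            q = round(v / scale[i0][i1]) - offset[i0][i1]
            out_row.append(min(max(q, qmin), qmax))
        out.append(out_row)
    return out


x = [[1.0, 2.0, 1.0, 2.0]]  # input shape (1, 4)
scale = [[1.0, 0.5]]        # parameter shape (1, 2): two blocks of 2 per row
offset = [[0, 0]]
print(quantize_blockwise_ref(x, scale, offset, block_size=(1, 2), qmin=0, qmax=255))
# [[1, 2, 2, 4]]
```

The same input values map to different integers in each block because each block has its own scale.
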
Parameters:
  • shape (tuple) – Shape of the quantization parameters

  • bitwidth (int) – Quantization bitwidth

  • symmetric (bool) – If True, performs symmetric quantization; otherwise, performs asymmetric quantization

  • encoding_analyzer (EncodingAnalyzer, optional) – Encoding analyzer for calibrating quantization encodings (default: absolute min-max encoding analyzer)

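  • block_size (Tuple[int, ...], optional) – Block size \(B\). When specified, each block of \(B_0 \times \cdots \times B_{D-1}\) input elements shares a single scale and offset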

Variables:
  • min (Tensor) – \(\theta_{min}\) from which scale and offset will be derived.

  • max (Tensor) – \(\theta_{max}\) from which scale and offset will be derived.
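
The text does not specify how scale and offset are computed from \(\theta_{min}\) and \(\theta_{max}\). As an assumption for illustration only, one common asymmetric choice selects them so that \(\theta_{min}\) maps to qmin and \(\theta_{max}\) maps to qmax under the quantization formula. AIMET's actual derivation, including its rounding, symmetric mode, and gradient handling, may differ. The helper name is hypothetical.

```python
def minmax_to_scale_offset(theta_min, theta_max, qmin, qmax):
    """Hypothetical asymmetric derivation (an assumption, not AIMET's code).

    Chosen so that theta_min -> qmin and theta_max -> qmax under
    out = round(x / scale) - offset.
    """
    scale = (theta_max - theta_min) / (qmax - qmin)
    offset = round(theta_min / scale) - qmin
    return scale, offset


scale, offset = minmax_to_scale_offset(-1.0, 2.75, qmin=0, qmax=15)
print(scale, offset)                                             # 0.25 -4
print(round(-1.0 / scale) - offset, round(2.75 / scale) - offset)  # 0 15
```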

Note

Quantize cannot run forward() until min and max are initialized. They can be initialized from input statistics using compute_encodings(), or by assigning new values to min and max directly. See the examples below.

Examples

>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)
>>> q = Q.affine.Quantize(shape=(5, 2), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> q.is_initialized()
False
>>> with q.compute_encodings():
...     _ = q(input)
...
>>> q.is_initialized()
True
>>> q(input)
QuantizedTensor([[129.,  64., 255., 122.,   0., 192., 106.,  94., 255.,   0.],
                 [  0., 145., 181., 255., 144., 255., 194.,   0.,  74.,  86.],
                 [122.,   0., 255., 150.,  33., 103., 103.,   0.,  37., 255.],
                 [255., 111., 237., 218.,   0.,  49., 155., 255.,   0., 179.],
                 [  0.,  66., 255.,  89., 110.,  17.,  36.,  83., 255.,   0.]],
                grad_fn=<AliasBackward0>)

>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)
>>> q = Q.affine.Quantize(shape=(5, 2), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> q.is_initialized()
False
>>> q.min = torch.nn.Parameter(-torch.ones_like(q.min))
>>> q.max = torch.nn.Parameter(torch.ones_like(q.max))
>>> q.is_initialized()
True
>>> q(input)
QuantizedTensor([[187., 186., 131.,   0., 203.,  64.,  80.,   0., 143., 152.],
                 [ 16.,   0., 255.,   0.,   0., 150.,   0., 255.,  32., 255.],
                 [255., 226.,   0., 255.,  55., 172.,   0., 255., 145., 255.],
                 [207., 146., 216., 238.,   0.,   0., 141., 178., 255., 188.],
                 [ 63.,  59.,  19., 162.,  30., 255., 109., 255.,   0., 255.]],
                grad_fn=<AliasBackward0>)

class aimet_torch.v2.quantization.affine.QuantizeDequantize(shape, *args, **kwargs)

Applies fake-quantization by quantizing and dequantizing the input.

Precisely,

\[out = (\overline{input} + offset) * scale\]

where

\[\overline{input} = clamp\left(\left\lceil\frac{input}{scale}\right\rfloor - offset, qmin, qmax\right)\]

and \(scale\) and \(offset\) are derived from learnable parameters \(\theta_{min}\) and \(\theta_{max}\).
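
The two formulas above can be sketched element by element in plain Python. The helper is hypothetical, and ties are broken to even as in Python's round(). Inputs inside the representable range [(qmin + offset) * scale, (qmax + offset) * scale] move by at most scale / 2. Inputs outside that range saturate at its ends.

```python
def quantize_dequantize_ref(values, scale, offset, qmin, qmax):
    """Hypothetical reference: (clamp(round(x / scale) - offset, qmin, qmax) + offset) * scale."""
    out = []
    for x in values:
        q = min(max(round(x / scale) - offset, qmin), qmax)  # the quantized value (input bar)
        out.append((q + offset) * scale)                     # map back to the input's scale
    return out


# scale 0.25, offset -4, 4-bit unsigned: the representable range is [-1.0, 2.75]
print(quantize_dequantize_ref([-0.3, 0.1, 0.6, 5.0], scale=0.25, offset=-4, qmin=0, qmax=15))
# [-0.25, 0.0, 0.5, 2.75]
```

Because every output already lies on the quantization grid, applying the function a second time leaves it unchanged.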

If a block size \(B = \begin{pmatrix} B_0 & B_1 & \cdots & B_{D-1} \end{pmatrix}\) is specified, the equation generalizes to

\[\begin{aligned} out_{j_0 \cdots j_{D-1}} &= (\overline{input}_{j_0 \cdots j_{D-1}} + offset_{i_0 \cdots i_{D-1}}) * scale_{i_0 \cdots i_{D-1}}\\ \overline{input}_{j_0 \cdots j_{D-1}} &= clamp\left(\left\lceil\frac{input_{j_0 \cdots j_{D-1}}}{scale_{i_0 \cdots i_{D-1}}}\right\rfloor - offset_{i_0 \cdots i_{D-1}}, qmin, qmax\right)\\ &\text{where } i_d = \left\lfloor \frac{j_d}{B_d} \right\rfloor \text{ for all } 0 \leq d < D \end{aligned}\]

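
For D = 2, the block-wise fake-quantization can be sketched in plain Python, with nested lists in place of tensors. The helper name is hypothetical, and the code restates the equation rather than AIMET's implementation. In the example, the two blocks of one row are given different scales. The finer-scaled block keeps more detail than its neighbour.

```python
def qdq_blockwise_ref(x, scale, offset, block_size, qmin, qmax):
    """Hypothetical block-wise fake-quantization of a 2-D nested list."""
    b0, b1 = block_size
    out = []
    for j0, row in enumerate(x):
        out_row = []
        for j1, v in enumerate(row):
            # Parameters of the block containing (j0, j1): i_d = j_d // B_d
            s, o = scale[j0 // b0][j1 // b1], offset[j0 // b0][j1 // b1]
            q = min(max(round(v / s) - o, qmin), qmax)  # input bar for this element
            out_row.append((q + o) * s)
        out.append(out_row)
    return out


x = [[0.3, 0.6, 0.3, 0.6]]
print(qdq_blockwise_ref(x, scale=[[0.25, 1.0]], offset=[[0, 0]], block_size=(1, 2), qmin=0, qmax=15))
# [[0.25, 0.5, 0.0, 1.0]]
```
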
Parameters:
  • shape (tuple) – Shape of the quantization parameters

  • bitwidth (int) – Quantization bitwidth

  • symmetric (bool) – If True, performs symmetric quantization; otherwise, performs asymmetric quantization

  • encoding_analyzer (EncodingAnalyzer, optional) – Encoding analyzer for calibrating quantization encodings (default: absolute min-max encoding analyzer)

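  • block_size (Tuple[int, ...], optional) – Block size \(B\). When specified, each block of \(B_0 \times \cdots \times B_{D-1}\) input elements shares a single scale and offset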

Variables:
  • min (Tensor) – \(\theta_{min}\) from which scale and offset will be derived.

  • max (Tensor) – \(\theta_{max}\) from which scale and offset will be derived.

Note

QuantizeDequantize cannot run forward() until min and max are initialized. They can be initialized from input statistics using compute_encodings(), or by assigning new values to min and max directly. See the examples below.

Examples

>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)
>>> qdq = Q.affine.QuantizeDequantize(shape=(5, 2), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> qdq.is_initialized()
False
>>> with qdq.compute_encodings():
...     _ = qdq(input)
...
>>> qdq.is_initialized()
True
>>> qdq(input)
DequantizedTensor([[-0.2771,  0.3038,  1.0819,  0.9700,  0.9487, -0.1307,
                    -1.7894, -0.1709, -0.2212,  0.7741],
                   [-1.0295, -1.2265, -1.0295,  1.0564,  0.6177, -1.0386,
                    -0.0176, -2.6054,  1.8836, -0.1232],
                   [-0.8229,  0.5540,  0.3992, -0.2363,  1.2546, -1.0036,
                     0.2355,  0.1741,  1.6079,  0.6247],
                   [-1.0115,  1.2458,  0.9157, -1.4694, -0.0639, -0.2568,
                     0.0680,  1.6695,  0.7932, -0.1889],
                   [ 0.0158,  0.5695,  0.5220,  0.1977, -1.4475, -0.0424,
                    -1.1128, -0.8796, -0.1060,  1.5897]],
                  grad_fn=<AliasBackward0>)

>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)
>>> qdq = Q.affine.QuantizeDequantize(shape=(5, 2), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> qdq.is_initialized()
False
>>> qdq.min = torch.nn.Parameter(-torch.ones_like(qdq.min))
>>> qdq.max = torch.nn.Parameter(torch.ones_like(qdq.max))
>>> qdq.is_initialized()
True
>>> qdq(input)
DequantizedTensor([[-0.6196, -0.9961,  0.0549, -0.6431,  1.0039, -0.8706,
                     1.0039,  0.4706, -0.2353,  0.8078],
                   [ 0.3451, -0.1176, -0.9961, -0.4549, -0.0549, -0.0471,
                    -0.5255, -0.2353,  1.0039, -0.9961],
                   [-0.4157,  0.0784,  0.5333,  0.1647, -0.9961, -0.9961,
                    -0.2118, -0.2196,  0.9176,  0.9490],
                   [ 1.0039, -0.7765,  0.4784, -0.8706,  1.0039,  0.6039,
                    -0.4157, -0.2118, -0.9961,  0.3137],
                   [ 1.0039,  0.3216, -0.2353, -0.7765, -0.9961,  0.8000,
                     1.0039,  0.4157,  0.4392,  0.4863]],
                  grad_fn=<AliasBackward0>)

Functions

aimet_torch.v2.quantization.affine.quantize(tensor, scale, offset, *args, **kwargs)

Applies quantization to the input.

Precisely,

\[out = clamp\left(\left\lceil\frac{input}{scale}\right\rfloor - offset, qmin, qmax\right)\]

If a block size \(B = \begin{pmatrix} B_0 & B_1 & \cdots & B_{D-1} \end{pmatrix}\) is specified, the equation generalizes to

\[\begin{aligned} out_{j_0 \cdots j_{D-1}} &= clamp\left(\left\lceil\frac{input_{j_0 \cdots j_{D-1}}}{scale_{i_0 \cdots i_{D-1}}}\right\rfloor - offset_{i_0 \cdots i_{D-1}}, qmin, qmax\right)\\ &\text{where } i_d = \left\lfloor \frac{j_d}{B_d} \right\rfloor \text{ for all } 0 \leq d < D \end{aligned}\]

This function is overloaded with the signatures listed below:

aimet_torch.v2.quantization.affine.quantize(tensor, scale, offset, bitwidth, signed=False, block_size=None)

Equivalent to:

\[qmin = \begin{cases} -\left\lceil\frac{2^{bitwidth}-1}{2}\right\rceil, & \text{if } signed\\ 0, & \text{otherwise (default)} \end{cases} \qquad qmax = \begin{cases} \left\lfloor\frac{2^{bitwidth}-1}{2}\right\rfloor, & \text{if } signed\\ 2^{bitwidth}-1, & \text{otherwise (default)} \end{cases}\]

Parameters:
  • tensor (Tensor) – Tensor to quantize

  • scale (Tensor) – Scale for quantization

  • offset (Tensor) – Offset for quantization

  • bitwidth (int) – Bitwidth of the quantized tensor, from which \(qmin\) and \(qmax\) are derived

  • signed (bool) – If False, the output is mapped to non-negative integers only. Otherwise, it ranges over both negative and non-negative integers.

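  • block_size (Tuple[int, ...], optional) – Block size \(B\). When specified, each block of \(B_0 \times \cdots \times B_{D-1}\) input elements shares a single scale and offset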
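
The qmin/qmax cases for the bitwidth overload reduce to integer arithmetic. The plain-Python sketch below evaluates them. The helper name is hypothetical and not part of aimet_torch. Because n = 2^bitwidth - 1 is odd, the signed case gives the usual two's-complement range.

```python
def qrange_from_bitwidth(bitwidth, signed=False):
    """Hypothetical helper: (qmin, qmax) per the bitwidth overload's formula."""
    n = 2 ** bitwidth - 1
    if signed:
        # -ceil(n / 2) and floor(n / 2); (n + 1) // 2 equals ceil(n / 2) for integer n >= 0
        return -((n + 1) // 2), n // 2
    return 0, n


for b in (4, 8):
    print(b, qrange_from_bitwidth(b), qrange_from_bitwidth(b, signed=True))
# 4 (0, 15) (-8, 7)
# 8 (0, 255) (-128, 127)
```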

aimet_torch.v2.quantization.affine.quantize(tensor, scale, offset, *, num_steps, signed=False, block_size=None)

Equivalent to:

\[qmin = \begin{cases} -\left\lceil\frac{num\_steps}{2}\right\rceil, & \text{if } signed\\ 0, & \text{otherwise (default)} \end{cases} \qquad qmax = \begin{cases} \left\lfloor\frac{num\_steps}{2}\right\rfloor, & \text{if } signed\\ num\_steps, & \text{otherwise (default)} \end{cases}\]

Parameters:
  • tensor (Tensor) – Tensor to quantize

  • scale (Tensor) – Scale for quantization

  • offset (Tensor) – Offset for quantization

  • num_steps (int) – Number of steps in the quantization range, from which \(qmin\) and \(qmax\) are derived

  • signed (bool) – If False, the output is mapped to non-negative integers only. Otherwise, it ranges over both negative and non-negative integers.

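  • block_size (Tuple[int, ...], optional) – Block size \(B\). When specified, each block of \(B_0 \times \cdots \times B_{D-1}\) input elements shares a single scale and offset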
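
Setting num_steps = 2^bitwidth - 1 reproduces the bitwidth overload. This is why bitwidth=4 and num_steps=15 produce identical results in the examples below. num_steps also accepts values that are not of that form. The plain-Python sketch below evaluates the formula. The helper name is hypothetical.

```python
def qrange_from_num_steps(num_steps, signed=False):
    """Hypothetical helper: (qmin, qmax) per the num_steps overload's formula."""
    if signed:
        return -((num_steps + 1) // 2), num_steps // 2  # -ceil(n / 2), floor(n / 2)
    return 0, num_steps


print(qrange_from_num_steps(15), qrange_from_num_steps(15, signed=True))  # (0, 15) (-8, 7)
print(qrange_from_num_steps(10, signed=True))  # (-5, 5): an even num_steps gives a symmetric range
```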

aimet_torch.v2.quantization.affine.quantize(tensor, scale, offset, *, qmin, qmax, block_size=None)

Parameters:
  • tensor (Tensor) – Tensor to quantize

  • scale (Tensor) – Scale for quantization

  • offset (Tensor) – Offset for quantization

  • qmin (int) – Minimum value of the quantization range

  • qmax (int) – Maximum value of the quantization range

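  • block_size (Tuple[int, ...], optional) – Block size \(B\). When specified, each block of \(B_0 \times \cdots \times B_{D-1}\) input elements shares a single scale and offset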

Examples

>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.arange(start=-0.3, end=1.3, step=0.05)
>>> print(input)
tensor([-3.0000e-01, -2.5000e-01, -2.0000e-01, -1.5000e-01, -1.0000e-01,
        -5.0000e-02, -1.1921e-08,  5.0000e-02,  1.0000e-01,  1.5000e-01,
        2.0000e-01,  2.5000e-01,  3.0000e-01,  3.5000e-01,  4.0000e-01,
        4.5000e-01,  5.0000e-01,  5.5000e-01,  6.0000e-01,  6.5000e-01,
        7.0000e-01,  7.5000e-01,  8.0000e-01,  8.5000e-01,  9.0000e-01,
        9.5000e-01,  1.0000e+00,  1.0500e+00,  1.1000e+00,  1.1500e+00,
        1.2000e+00,  1.2500e+00])
>>> scale = torch.tensor(1/15)
>>> offset = torch.tensor(0.0)
>>> Q.affine.quantize(input, scale, offset, bitwidth=4)
tensor([ 0.,  0.,  0.,  0.,  0.,  0., -0.,  1.,  2.,  2.,  3.,  4.,  4.,  5.,
         6.,  7.,  7.,  8.,  9., 10., 10., 11., 12., 13., 13., 14., 15., 15.,
         15., 15., 15., 15.])
>>> Q.affine.quantize(input, scale, offset, num_steps=15)
tensor([ 0.,  0.,  0.,  0.,  0.,  0., -0.,  1.,  2.,  2.,  3.,  4.,  4.,  5.,
         6.,  7.,  7.,  8.,  9., 10., 10., 11., 12., 13., 13., 14., 15., 15.,
         15., 15., 15., 15.])
>>> Q.affine.quantize(input, scale, offset, qmin=0, qmax=15)
tensor([ 0.,  0.,  0.,  0.,  0.,  0., -0.,  1.,  2.,  2.,  3.,  4.,  4.,  5.,
         6.,  7.,  7.,  8.,  9., 10., 10., 11., 12., 13., 13., 14., 15., 15.,
         15., 15., 15., 15.])
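
With scale = 1/15 and offset = 0, the 4-bit unsigned range [0, 15] covers inputs of roughly [0, 1]. Negative inputs therefore clamp to 0, and inputs above 1.0 saturate at 15. The plain-Python sketch below recomputes a few entries. The helper name is hypothetical. Because it uses Python floats rather than float32 tensors, it avoids inputs that sit exactly on a rounding tie, such as 0.3, where x / scale = 4.5.

```python
# Same parameters as the example: scale = 1/15, offset = 0, unsigned 4-bit range [0, 15]
scale, offset, qmin, qmax = 1 / 15, 0, 0, 15


def quantize_4bit(x):
    """Hypothetical helper: clamp(round(x / scale) - offset, qmin, qmax) for one float."""
    return min(max(round(x / scale) - offset, qmin), qmax)


for x in (-0.3, 0.05, 0.6, 1.0, 1.25):
    print(x, quantize_4bit(x))
# -0.3 0     (negative inputs clamp to qmin)
# 0.05 1
# 0.6 9
# 1.0 15
# 1.25 15    (inputs above 1.0 saturate at qmax)
```
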

aimet_torch.v2.quantization.affine.quantize_dequantize(tensor, scale, offset, *args, **kwargs)

Applies fake-quantization by quantizing and dequantizing the input.

Precisely,

\[out = (\overline{input} + offset) * scale\]

where

\[\overline{input} = clamp\left(\left\lceil\frac{input}{scale}\right\rfloor - offset, qmin, qmax\right)\]

If a block size \(B = \begin{pmatrix} B_0 & B_1 & \cdots & B_{D-1} \end{pmatrix}\) is specified, the equation generalizes to

\[\begin{aligned} out_{j_0 \cdots j_{D-1}} &= (\overline{input}_{j_0 \cdots j_{D-1}} + offset_{i_0 \cdots i_{D-1}}) * scale_{i_0 \cdots i_{D-1}}\\ \overline{input}_{j_0 \cdots j_{D-1}} &= clamp\left(\left\lceil\frac{input_{j_0 \cdots j_{D-1}}}{scale_{i_0 \cdots i_{D-1}}}\right\rfloor - offset_{i_0 \cdots i_{D-1}}, qmin, qmax\right)\\ &\text{where } i_d = \left\lfloor \frac{j_d}{B_d} \right\rfloor \text{ for all } 0 \leq d < D \end{aligned}\]

This function is overloaded with the signatures listed below:

aimet_torch.v2.quantization.affine.quantize_dequantize(tensor, scale, offset, bitwidth, signed=False, block_size=None)

Equivalent to:

\[qmin = \begin{cases} -\left\lceil\frac{2^{bitwidth}-1}{2}\right\rceil, & \text{if } signed\\ 0, & \text{otherwise (default)} \end{cases} \qquad qmax = \begin{cases} \left\lfloor\frac{2^{bitwidth}-1}{2}\right\rfloor, & \text{if } signed\\ 2^{bitwidth}-1, & \text{otherwise (default)} \end{cases}\]

Parameters:
  • tensor (Tensor) – Tensor to quantize

  • scale (Tensor) – Scale for quantization

  • offset (Tensor) – Offset for quantization

  • bitwidth (int) – Bitwidth of the quantized tensor, from which \(qmin\) and \(qmax\) are derived

  • signed (bool) – If False, \(\overline{input}\) is mapped to non-negative integers only. Otherwise, \(\overline{input}\) ranges over both negative and non-negative integers.

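  • block_size (Tuple[int, ...], optional) – Block size \(B\). When specified, each block of \(B_0 \times \cdots \times B_{D-1}\) input elements shares a single scale and offset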

aimet_torch.v2.quantization.affine.quantize_dequantize(tensor, scale, offset, *, num_steps, signed=False, block_size=None)

Equivalent to:

\[qmin = \begin{cases} -\left\lceil\frac{num\_steps}{2}\right\rceil, & \text{if } signed\\ 0, & \text{otherwise (default)} \end{cases} \qquad qmax = \begin{cases} \left\lfloor\frac{num\_steps}{2}\right\rfloor, & \text{if } signed\\ num\_steps, & \text{otherwise (default)} \end{cases}\]

Parameters:
  • tensor (Tensor) – Tensor to quantize

  • scale (Tensor) – Scale for quantization

  • offset (Tensor) – Offset for quantization

  • num_steps (int) – Number of steps in the quantization range, from which \(qmin\) and \(qmax\) are derived

  • signed (bool) – If False, \(\overline{input}\) is mapped to non-negative integers only. Otherwise, \(\overline{input}\) ranges over both negative and non-negative integers.

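  • block_size (Tuple[int, ...], optional) – Block size \(B\). When specified, each block of \(B_0 \times \cdots \times B_{D-1}\) input elements shares a single scale and offset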

aimet_torch.v2.quantization.affine.quantize_dequantize(tensor, scale, offset, *, qmin, qmax, block_size=None)

Parameters:
  • tensor (Tensor) – Tensor to quantize

  • scale (Tensor) – Scale for quantization

  • offset (Tensor) – Offset for quantization

  • qmin (int) – Minimum value of the quantization range

  • qmax (int) – Maximum value of the quantization range

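  • block_size (Tuple[int, ...], optional) – Block size \(B\). When specified, each block of \(B_0 \times \cdots \times B_{D-1}\) input elements shares a single scale and offset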

Examples

>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.arange(start=-0.3, end=1.3, step=0.05)
>>> print(input)
tensor([-3.0000e-01, -2.5000e-01, -2.0000e-01, -1.5000e-01, -1.0000e-01,
        -5.0000e-02, -1.1921e-08,  5.0000e-02,  1.0000e-01,  1.5000e-01,
        2.0000e-01,  2.5000e-01,  3.0000e-01,  3.5000e-01,  4.0000e-01,
        4.5000e-01,  5.0000e-01,  5.5000e-01,  6.0000e-01,  6.5000e-01,
        7.0000e-01,  7.5000e-01,  8.0000e-01,  8.5000e-01,  9.0000e-01,
        9.5000e-01,  1.0000e+00,  1.0500e+00,  1.1000e+00,  1.1500e+00,
        1.2000e+00,  1.2500e+00])
>>> scale = torch.tensor(1/15)
>>> offset = torch.tensor(0.0)
>>> Q.affine.quantize_dequantize(input, scale, offset, bitwidth=4)
tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0667, 0.1333,
        0.1333, 0.2000, 0.2667, 0.2667, 0.3333, 0.4000, 0.4667, 0.4667, 0.5333,
        0.6000, 0.6667, 0.6667, 0.7333, 0.8000, 0.8667, 0.8667, 0.9333, 1.0000,
        1.0000, 1.0000, 1.0000, 1.0000, 1.0000])
>>> Q.affine.quantize_dequantize(input, scale, offset, num_steps=15)
tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0667, 0.1333,
        0.1333, 0.2000, 0.2667, 0.2667, 0.3333, 0.4000, 0.4667, 0.4667, 0.5333,
        0.6000, 0.6667, 0.6667, 0.7333, 0.8000, 0.8667, 0.8667, 0.9333, 1.0000,
        1.0000, 1.0000, 1.0000, 1.0000, 1.0000])
>>> Q.affine.quantize_dequantize(input, scale, offset, qmin=0, qmax=15)
tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0667, 0.1333,
        0.1333, 0.2000, 0.2667, 0.2667, 0.3333, 0.4000, 0.4667, 0.4667, 0.5333,
        0.6000, 0.6667, 0.6667, 0.7333, 0.8000, 0.8667, 0.8667, 0.9333, 1.0000,
        1.0000, 1.0000, 1.0000, 1.0000, 1.0000])
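
Because offset is 0 in this example, each fake-quantized value equals the corresponding integer from quantize() multiplied by scale = 1/15. The outputs therefore saturate at 15/15 = 1.0. The plain-Python sketch below shows this composition. The helper names are hypothetical. It uses Python floats instead of float32 tensors and avoids tie inputs such as 0.3.

```python
scale, offset, qmin, qmax = 1 / 15, 0, 0, 15


def quantize_4bit(x):
    # clamp(round(x / scale) - offset, qmin, qmax)
    return min(max(round(x / scale) - offset, qmin), qmax)


def quantize_dequantize_4bit(x):
    # (quantized value + offset) * scale
    return (quantize_4bit(x) + offset) * scale


for x in (-0.3, 0.05, 0.6, 1.25):
    print(x, quantize_4bit(x), round(quantize_dequantize_4bit(x), 4))
# -0.3 0 0.0
# 0.05 1 0.0667
# 0.6 9 0.6
# 1.25 15 1.0
```
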

aimet_torch.v2.quantization.affine.dequantize(tensor, scale, offset, block_size=None)