quantization.affine
Classes
- class aimet_torch.v2.quantization.affine.Quantize(shape, *args, **kwargs)[source]
Applies quantization to the input.
Precisely,
\[out = clamp\left(\left\lceil\frac{input}{scale}\right\rfloor - offset, qmin, qmax\right)\]
where \(scale\) and \(offset\) are derived from learnable parameters \(\theta_{min}\) and \(\theta_{max}\).
If block size \(B = \begin{pmatrix} B_0 & B_1 & \cdots & B_{D-1} \end{pmatrix}\) is specified, this equation will be further generalized as
\[ \begin{align}\begin{aligned}\begin{split}out_{j_0 \cdots j_{D-1}} & = clamp\left( \left\lceil\frac{input_{j_0 \cdots j_{D-1}}}{scale_{i_0 \cdots i_{D-1}}}\right\rfloor - offset_{i_0 \cdots i_{D-1}}, qmin, qmax\right)\\\end{split}\\\text{where} \quad \forall_{0 \leq d < D} \quad i_d = \left\lfloor \frac{j_d}{B_d} \right\rfloor\end{aligned}\end{align} \]
- Parameters:
shape (tuple) – Shape of the quantization parameters
bitwidth (int) – Quantization bitwidth
symmetric (bool) – If True, performs symmetric quantization; otherwise, performs asymmetric quantization
encoding_analyzer (EncodingAnalyzer, optional) – Encoding analyzer for calibrating quantization encodings (default: absolute min-max encoding analyzer)
block_size (Tuple[int, ...], optional) – Block size
- Variables:
min (Tensor) – \(\theta_{min}\) from which scale and offset will be derived.
max (Tensor) – \(\theta_{max}\) from which scale and offset will be derived.
Note
Quantize cannot run forward() until min and max are properly initialized, which can be done based on input statistics using compute_encodings() or by manually assigning new values to min and max. See the examples below.
Examples
>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)
>>> q = Q.affine.Quantize(shape=(5, 1), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> q.is_initialized()
False
>>> with q.compute_encodings():
...     _ = q(input)
...
>>> q.is_initialized()
True
>>> q(input)
QuantizedTensor([[129.,  64., 255., 122.,   0., 192., 106.,  94., 255.,   0.],
                 [  0., 145., 181., 255., 144., 255., 194.,   0.,  74.,  86.],
                 [122.,   0., 255., 150.,  33., 103., 103.,   0.,  37., 255.],
                 [255., 111., 237., 218.,   0.,  49., 155., 255.,   0., 179.],
                 [  0.,  66., 255.,  89., 110.,  17.,  36.,  83., 255.,   0.]],
                grad_fn=<AliasBackward0>)
>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)
>>> q = Q.affine.Quantize(shape=(5, 1), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> q.is_initialized()
False
>>> q.min = torch.nn.Parameter(-torch.ones_like(q.min))
>>> q.max = torch.nn.Parameter(torch.ones_like(q.max))
>>> q.is_initialized()
True
>>> q(input)
QuantizedTensor([[187., 186., 131.,   0., 203.,  64.,  80.,   0., 143., 152.],
                 [ 16.,   0., 255.,   0.,   0., 150.,   0., 255.,  32., 255.],
                 [255., 226.,   0., 255.,  55., 172.,   0., 255., 145., 255.],
                 [207., 146., 216., 238.,   0.,   0., 141., 178., 255., 188.],
                 [ 63.,  59.,  19., 162.,  30., 255., 109., 255.,   0., 255.]],
                grad_fn=<AliasBackward0>)
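When block_size is specified, each entry of min and max (and therefore of scale and offset) covers one block of the input, with input index \(j_d\) mapped to parameter index \(i_d = \left\lfloor j_d / B_d \right\rfloor\). The following is a minimal sketch of that index mapping using plain torch operations only; the parameter shapes and values are illustrative assumptions, not AIMET API calls.
>>> import torch
>>> x = torch.randn(5, 10)
>>> scale = torch.full((5, 2), 0.1)   # one (assumed) scale per (1, 5) block of x
>>> offset = torch.zeros(5, 2)        # one (assumed) offset per (1, 5) block of x
>>> B = (1, 5)
>>> # Expand per-block parameters so element (j0, j1) uses parameter (j0 // B[0], j1 // B[1])
>>> scale_full = scale.repeat_interleave(B[0], dim=0).repeat_interleave(B[1], dim=1)
>>> offset_full = offset.repeat_interleave(B[0], dim=0).repeat_interleave(B[1], dim=1)
>>> out = torch.clamp(torch.round(x / scale_full) - offset_full, 0, 255)
>>> out.shape
torch.Size([5, 10])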
- class aimet_torch.v2.quantization.affine.QuantizeDequantize(shape, *args, **kwargs)[source]
Applies fake-quantization by quantizing and dequantizing the input.
Precisely,
\[out = (\overline{input} + offset) * scale\]
where
\[\overline{input} = clamp\left(\left\lceil\frac{input}{scale}\right\rfloor - offset, qmin, qmax\right)\]
and \(scale\) and \(offset\) are derived from learnable parameters \(\theta_{min}\) and \(\theta_{max}\).
If block size \(B = \begin{pmatrix} B_0 & B_1 & \cdots & B_{D-1} \end{pmatrix}\) is specified, this equation will be further generalized as
\[ \begin{align}\begin{aligned}\begin{split}out_{j_0 \cdots j_{D-1}} &= (\overline{input}_{j_0 \cdots j_{D-1}} + offset_{i_0 \cdots i_{D-1}}) * scale_{i_0 \cdots i_{D-1}}\\ \overline{input}_{j_0 \cdots j_{D-1}} &= clamp\left( \left\lceil\frac{input_{j_0 \cdots j_{D-1}}}{scale_{i_0 \cdots i_{D-1}}}\right\rfloor - offset_{i_0 \cdots i_{D-1}}, qmin, qmax\right)\\\end{split}\\\text{where} \quad \forall_{0 \leq d < D} \quad i_d = \left\lfloor \frac{j_d}{B_d} \right\rfloor\end{aligned}\end{align} \]
- Parameters:
shape (tuple) – Shape of the quantization parameters
bitwidth (int) – Quantization bitwidth
symmetric (bool) – If True, performs symmetric quantization; otherwise, performs asymmetric quantization
encoding_analyzer (EncodingAnalyzer, optional) – Encoding analyzer for calibrating quantization encodings (default: absolute min-max encoding analyzer)
block_size (Tuple[int, ...], optional) – Block size
- Variables:
min (Tensor) – \(\theta_{min}\) from which scale and offset will be derived.
max (Tensor) – \(\theta_{max}\) from which scale and offset will be derived.
Note
QuantizeDequantize cannot run forward() until min and max are properly initialized, which can be done based on input statistics using compute_encodings() or by manually assigning new values to min and max. See the examples below.
Examples
>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)
>>> qdq = Q.affine.QuantizeDequantize(shape=(5, 2), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> qdq.is_initialized()
False
>>> with qdq.compute_encodings():
...     _ = qdq(input)
...
>>> qdq.is_initialized()
True
>>> qdq(input)
DequantizedTensor([[-0.2771,  0.3038,  1.0819,  0.9700,  0.9487, -0.1307, -1.7894, -0.1709, -0.2212,  0.7741],
                   [-1.0295, -1.2265, -1.0295,  1.0564,  0.6177, -1.0386, -0.0176, -2.6054,  1.8836, -0.1232],
                   [-0.8229,  0.5540,  0.3992, -0.2363,  1.2546, -1.0036,  0.2355,  0.1741,  1.6079,  0.6247],
                   [-1.0115,  1.2458,  0.9157, -1.4694, -0.0639, -0.2568,  0.0680,  1.6695,  0.7932, -0.1889],
                   [ 0.0158,  0.5695,  0.5220,  0.1977, -1.4475, -0.0424, -1.1128, -0.8796, -0.1060,  1.5897]],
                  grad_fn=<AliasBackward0>)
>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)
>>> qdq = Q.affine.QuantizeDequantize(shape=(5, 2), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> qdq.is_initialized()
False
>>> qdq.min = torch.nn.Parameter(-torch.ones_like(qdq.min))
>>> qdq.max = torch.nn.Parameter(torch.ones_like(qdq.max))
>>> qdq.is_initialized()
True
>>> qdq(input)
DequantizedTensor([[-0.6196, -0.9961,  0.0549, -0.6431,  1.0039, -0.8706,  1.0039,  0.4706, -0.2353,  0.8078],
                   [ 0.3451, -0.1176, -0.9961, -0.4549, -0.0549, -0.0471, -0.5255, -0.2353,  1.0039, -0.9961],
                   [-0.4157,  0.0784,  0.5333,  0.1647, -0.9961, -0.9961, -0.2118, -0.2196,  0.9176,  0.9490],
                   [ 1.0039, -0.7765,  0.4784, -0.8706,  1.0039,  0.6039, -0.4157, -0.2118, -0.9961,  0.3137],
                   [ 1.0039,  0.3216, -0.2353, -0.7765, -0.9961,  0.8000,  1.0039,  0.4157,  0.4392,  0.4863]],
                  grad_fn=<AliasBackward0>)
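Because min and max are torch.nn.Parameter instances, the fake-quantized output stays on the autograd graph (note the grad_fn above), so the encodings can be fine-tuned with an ordinary optimizer, as in quantization-aware training. A minimal sketch reusing the setup from the first example:
>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.randn(5, 10)
>>> qdq = Q.affine.QuantizeDequantize(shape=(5, 2), bitwidth=8, symmetric=False, block_size=(1, 5))
>>> with qdq.compute_encodings():
...     _ = qdq(input)
...
>>> loss = qdq(input).sum()   # any loss built from the fake-quantized output
>>> loss.backward()
>>> qdq.min.grad is not None and qdq.max.grad is not None
True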
Functions
- aimet_torch.v2.quantization.affine.quantize(tensor, scale, offset, *args, **kwargs)[source]
Applies quantization to the input.
Precisely,
\[out = clamp\left(\left\lceil\frac{input}{scale}\right\rfloor - offset, qmin, qmax\right)\]
If block size \(B = \begin{pmatrix} B_0 & B_1 & \cdots & B_{D-1} \end{pmatrix}\) is specified, this equation will be further generalized as
\[ \begin{align}\begin{aligned}\begin{split}out_{j_0 \cdots j_{D-1}} & = clamp\left( \left\lceil\frac{input_{j_0 \cdots j_{D-1}}}{scale_{i_0 \cdots i_{D-1}}}\right\rfloor - offset_{i_0 \cdots i_{D-1}}, qmin, qmax\right)\\\end{split}\\\text{where} \quad \forall_{0 \leq d < D} \quad i_d = \left\lfloor \frac{j_d}{B_d} \right\rfloor\end{aligned}\end{align} \]
This function is overloaded with the signatures listed below:
- aimet_torch.v2.quantization.affine.quantize(tensor, scale, offset, bitwidth, signed=False, block_size=None)[source]
Equivalent to:
\[\begin{split}qmin= \begin{cases} -\left\lceil\frac{2^{bitwidth}-1}{2}\right\rceil,& \text{if } signed\\ 0, & \text{otherwise (default)} \end{cases} qmax= \begin{cases} \left\lfloor\frac{2^{bitwidth}-1}{2}\right\rfloor,& \text{if } signed\\ 2^{bitwidth}-1, & \text{otherwise (default)} \end{cases}\end{split}\]
- Parameters:
tensor (Tensor) – Tensor to quantize
scale (Tensor) – Scale for quantization
offset (Tensor) – Offset for quantization
bitwidth (int) – Bitwidth of quantized tensor based on which \(qmin\) and \(qmax\) will be derived
signed (bool) – If False, the output is mapped to non-negative integers only. Otherwise, it ranges over both negative and non-negative integers.
block_size (Tuple[int, ...], optional) – Block size
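For example, with bitwidth=8 the rule above gives \(qmin=0\) and \(qmax=255\) when signed=False, or \(qmin=-128\) and \(qmax=127\) when signed=True.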
- aimet_torch.v2.quantization.affine.quantize(tensor, scale, offset, *, num_steps, signed=False, block_size=None)[source]
Equivalent to:
\[\begin{split}qmin= \begin{cases} -\left\lceil\frac{num\_steps}{2}\right\rceil,& \text{if } signed\\ 0, & \text{otherwise (default)} \end{cases} qmax= \begin{cases} \left\lfloor\frac{num\_steps}{2}\right\rfloor,& \text{if } signed\\ num\_steps, & \text{otherwise (default)} \end{cases}\end{split}\]
- Parameters:
tensor (Tensor) – Tensor to quantize
scale (Tensor) – Scale for quantization
offset (Tensor) – Offset for quantization
num_steps (int) – The number of steps in the quantization range based on which \(qmin\) and \(qmax\) will be derived
signed (bool) – If False, the output is mapped to non-negative integers only. Otherwise, it ranges over both negative and non-negative integers.
block_size (Tuple[int, ...], optional) – Block size
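For example, with num_steps=15 the rule above gives \(qmin=0\) and \(qmax=15\) when signed=False, or \(qmin=-8\) and \(qmax=7\) when signed=True.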
- aimet_torch.v2.quantization.affine.quantize(tensor, scale, offset, *, qmin, qmax, block_size=None)[source]
- Parameters:
tensor (Tensor) – Tensor to quantize
scale (Tensor) – Scale for quantization
offset (Tensor) – Offset for quantization
qmin (int) – Minimum value of the quantization range
qmax (int) – Maximum value of the quantization range
block_size (Tuple[int, ...], optional) – Block size
Examples
>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.arange(start=-0.3, end=1.3, step=0.05)
>>> print(input)
tensor([-3.0000e-01, -2.5000e-01, -2.0000e-01, -1.5000e-01, -1.0000e-01,
        -5.0000e-02, -1.1921e-08,  5.0000e-02,  1.0000e-01,  1.5000e-01,
         2.0000e-01,  2.5000e-01,  3.0000e-01,  3.5000e-01,  4.0000e-01,
         4.5000e-01,  5.0000e-01,  5.5000e-01,  6.0000e-01,  6.5000e-01,
         7.0000e-01,  7.5000e-01,  8.0000e-01,  8.5000e-01,  9.0000e-01,
         9.5000e-01,  1.0000e+00,  1.0500e+00,  1.1000e+00,  1.1500e+00,
         1.2000e+00,  1.2500e+00])
>>> scale = torch.tensor(1/15)
>>> offset = torch.tensor(0.0)
>>> Q.affine.quantize(input, scale, offset, bitwidth=4)
tensor([ 0.,  0.,  0.,  0.,  0.,  0., -0.,  1.,  2.,  2.,  3.,  4.,  4.,  5.,
         6.,  7.,  7.,  8.,  9., 10., 10., 11., 12., 13., 13., 14., 15., 15.,
        15., 15., 15., 15.])
>>> Q.affine.quantize(input, scale, offset, num_steps=15)
tensor([ 0.,  0.,  0.,  0.,  0.,  0., -0.,  1.,  2.,  2.,  3.,  4.,  4.,  5.,
         6.,  7.,  7.,  8.,  9., 10., 10., 11., 12., 13., 13., 14., 15., 15.,
        15., 15., 15., 15.])
>>> Q.affine.quantize(input, scale, offset, qmin=0, qmax=15)
tensor([ 0.,  0.,  0.,  0.,  0.,  0., -0.,  1.,  2.,  2.,  3.,  4.,  4.,  5.,
         6.,  7.,  7.,  8.,  9., 10., 10., 11., 12., 13., 13., 14., 15., 15.,
        15., 15., 15., 15.])
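Continuing the session above, the same computation can be spelled out with plain torch operations following the documented equation. This is only a sanity-check sketch: AIMET's handling of exact .5 rounding ties may differ from torch.round, which rounds half to even, so the loose bound below is used instead of exact equality.
>>> manual = torch.clamp(torch.round(input / scale) - offset, 0, 15)
>>> (manual - Q.affine.quantize(input, scale, offset, qmin=0, qmax=15)).abs().max() <= 1
tensor(True)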
- aimet_torch.v2.quantization.affine.quantize_dequantize(tensor, scale, offset, *args, **kwargs)[source]
Applies fake-quantization by quantizing and dequantizing the input.
Precisely,
\[out = (\overline{input} + offset) * scale\]
where
\[\overline{input} = clamp\left(\left\lceil\frac{input}{scale}\right\rfloor - offset, qmin, qmax\right)\]
If block size \(B = \begin{pmatrix} B_0 & B_1 & \cdots & B_{D-1} \end{pmatrix}\) is specified, this equation will be further generalized as
\[ \begin{align}\begin{aligned}\begin{split}out_{j_0 \cdots j_{D-1}} &= (\overline{input}_{j_0 \cdots j_{D-1}} + offset_{i_0 \cdots i_{D-1}}) * scale_{i_0 \cdots i_{D-1}}\\ \overline{input}_{j_0 \cdots j_{D-1}} &= clamp\left( \left\lceil\frac{input_{j_0 \cdots j_{D-1}}}{scale_{i_0 \cdots i_{D-1}}}\right\rfloor - offset_{i_0 \cdots i_{D-1}}, qmin, qmax\right)\\\end{split}\\\text{where} \quad \forall_{0 \leq d < D} \quad i_d = \left\lfloor \frac{j_d}{B_d} \right\rfloor\end{aligned}\end{align} \]
This function is overloaded with the signatures listed below:
- aimet_torch.v2.quantization.affine.quantize_dequantize(tensor, scale, offset, bitwidth, signed=False, block_size=None)[source]
Equivalent to:
\[\begin{split}qmin= \begin{cases} -\left\lceil\frac{2^{bitwidth}-1}{2}\right\rceil,& \text{if } signed\\ 0, & \text{otherwise (default)} \end{cases} qmax= \begin{cases} \left\lfloor\frac{2^{bitwidth}-1}{2}\right\rfloor,& \text{if } signed\\ 2^{bitwidth}-1, & \text{otherwise (default)} \end{cases}\end{split}\]
- Parameters:
tensor (Tensor) – Tensor to quantize
scale (Tensor) – Scale for quantization
offset (Tensor) – Offset for quantization
bitwidth (int) – Bitwidth of quantized tensor based on which \(qmin\) and \(qmax\) will be derived
signed (bool) – If False, \(\overline{input}\) is mapped to non-negative integers only. Otherwise, \(\overline{input}\) ranges over both negative and non-negative integers.
block_size (Tuple[int, ...], optional) – Block size
- aimet_torch.v2.quantization.affine.quantize_dequantize(tensor, scale, offset, *, num_steps, signed=False, block_size=None)[source]
Equivalent to:
\[\begin{split}qmin= \begin{cases} -\left\lceil\frac{num\_steps}{2}\right\rceil,& \text{if } signed\\ 0, & \text{otherwise (default)} \end{cases} qmax= \begin{cases} \left\lfloor\frac{num\_steps}{2}\right\rfloor,& \text{if } signed\\ num\_steps, & \text{otherwise (default)} \end{cases}\end{split}\]
- Parameters:
tensor (Tensor) – Tensor to quantize
scale (Tensor) – Scale for quantization
offset (Tensor) – Offset for quantization
num_steps (int) – The number of steps in the quantization range based on which \(qmin\) and \(qmax\) will be derived
signed (bool) – If False, \(\overline{input}\) is mapped to non-negative integers only. Otherwise, \(\overline{input}\) ranges over both negative and non-negative integers.
block_size (Tuple[int, ...], optional) – Block size
- aimet_torch.v2.quantization.affine.quantize_dequantize(tensor, scale, offset, *, qmin, qmax, block_size=None)[source]
- Parameters:
tensor (Tensor) – Tensor to quantize
scale (Tensor) – Scale for quantization
offset (Tensor) – Offset for quantization
qmin (int) – Minimum value of the quantization range
qmax (int) – Maximum value of the quantization range
block_size (Tuple[int, ...], optional) – Block size
Examples
>>> import torch
>>> import aimet_torch.v2.quantization as Q
>>> input = torch.arange(start=-0.3, end=1.3, step=0.05)
>>> print(input)
tensor([-3.0000e-01, -2.5000e-01, -2.0000e-01, -1.5000e-01, -1.0000e-01,
        -5.0000e-02, -1.1921e-08,  5.0000e-02,  1.0000e-01,  1.5000e-01,
         2.0000e-01,  2.5000e-01,  3.0000e-01,  3.5000e-01,  4.0000e-01,
         4.5000e-01,  5.0000e-01,  5.5000e-01,  6.0000e-01,  6.5000e-01,
         7.0000e-01,  7.5000e-01,  8.0000e-01,  8.5000e-01,  9.0000e-01,
         9.5000e-01,  1.0000e+00,  1.0500e+00,  1.1000e+00,  1.1500e+00,
         1.2000e+00,  1.2500e+00])
>>> scale = torch.tensor(1/15)
>>> offset = torch.tensor(0.0)
>>> Q.affine.quantize_dequantize(input, scale, offset, bitwidth=4)
tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0667, 0.1333,
        0.1333, 0.2000, 0.2667, 0.2667, 0.3333, 0.4000, 0.4667, 0.4667, 0.5333,
        0.6000, 0.6667, 0.6667, 0.7333, 0.8000, 0.8667, 0.8667, 0.9333, 1.0000,
        1.0000, 1.0000, 1.0000, 1.0000, 1.0000])
>>> Q.affine.quantize_dequantize(input, scale, offset, num_steps=15)
tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0667, 0.1333,
        0.1333, 0.2000, 0.2667, 0.2667, 0.3333, 0.4000, 0.4667, 0.4667, 0.5333,
        0.6000, 0.6667, 0.6667, 0.7333, 0.8000, 0.8667, 0.8667, 0.9333, 1.0000,
        1.0000, 1.0000, 1.0000, 1.0000, 1.0000])
>>> Q.affine.quantize_dequantize(input, scale, offset, qmin=0, qmax=15)
tensor([0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0667, 0.1333,
        0.1333, 0.2000, 0.2667, 0.2667, 0.3333, 0.4000, 0.4667, 0.4667, 0.5333,
        0.6000, 0.6667, 0.6667, 0.7333, 0.8000, 0.8667, 0.8667, 0.9333, 1.0000,
        1.0000, 1.0000, 1.0000, 1.0000, 1.0000])
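Continuing the session above, the equations at the top of this entry say that the fake-quantized output is the quantized integer output shifted by the offset and rescaled; a quick sketch of that identity:
>>> q = Q.affine.quantize(input, scale, offset, qmin=0, qmax=15)
>>> qdq = Q.affine.quantize_dequantize(input, scale, offset, qmin=0, qmax=15)
>>> torch.allclose(qdq, (q + offset) * scale)
True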