QuantizedTensorBase

class aimet_torch.quantization.QuantizedTensorBase(*args, **kwargs)

Abstract base class for quantized tensors. Represents a quantized or dequantized tensor as a subclass of torch.Tensor which also holds the quantization encodings. This object can be safely quantized or dequantized through the quantize() and dequantize() methods without changing the represented data values.

Example

>>> import torch
>>> from aimet_torch.v2 import quantization as Q
>>> quantizer = Q.affine.Quantize(shape=(2, 1), bitwidth=8, symmetric=True)
>>> x = torch.tensor([[-1.20, 4.1, -0.21, 2.3],
...                   [0.2, 5.6, -1.0, -.1]])
>>> with quantizer.compute_encodings():
...     x_q = quantizer(x)
>>> torch.equal(x_q.encoding.scale, quantizer.get_scale())
True
>>> x_q
QuantizedTensor([[-37., 127.,  -7.,  71.],
                 [  5., 127., -23.,  -2.]])
>>> x_q.quantized_repr()
tensor([[-37, 127,  -7,  71],
        [  5, 127, -23,  -2]], dtype=torch.int8)
>>> x_q.dequantize()
DequantizedTensor([[-1.1945,  4.1000, -0.2260,  2.2921],
                   [ 0.2205,  5.6000, -1.0142, -0.0882]])
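
The numbers in this example can be checked by hand. The pure-Python sketch below assumes that, for this symmetric 8-bit case, each row's scale is its largest absolute value divided by 127, and that values are rounded to the nearest integer and clamped to [-128, 127]. That assumption is consistent with the printed output but is not taken from AIMET's implementation, and `quantize_row` is a hypothetical helper name.

```python
def quantize_row(row, bitwidth=8):
    """Symmetric per-row quantization under the assumptions stated above."""
    qmax = 2 ** (bitwidth - 1) - 1      # 127 for 8 bits
    qmin = -(2 ** (bitwidth - 1))       # -128 for 8 bits
    # Assumed calibration: the largest magnitude in the row maps to qmax.
    scale = max(abs(v) for v in row) / qmax
    # Round to the nearest grid point, then clamp to the representable range.
    q = [min(max(round(v / scale), qmin), qmax) for v in row]
    # Dequantize by multiplying back by the scale (rounded for display only).
    deq = [round(i * scale, 4) for i in q]
    return q, deq


x = [[-1.20, 4.1, -0.21, 2.3],
     [0.2, 5.6, -1.0, -0.1]]

results = [quantize_row(row) for row in x]
ints = [q for q, _ in results]
deqs = [d for _, d in results]
print(ints)
print(deqs)
```

Under these assumptions, `ints` should line up with the output of quantized_repr() and `deqs` with the output of dequantize() shown above.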

clone(*, memory_format=torch.preserve_format)

Returns a copy of self.

Parameters: memory_format – Desired memory format of the returned tensor (default=torch.preserve_format)

abstract dequantize()

Dequantizes self with the associated encoding.

Return type: DequantizedTensor

Note

This method must be idempotent: calling it multiple times must give the same result as calling it once. In other words, repeated calls must not dequantize data that has already been dequantized.

detach()

Returns a new QuantizedTensorBase with data and encoding detached from the current graph.

Return type: QuantizedTensorBase

new_empty(size, *, dtype=None, device=None, requires_grad=False, layout=torch.strided, pin_memory=False, **kwargs)

Returns a Tensor of size size filled with uninitialized data. By default, the returned Tensor has the same torch.dtype and torch.device as this tensor.

Return type:

QuantizedTensorBase

Parameters:

size (int...) – a list, tuple, or torch.Size of integers defining the shape of the output tensor.

Keyword Arguments:

dtype (torch.dtype, optional) – the desired type of returned tensor. Default: if None, same torch.dtype as this tensor.
device (torch.device, optional) – the desired device of returned tensor. Default: if None, same torch.device as this tensor.
requires_grad (bool, optional) – If autograd should record operations on the returned tensor. Default: False.
layout (torch.layout, optional) – the desired layout of returned Tensor. Default: torch.strided.
pin_memory (bool, optional) – If set, returned tensor would be allocated in the pinned memory. Works only for CPU tensors. Default: False.

Example:

>>> tensor = torch.ones(())
>>> tensor.new_empty((2, 3))
tensor([[ 5.8182e-18,  4.5765e-41, -1.0545e+30],
        [ 3.0949e-41,  4.4842e-44,  0.0000e+00]])

abstract quantize()

Quantizes self with the associated encoding.

Return type: QuantizedTensor

Note

This method must be idempotent: calling it multiple times must give the same result as calling it once. In other words, repeated calls must not quantize data that has already been quantized.
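
The idempotence requirement on quantize() and dequantize() can be illustrated with a toy model. The sketch below is pure Python and is not AIMET's implementation: `ToyTensor`, its `is_quantized` flag, and the fixed scale of 0.25 are all hypothetical and exist only to show how a subclass might guard against applying the same conversion twice.

```python
class ToyTensor:
    """Hypothetical stand-in: values plus a scale and a representation flag."""

    def __init__(self, values, scale, is_quantized):
        self.values = list(values)
        self.scale = scale
        self.is_quantized = is_quantized

    def quantize(self):
        # Already on the integer grid: return as-is so repeated calls are no-ops.
        if self.is_quantized:
            return self
        return ToyTensor([round(v / self.scale) for v in self.values],
                         self.scale, True)

    def dequantize(self):
        # Already in floating-point form: return as-is for the same reason.
        if not self.is_quantized:
            return self
        return ToyTensor([v * self.scale for v in self.values],
                         self.scale, False)


x = ToyTensor([0.5, -1.0, 0.26], scale=0.25, is_quantized=False)

once = x.quantize()
twice = x.quantize().quantize()
deq_once = once.dequantize()
deq_twice = once.dequantize().dequantize()

print(once.values == twice.values)          # quantizing twice equals quantizing once
print(deq_once.values == deq_twice.values)  # same for dequantizing
```

Without the guard, a second quantize() call would divide the integer values by the scale again, which is the duplicate quantization the note rules out.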

abstract quantized_repr()

Returns the quantized representation of self as a torch.Tensor with data type self.encoding.dtype.

Return type: Tensor

Note

The result of this function may not be able to carry a gradient, depending on the quantized data type (integer tensors, for example, cannot require gradients in PyTorch). To keep backpropagation working, it may be necessary to call this method only inside a custom autograd function.

Example

>>> import torch
>>> from aimet_torch.v2 import quantization as Q
>>> quantizer = Q.affine.Quantize(shape=(2, 1), bitwidth=8, symmetric=True)
>>> x = torch.randn((2, 4), requires_grad=True)
>>> with quantizer.compute_encodings():
...     x_q = quantizer(x)
>>> x_q
QuantizedTensor([[  11.,  -57., -128.,   38.],
                 [  28.,   -0., -128.,  -40.]], grad_fn=<AliasBackward0>)
>>> x_q.quantized_repr()
tensor([[  11,  -57, -128,   38],
        [  28,    0, -128,  -40]], dtype=torch.int8)
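
The note about gradients can be illustrated with plain PyTorch. The sketch below does not use AIMET: `QuantDequantSTE` is a hypothetical name, the scale of 0.25 is arbitrary, and the backward pass uses a straight-through estimator as an assumed stand-in for whatever gradient rule a real quantizer defines. It shows an integer representation being used only inside a torch.autograd.Function, so that the value returned to the caller is floating point and can still take part in backpropagation.

```python
import torch


class QuantDequantSTE(torch.autograd.Function):
    """Hypothetical example: int8 rounding in forward, identity gradient in backward."""

    @staticmethod
    def forward(ctx, x, scale):
        # The int8 tensor exists only inside forward; it cannot carry a gradient.
        x_int = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
        # Return a floating-point tensor so autograd can track the output.
        return x_int.to(x.dtype) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat rounding as identity.
        # The second return value is None because scale is a plain float here.
        return grad_output, None


x = torch.tensor([0.5, -1.0, 0.26], requires_grad=True)
y = QuantDequantSTE.apply(x, 0.25)
y.sum().backward()

print(y.requires_grad)
print(x.grad)
```

Doing the integer cast directly in the main graph instead would cut the gradient path at that point, which is the situation the note warns about.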