QuantizedTensorBase
- class aimet_torch.quantization.QuantizedTensorBase(*args, **kwargs)
 Abstract base class for quantized tensors. Represents a quantized or dequantized tensor as a subclass of torch.Tensor which also holds the quantization encodings. This object can be safely quantized or dequantized through the quantize() and dequantize() methods without changing the represented data values.
Example
>>> import torch
>>> from aimet_torch.v2 import quantization as Q
>>> quantizer = Q.affine.Quantize(shape=(2, 1), bitwidth=8, symmetric=True)
>>> x = torch.tensor([[-1.20, 4.1, -0.21, 2.3],
...                   [0.2, 5.6, -1.0, -.1]])
>>> with quantizer.compute_encodings():
...     x_q = quantizer(x)
>>> torch.equal(x_q.encoding.scale, quantizer.get_scale())
True
>>> x_q
QuantizedTensor([[-37., 127.,  -7.,  71.],
                 [  5., 127., -23.,  -2.]])
>>> x_q.quantized_repr()
tensor([[-37, 127,  -7,  71],
        [  5, 127, -23,  -2]], dtype=torch.int8)
>>> x_q.dequantize()
DequantizedTensor([[-1.1945,  4.1000, -0.2260,  2.2921],
                   [ 0.2205,  5.6000, -1.0142, -0.0882]])
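The numbers in the example above can be reproduced by hand. The following is a minimal pure-Python sketch, not AIMET code: it assumes the per-row scale is `max(|x|) / 127` and that values are rounded and clamped to the signed 8-bit range, which is consistent with the displayed output but is not confirmed here as AIMET's exact calibration rule. The helper names `quantize_row` and `dequantize_row` are hypothetical.

```python
from typing import List, Tuple

QMIN, QMAX = -128, 127  # signed 8-bit integer range


def quantize_row(row: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-row quantization (assumed rule: scale = max|x| / 127).

    Assumes the row contains at least one nonzero value.
    """
    scale = max(abs(v) for v in row) / QMAX
    # Round to the nearest integer, then clamp to the int8 range.
    q = [max(QMIN, min(QMAX, round(v / scale))) for v in row]
    return q, scale


def dequantize_row(q: List[int], scale: float) -> List[float]:
    """Map integers back to the real-valued grid."""
    return [v * scale for v in q]


# Same input as the example; shape=(2, 1) corresponds to one scale per row.
x = [[-1.20, 4.1, -0.21, 2.3],
     [0.2, 5.6, -1.0, -0.1]]

for row in x:
    q, scale = quantize_row(row)
    print(q, [round(v, 4) for v in dequantize_row(q, scale)])
```

Under these assumptions the integers match the `quantized_repr()` output shown above, and the dequantized values match `dequantize()` to four decimal places.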
- clone(*, memory_format=torch.preserve_format)
 Returns a copy of self.
- Parameters:
 memory_format – Desired memory format of the returned tensor (default=torch.preserve_format)
- abstract dequantize()
 Dequantizes self with the associated encoding.
- Return type:
 DequantizedTensor
Note
This method must be idempotent: calling it multiple times must give the same result as calling it once. In other words, repeated calls must not dequantize the data more than once.
- detach()
 Returns a new QuantizedTensorBase with data and encoding detached from the current graph
- Return type:
 QuantizedTensorBase
- new_empty(size, *, dtype=None, device=None, requires_grad=False, layout=torch.strided, pin_memory=False, **kwargs)
 Overrides torch.Tensor.new_empty
- Return type:
 
- abstract quantize()
 Quantizes self with the associated encoding.
- Return type:
 QuantizedTensor
Note
This method must be idempotent: calling it multiple times must give the same result as calling it once. In other words, repeated calls must not quantize the data more than once.
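The idempotence requirement on quantize() and dequantize() can be illustrated with a toy model. This is a sketch only: `ToyTensor` is a hypothetical stand-in that tracks its own state with a flag, and it says nothing about how AIMET's subclasses actually implement the contract.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToyTensor:
    """Hypothetical stand-in for a quantized/dequantized tensor (not AIMET code)."""
    values: List[float]
    scale: float
    is_quantized: bool

    def quantize(self) -> "ToyTensor":
        # Already on the integer grid: return as-is rather than
        # dividing by the scale a second time.
        if self.is_quantized:
            return self
        q = [float(max(-128, min(127, round(v / self.scale)))) for v in self.values]
        return ToyTensor(q, self.scale, True)

    def dequantize(self) -> "ToyTensor":
        # Already real-valued: return as-is rather than
        # multiplying by the scale a second time.
        if not self.is_quantized:
            return self
        return ToyTensor([v * self.scale for v in self.values], self.scale, False)


x_dq = ToyTensor([-1.1945, 4.1, -0.2260, 2.2921], scale=4.1 / 127, is_quantized=False)
x_q = x_dq.quantize()

# Repeated calls give the same result as a single call.
assert x_q.quantize() == x_q
assert x_q.dequantize().dequantize() == x_q.dequantize()
```

A naive implementation without the state check would divide by the scale on every quantize() call, so a second call would shrink the values again; that is the "duplicate quantization" the note rules out.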
- abstract quantized_repr()
 Returns the quantized representation of self as a torch.Tensor with data type self.encoding.dtype.
- Return type:
 Tensor
Note
The result of this function may not be able to carry a gradient, depending on the quantized data type. Thus, it may be necessary to call this only within an autograd function to allow for backpropagation.
Example
>>> import torch
>>> from aimet_torch.v2 import quantization as Q
>>> quantizer = Q.affine.Quantize(shape=(2, 1), bitwidth=8, symmetric=True)
>>> x = torch.randn((2, 4), requires_grad=True)
>>> with quantizer.compute_encodings():
...     x_q = quantizer(x)
>>> x_q
QuantizedTensor([[  11.,  -57., -128.,   38.],
                 [  28.,   -0., -128.,  -40.]], grad_fn=<AliasBackward0>)
>>> x_q.quantized_repr()
tensor([[  11,  -57, -128,   38],
        [  28,    0, -128,  -40]], dtype=torch.int8)
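The gradient note for quantized_repr() can be illustrated in plain PyTorch. Integer tensors cannot carry gradients, so an integer representation has to be produced and consumed inside a custom `torch.autograd.Function` whose backward pass is defined by hand. The sketch below is an assumption-laden illustration, not AIMET's implementation: the class name `Int8RoundTripSTE` is hypothetical, the scale is fixed by hand, and the backward pass uses a simple straight-through estimator that ignores clamping.

```python
import torch


class Int8RoundTripSTE(torch.autograd.Function):
    """Hypothetical example: use an int8 representation internally
    while still letting gradients reach the floating-point input."""

    @staticmethod
    def forward(ctx, x, scale):
        # The int8 tensor exists only inside forward(); it cannot carry a gradient.
        q = torch.clamp(torch.round(x / scale), -128, 127).to(torch.int8)
        # Return a floating-point result so the output can join the autograd graph.
        return q.to(x.dtype) * scale

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator (an assumption of this sketch):
        # pass the gradient through unchanged; no gradient for `scale`.
        return grad_output, None


x = torch.tensor([-1.20, 4.1, -0.21, 2.3], requires_grad=True)
scale = 4.1 / 127

# A plain cast to int8 outside an autograd function drops out of the graph.
q_plain = torch.round(x / scale).to(torch.int8)
print(q_plain.requires_grad)

# Inside the custom function, backpropagation still reaches x.
y = Int8RoundTripSTE.apply(x, scale)
y.sum().backward()
print(x.grad)
```

Here `q_plain` does not require grad, whereas `x.grad` is populated after the backward pass through the custom function.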