Warning
This feature is under heavy development and API changes may occur without notice in future versions.
QuantizationMixin
- class aimet_torch.v2.nn.QuantizationMixin(*args, **kwargs)[source]
Mixin that adds quantization functionality on top of regular PyTorch modules.
QuantizationMixin provides all the same behavior as FakeQuantizationMixin, and by default, a quantized module behaves exactly the same as a fake-quantized version of the same torch.nn.Module. On top of this functionality, QuantizationMixin provides the ability to set custom quantized kernels, which are called in place of the floating-point PyTorch operation in the forward pass.
- input_quantizers
ModuleList containing QuantizerBase objects to be applied to the layer’s input tensors
- Type: nn.ModuleList
- output_quantizers
ModuleList containing QuantizerBase objects to be applied to the layer’s output tensors
- Type: nn.ModuleList
- param_quantizers
ModuleDict mapping parameter names to associated QuantizerBase objects
- Type: nn.ModuleDict
Examples
>>> qlinear = QuantizedLinear(in_features=10, out_features=10, bias=False)
>>> print(qlinear)
QuantizedLinear(
  in_features=10, out_features=10, bias=False
  (param_quantizers): ModuleDict(
    (weight): None
  )
  (input_quantizers): ModuleList(
    (0): None
  )
  (output_quantizers): ModuleList(
    (0): None
  )
)
>>> linear = torch.nn.Linear(in_features=10, out_features=20, bias=True)
>>> qlinear = QuantizationMixin.from_module(linear)
>>> print(qlinear)
QuantizedLinear(
  in_features=10, out_features=20, bias=True
  (param_quantizers): ModuleDict(
    (weight): None
    (bias): None
  )
  (input_quantizers): ModuleList(
    (0): None
  )
  (output_quantizers): ModuleList(
    (0): None
  )
)
>>> qlinear.weight is linear.weight
True
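The quantizer containers can be populated directly to enable quantization for specific tensors. The sketch below assumes the affine QuantizeDequantize quantizer from aimet_torch.v2.quantization.affine; the exact import path and constructor signature may vary between AIMET versions.
>>> from aimet_torch.v2.quantization.affine import QuantizeDequantize
>>> qlinear = QuantizedLinear(in_features=10, out_features=10, bias=False)
>>> # Attach per-tensor 8-bit quantizers to the weight parameter and to the output (illustrative only)
>>> qlinear.param_quantizers["weight"] = QuantizeDequantize((), 8, symmetric=True)
>>> qlinear.output_quantizers[0] = QuantizeDequantize((), 8, symmetric=False)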
- abstract forward(*args, **kwargs)[source]
Computes a quantized version of the parent module’s forward method.
If no custom kernel has been set for the layer, or the layer is called within its compute_encodings() context, this will fall back to the fake-quantized forward pass used in the equivalent FakeQuantizationMixin module.
If a custom kernel implementation is available for the layer (i.e., get_kernel() does not return None), this method will perform the following logic (see the sketch below):
1. Apply existing input quantizers to input tensors
2. Apply existing parameter quantizers to the layer’s parameters
3. Call into the kernel retrieved by get_kernel(), passing the quantized inputs and parameters as well as the output encodings from output_quantizers
4. Dequantize the output of the kernel call
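For illustration, the sketch below spells out these four steps for a QuantizedLinear (without bias) that has a custom kernel set. This is not the actual implementation; in particular, the get_encoding() accessor used to obtain the output encodings is an assumption and may differ in your AIMET version.
>>> def kernel_forward_sketch(qlinear, x):
...     kernel = qlinear.get_kernel()          # custom kernel registered via set_kernel()
...     # 1. Apply the input quantizer to the input tensor
...     q_input = qlinear.input_quantizers[0](x)
...     # 2. Apply the parameter quantizer to the layer's weight
...     q_weight = qlinear.param_quantizers["weight"](qlinear.weight)
...     # 3. Call the kernel with the quantized operands and the output encodings
...     #    (get_encoding() is an assumed accessor for the quantizer's computed encoding)
...     out_encodings = qlinear.output_quantizers[0].get_encoding()
...     q_output = kernel(q_input, q_weight, output_encodings=out_encodings)
...     # 4. Dequantize the kernel output before returning it to the caller
...     return q_output.dequantize()
...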
- __quant_init__()
Initializer for quantized module. This method will be invoked right after __init__().
This method initializes the input_quantizers, output_quantizers, and param_quantizers structures to the appropriate sizes based on the number of input tensors, output tensors, and parameters of the base nn.Module class. All quantizers are initialized to None.
For custom quantized classes, this method should be overridden to set the appropriate lengths of input_quantizers and output_quantizers for the given base class, as in the sketch below.
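As an illustration, the sketch below defines a hypothetical two-input module and a quantized wrapper whose __quant_init__ resizes the quantizer containers accordingly. ElementwiseAdd and QuantizedElementwiseAdd are made-up names, registration of the quantized class is omitted, and the forward shown is only a minimal fake-quantized pass.
>>> import torch.nn as nn
>>> class ElementwiseAdd(nn.Module):
...     def forward(self, x, y):
...         return x + y
...
>>> class QuantizedElementwiseAdd(QuantizationMixin, ElementwiseAdd):
...     def __quant_init__(self):
...         super().__quant_init__()
...         # Size the containers for two input tensors and one output tensor
...         self.input_quantizers = nn.ModuleList([None, None])
...         self.output_quantizers = nn.ModuleList([None])
...
...     def forward(self, x, y):
...         # Minimal fake-quantized forward: apply quantizers only where they are set
...         if self.input_quantizers[0]:
...             x = self.input_quantizers[0](x)
...         if self.input_quantizers[1]:
...             y = self.input_quantizers[1](y)
...         out = x + y
...         if self.output_quantizers[0]:
...             out = self.output_quantizers[0](out)
...         return out
...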
- set_kernel(kernel)[source]
Set kernel for this instance of quantized module.
The function signature of this kernel must match the signature used in the forward() method. In general, this signature will follow the signature of the equivalent torch.nn.functional function, but should return a QuantizedTensor object and take in the additional keyword argument output_encodings.
Once set, the layer will call into kernel in the forward pass unless within the compute_encodings() context.
- Parameters:
kernel – Callable object to be used as the underlying kernel.
Example
>>> from aimet_torch.v2 import quantization as Q
>>> def int_multiply(a, b, output_encodings=None):
...     encodings = [a.encoding, b.encoding, output_encodings]
...     if not all(enc.mapping == "affine" for enc in encodings):
...         raise NotImplementedError
...     q_output = (a.quantized_repr() + a.encoding.offset) * (b.quantized_repr() + b.encoding.offset)
...     dq_output = q_output * (a.encoding.scale * b.encoding.scale)
...     return Q.QuantizedTensor(output_encodings.quantize(dq_output), encoding=output_encodings)
...
>>> qmult = QuantizedMultiply()
>>> qmult.set_kernel(int_multiply)
- classmethod set_default_kernel(kernel)[source]
Set default kernel for the class.
The function signature of this kernel must match the signature used in the forward() method. In general, this signature will follow the signature of the equivalent torch.nn.functional function, but should return a QuantizedTensor object and take in the additional keyword argument output_encodings.
Once set, all instances of the class will call into kernel in the forward pass unless:
- The instance is within the compute_encodings() context, or
- The kernel has been overridden by a set_kernel() call
- Parameters:
kernel – Callable object to be used as the default kernel by all the instances of this class.
Example
>>> from aimet_torch.v2 import quantization as Q
>>> def int_multiply(a, b, output_encodings=None):
...     encodings = [a.encoding, b.encoding, output_encodings]
...     if not all(enc.mapping == "affine" for enc in encodings):
...         raise NotImplementedError
...     q_output = (a.quantized_repr() + a.encoding.offset) * (b.quantized_repr() + b.encoding.offset)
...     dq_output = q_output * (a.encoding.scale * b.encoding.scale)
...     return Q.QuantizedTensor(output_encodings.quantize(dq_output), encoding=output_encodings)
...
>>> QuantizedMultiply.set_default_kernel(int_multiply)
>>> qmult = QuantizedMultiply()
>>> qmult.get_kernel()
<function int_multiply at ...>
- compute_encodings()[source]
Enters the compute_encodings() context for all QuantizerBase objects in the layer.
Inside this context, each quantizer will observe all inputs passed to the quantizer and will compute quantization encodings upon exiting the context.
Example
>>> qlinear = QuantizedLinear(10, 10)
>>> qlinear.output_quantizers[0] = Quantize((), 8, symmetric=False)
>>> with qlinear.compute_encodings():
...     qlinear(torch.randn(16, 10))
...
>>> print(qlinear.output_quantizers[0].is_initialized())
True
- classmethod from_module(module)
Create an instance of quantized module from a regular module instance.
The resulting quantized module contains the same attributes and parameters as the original module, but may be assigned input, output and parameter quantizers.
- Parameters:
module (Module) – Floating point module to quantize
- Returns:
Quantized version of the original module
Example
>>> linear = torch.nn.Linear(10, 10)
>>> quantized_linear = QuantizationMixin.from_module(linear)
>>> print(quantized_linear.weight is linear.weight)
True
>>> print(quantized_linear.param_quantizers)
ModuleDict(
  (weight): None
  (bias): None
)
- classmethod get_default_kernel()[source]
Return the default kernel of the class
- Return type:
Optional[Callable]
- Returns:
Default kernel of the class. None if the default kernel is not set.