Migrate to QuantSim v2
Learn how to migrate your code from QuantSim v1 to QuantSim v2!
Migrating to QuantSim v2 gives you access to new features, easier debugging, and simpler, more extensible code. This guide provides an overview of the migration process and describes the fundamental differences between the two versions.
Note
Please be advised that QuantSim v2 is an experimental feature whose APIs and behaviors are subject to change.
Changes in QuantSim v2
Before migrating, it is important to understand the behavior and API differences between QuantSim v1 and QuantSim v2. You can interact with QuantSim through the high-level APIs in the same way: methods like compute_encodings() and export() remain the same.
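For instance, a minimal calibrate-and-export flow (sketched here with a placeholder model and calibration callback) is written the same way in both versions; only the import changes:

```python
import torch
from aimet_torch.v2.quantsim import QuantizationSimModel  # v1: from aimet_torch.quantsim import QuantizationSimModel

# Placeholder model and calibration data for illustration
model = torch.nn.Sequential(torch.nn.Linear(100, 200), torch.nn.ReLU())
dummy_input = torch.randn(1, 100)

def forward_pass(model, _):
    model(dummy_input)  # run representative data through the model

sim = QuantizationSimModel(model, dummy_input)
sim.compute_encodings(forward_pass, None)
sim.export(path='/tmp', filename_prefix='model', dummy_input=dummy_input)
```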
Under the hood, however, QuantSim v2 is built on a different set of building blocks and properties than QuantSim v1, as described in the sections below.
Migration Process
The migration process includes the following steps:

- Update imports of QuantizationSimModel and other features
- Change how internal components of QuantizationSimModel are accessed
- Remove any dependency on deprecated features
Imports
To migrate to QuantSim v2, your imports should originate from the aimet_torch.v2 namespace, as shown in the table below. If your code does not directly access lower-level components, no further code change is needed.
| AIMET Class | QuantSim v1 import | QuantSim v2 import |
|---|---|---|
| QuantSim | aimet_torch.quantsim | aimet_torch.v2.quantsim |
| AdaRound | aimet_torch.adaround.adaround_weight | aimet_torch.v2.adaround |
| Sequential MSE | aimet_torch.seq_mse | aimet_torch.v2.seq_mse |
| QuantAnalyzer | aimet_torch.quant_analyzer | aimet_torch.v2.quant_analyzer |
| AutoQuant | aimet_torch.auto_quant | aimet_torch.v2.auto_quant |
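For example, QuantizationSimModel itself moves as follows:

```python
# QuantSim v1
from aimet_torch.quantsim import QuantizationSimModel

# QuantSim v2
from aimet_torch.v2.quantsim import QuantizationSimModel
```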
In QuantSim v2, all implementation code is ported to Python, so you no longer need to import from aimet_common.libpymo. Refer to the table in Deprecated Features to migrate these imports.
All other import statements stay the same, including but not limited to:

```python
from aimet_common.defs import QuantScheme
from aimet_torch.cross_layer_equalization import equalize_model
from aimet_torch.model_preparer import prepare_model
```
QuantizationSimModel
Moving from QuantWrapper to Quantized Modules
In QuantSim v1, quantization is enabled by wrapping modules with a quantize wrapper (for example, StaticGridQuantWrapper). The wrapped modules can be accessed as follows:
```python
from aimet_torch.quantsim import QuantizationSimModel as QuantizationSimModelV1

sim = QuantizationSimModelV1(...)
for quant_wrapper in sim.quant_wrappers():
    print(quant_wrapper)
```

```
StaticGridQuantWrapper(
  (_module_to_wrap): Linear(in_features=100, out_features=200, bias=True)
)
StaticGridQuantWrapper(
  (_module_to_wrap): ReLU()
)
```
In contrast, QuantSim v2 enables quantization through quantized nn.Modules: modules are no longer wrapped but replaced with a quantized version. For example, an nn.Linear is replaced with QuantizedLinear, an nn.Conv2d is replaced with QuantizedConv2d, and so on. The quantized module definitions can be found under aimet_torch.v2.nn. These quantized modules can be accessed as follows:
```python
from aimet_torch.v2.quantsim.quantsim import QuantizationSimModel as QuantizationSimModelV2

sim2 = QuantizationSimModelV2(...)
for q_module in sim2.qmodules():
    print(q_module)
```

```
QuantizedLinear(
  in_features=100, out_features=200, bias=True
  (param_quantizers): ModuleDict(
    (weight): QuantizeDequantize(shape=[1], bitwidth=8, symmetric=True)
    (bias): None
  )
  (input_quantizers): ModuleList(
    (0): QuantizeDequantize(shape=[1], bitwidth=8, symmetric=False)
  )
  (output_quantizers): ModuleList(
    (0): None
  )
)
FakeQuantizedReLU(
  (param_quantizers): ModuleDict()
  (input_quantizers): ModuleList(
    (0): None
  )
  (output_quantizers): ModuleList(
    (0): QuantizeDequantize(shape=[1], bitwidth=8, symmetric=False)
  )
)
```
For more information on Quantized modules, please refer to the API reference guide here.
Moving from StaticGrid and LearnedGrid Quantizer to Affine and Float Quantizer
In QuantSim v1, we relied on StaticGridQuantizer and LearnedGridQuantizer. For both, floating-point quantization could be enabled through the QuantizationDataType passed in:

```python
from aimet_torch.tensor_quantizer import StaticGridPerChannelQuantizer
from aimet_common.defs import QuantizationDataType

fp_quantizer = StaticGridPerChannelQuantizer(data_type=QuantizationDataType.float, ...)
affine_quantizer = StaticGridPerChannelQuantizer(data_type=QuantizationDataType.int, ...)
```
However, in QuantSim v2, this functionality is separated into an AffineQuantizer and a FloatQuantizer. You can access these quantizers and related operations under aimet_torch.v2.quantization:

```python
import torch
import aimet_torch.v2.quantization as Q

affine_q = Q.affine.Quantize(shape=(5, 1), bitwidth=8, symmetric=False)
affine_qdq = Q.affine.QuantizeDequantize(shape=(5, 1), bitwidth=8, symmetric=False)
fp_qdq = Q.float.FloatQuantizeDequantize(dtype=torch.float16)
```
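These quantizers are themselves nn.Modules. As a minimal sketch of how one might be calibrated and applied (assuming the compute_encodings() context manager described in the quantizer API reference; the input tensor is a placeholder):

```python
import torch
import aimet_torch.v2.quantization as Q

qdq = Q.affine.QuantizeDequantize(shape=(1,), bitwidth=8, symmetric=False)
x = torch.randn(5, 10)  # placeholder input

# Calibration: observe statistics and derive the encoding (min/max)
with qdq.compute_encodings():
    qdq(x)

# Inference: returns the quantize-dequantized ("fake-quantized") tensor
y = qdq(x)
```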
From the wrapped module (QuantSim v1) or quantized module (QuantSim v2), the attributes used to access the quantizers remain consistent: .input_quantizers for input quantizers, .output_quantizers for output quantizers, and .param_quantizers for parameter quantizers.
For more information on Quantizers, please refer to the API reference guide here.
Code Examples
Setup
```python
# QuantSim v1
from aimet_torch.quantsim import QuantizationSimModel as QuantizationSimModelV1

sim1 = QuantizationSimModelV1(...)
wrap_linear = sim1.model.linear
```

```python
# QuantSim v2
from aimet_torch.v2.quantsim.quantsim import QuantizationSimModel as QuantizationSimModelV2

sim2 = QuantizationSimModelV2(...)
qlinear = sim2.model.linear
```
Case 1: Manually setting common attributes
Bitwidth
```python
# QuantSim v1
wrap_linear.param_quantizers['weight'].bitwidth = 4
wrap_linear.input_quantizers[0].bitwidth = 4
wrap_linear.output_quantizers[0].bitwidth = 4
```

```python
# QuantSim v2
if qlinear.param_quantizers['weight']:
    qlinear.param_quantizers['weight'].bitwidth = 4
if qlinear.input_quantizers[0]:
    qlinear.input_quantizers[0].bitwidth = 4
if qlinear.output_quantizers[0]:
    qlinear.output_quantizers[0].bitwidth = 4
```
Symmetry
```python
# QuantSim v1
wrap_linear.param_quantizers['weight'].use_symmetric_encodings = True
wrap_linear.param_quantizers['weight'].is_unsigned_symmetric = False
wrap_linear.param_quantizers['weight'].use_strict_symmetric = False
wrap_linear.input_quantizers[0].use_symmetric_encodings = True
wrap_linear.input_quantizers[0].is_unsigned_symmetric = False
wrap_linear.input_quantizers[0].use_strict_symmetric = False
wrap_linear.output_quantizers[0].use_symmetric_encodings = True
wrap_linear.output_quantizers[0].is_unsigned_symmetric = False
wrap_linear.output_quantizers[0].use_strict_symmetric = False
```

```python
# QuantSim v2
# Note: symmetry settings are simplified into two flags, symmetric and signed
if qlinear.param_quantizers['weight']:
    qlinear.param_quantizers['weight'].symmetric = True
    qlinear.param_quantizers['weight'].signed = True
if qlinear.input_quantizers[0]:
    qlinear.input_quantizers[0].symmetric = True
    qlinear.input_quantizers[0].signed = True
if qlinear.output_quantizers[0]:
    qlinear.output_quantizers[0].symmetric = True
    qlinear.output_quantizers[0].signed = True
```
Encoding Data
```python
# QuantSim v1
import libpymo

if wrap_linear.param_quantizers['weight'].encoding:
    encoding = libpymo.TfEncoding()
    encoding.max = 1
    encoding.min = -1
    wrap_linear.param_quantizers['weight'].encoding = encoding
if wrap_linear.input_quantizers[0].encoding:
    encoding = libpymo.TfEncoding()
    encoding.max = 1
    encoding.min = -1
    wrap_linear.input_quantizers[0].encoding = encoding
if wrap_linear.output_quantizers[0].encoding:
    encoding = libpymo.TfEncoding()
    encoding.max = 1
    encoding.min = -1
    wrap_linear.output_quantizers[0].encoding = encoding
```

```python
# QuantSim v2
import torch

# Note: TfEncoding() is no longer used; encoding min/max are torch.nn.Parameters,
# so update them in-place under torch.no_grad()
with torch.no_grad():
    if qlinear.param_quantizers['weight']:
        qlinear.param_quantizers['weight'].min.copy_(-1.0)
        qlinear.param_quantizers['weight'].max.copy_(1.0)
    if qlinear.input_quantizers[0]:
        qlinear.input_quantizers[0].min.copy_(-1.0)
        qlinear.input_quantizers[0].max.copy_(1.0)
    if qlinear.output_quantizers[0]:
        qlinear.output_quantizers[0].min.copy_(-1.0)
        qlinear.output_quantizers[0].max.copy_(1.0)
```
Case 2: Enabling and Disabling Quantization
Is quantization enabled?
```python
# QuantSim v1
if wrap_linear.param_quantizers['weight'].enabled:
    pass
```

```python
# QuantSim v2
# Note: quantizers no longer have an 'enabled' attribute; if a quantizer is present, it is enabled
if qlinear.param_quantizers['weight']:
    pass
```
Disabling Quantization
```python
# QuantSim v1
wrap_linear.param_quantizers['weight'].enabled = False
```

```python
# QuantSim v2
# Note: a quantizer can be disabled by setting it to None...
qlinear.param_quantizers['weight'] = None
# ...or with the utility APIs (_remove_input_quantizers,
# _remove_output_quantizers, _remove_param_quantizers)
qlinear._remove_param_quantizers('weight')
```
Enabling Quantization
```python
# QuantSim v1
wrap_linear.param_quantizers['weight'].enabled = True
```

```python
# QuantSim v2
import aimet_torch.v2.quantization as Q

qlinear.param_quantizers['weight'] = Q.affine.QuantizeDequantize(...)
```
Temporarily disabling Quantization
```python
# QuantSim v1
assert wrap_linear.param_quantizers['weight'].enabled
wrap_linear.param_quantizers['weight'].enabled = False
# Run other code here
wrap_linear.param_quantizers['weight'].enabled = True
```

```python
# QuantSim v2
# Note: _remove_param_quantizers also acts as a context manager that
# restores the removed quantizer on exit
assert qlinear.param_quantizers['weight']
with qlinear._remove_param_quantizers('weight'):
    assert qlinear.param_quantizers['weight'] is None
    # Run other code here
assert qlinear.param_quantizers['weight']
```
Case 3: Freezing encodings
```python
# QuantSim v1
if not wrap_linear.param_quantizers['weight']._is_encoding_frozen:
    wrap_linear.param_quantizers['weight'].freeze_encodings()
```

```python
# QuantSim v2
# Note: there is no longer a concept of "freezing"; mimic v1 freezing behavior
# with the allow_overwrite and requires_grad_ flags
qlinear.param_quantizers['weight'].allow_overwrite(False)  # Prevents encodings from being overwritten by AIMET APIs
qlinear.param_quantizers['weight'].requires_grad_(False)   # Prevents encodings from being learned
```
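Putting the pieces together, a v2 equivalent of "set encodings by hand, then freeze them" might look like this sketch, reusing qlinear from the Setup above:

```python
import torch

quantizer = qlinear.param_quantizers['weight']
if quantizer:
    # Hand-set the encoding range (min/max are torch.nn.Parameters)
    with torch.no_grad():
        quantizer.min.copy_(-1.0)
        quantizer.max.copy_(1.0)
    quantizer.allow_overwrite(False)  # compute_encodings() will not overwrite these values
    quantizer.requires_grad_(False)   # gradients will not update them during QAT
```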
Deprecated Features
Some components tied to the QuantSim v1 design are no longer needed in QuantSim v2. For example, all QuantSim v2 source code is implemented in Python to provide easier debugging and improved portability, so using libpymo modules with QuantSim v2 is not recommended. The table below lists these features and the recommended migration guidelines:
| Component in v1 | Replacement in v2 |
|---|---|
| StaticGridQuantWrapper, LearnedGridQuantWrapper | Quantized nn.Modules |
| StaticGridQuantizer, LearnedGridQuantizer | AffineQuantizer, FloatQuantizer |
| libpymo.EncodingAnalyzerForPython | MinMaxEncodingAnalyzer, SqnrEncodingAnalyzer, PercentileEncodingAnalyzer |
| libpymo.TfEncoding | AffineEncoding, FloatEncoding, VectorEncoding |
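As an illustration of the encoding analyzer replacement, here is a hedged sketch of plugging a percentile-based analyzer into a v2 quantizer. It assumes the analyzers live under aimet_torch.v2.quantization.encoding_analyzer and that quantizers accept an encoding_analyzer argument; check the API reference for the exact signatures:

```python
import torch
import aimet_torch.v2.quantization as Q
# Assumed module path for the Python encoding analyzers
from aimet_torch.v2.quantization.encoding_analyzer import PercentileEncodingAnalyzer

# Calibrate encodings with a 99.9th-percentile analyzer instead of min-max
analyzer = PercentileEncodingAnalyzer(shape=(1,), percentile=99.9)
qdq = Q.affine.QuantizeDequantize(shape=(1,), bitwidth=8, symmetric=False,
                                  encoding_analyzer=analyzer)

with qdq.compute_encodings():
    qdq(torch.randn(100))
```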