QuantSim v2

Welcome to QuantSim v2!

When designing QuantSim v2, we set out to give users a clean design from the ground up while keeping the APIs familiar. The newly designed building blocks give users more flexibility to customize and extend components, and give developers more control and transparency throughout the quantization process.

In a future release, aimet_torch Quantization Simulation will go through a major redesign of the basic building blocks that make up a simulated quantization model, referred to as QuantSim v2. While these changes have not yet been mainlined into aimet_torch, they have been made optionally available in the experimental aimet_torch.v2 namespace.

Note

Please be advised that QuantSim v2 is an experimental feature whose APIs and behaviors are subject to change.

Overview

At a high level, QuantSim v2:

  • Comprises a different set of building blocks: quantized nn.Modules and Quantizers

  • Enables dispatching to custom quantized kernels

  • Allows components to be extended easily to support advanced quantization techniques

  • Moves all implementation code to Python for easier debugging and portability

Like QuantSim v1, QuantSim v2 keeps the same high-level APIs, such as compute_encodings() and export(). Both QuantSim versions perform fake quantization (quantization simulated on floating-point kernels) and support the same AIMET features, such as AdaRound, Sequential MSE, and QuantAnalyzer.
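As a rough illustration of what fake quantization means, the sketch below rounds floating-point values onto an integer grid, clamps them, and immediately maps them back to floats, so all arithmetic stays in floating point. It is a generic pure-Python example and not AIMET's implementation; the function name fake_quantize and its parameters (scale, zero_point, bitwidth) are assumptions made for this example.

```python
def fake_quantize(values, scale, zero_point, bitwidth=8):
    """Quantize-dequantize a list of floats on an unsigned integer grid.

    Hypothetical helper for illustration only; not part of aimet_torch.
    """
    qmin, qmax = 0, 2 ** bitwidth - 1
    result = []
    for x in values:
        q = round(x / scale) + zero_point   # map to the nearest grid index
        q = max(qmin, min(qmax, q))         # clamp to the representable range
        result.append((q - zero_point) * scale)  # map back to floating point
    return result


# 4-bit grid with step 0.25 covering [-1.0, 2.75]
simulated = fake_quantize([0.3, -5.0, 10.0], scale=0.25, zero_point=4, bitwidth=4)
print(simulated)
```

Values inside the grid's range snap to the nearest step, while values outside it saturate at the range limits. The output is still a list of floats, which is why the same floating-point kernels can consume it.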

To learn more about the differences between QuantSim v1 and QuantSim v2 and how to migrate your code, please refer to this guide.

Using QuantSim v2

All code that involves QuantSim v2 can be found in the aimet_torch.v2 namespace. Please refer to the following to navigate the namespace:

New Features

We have now enabled blockwise quantization and low power blockwise quantization for QuantSim v2 users. When applied, these features compute encoding parameters at a finer granularity (per block of values rather than for a whole tensor or channel), so the quantization grid can fit the values in each block more closely.
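The effect of finer-grained encodings can be sketched in plain Python. The example below compares one symmetric scale for a whole weight list against one scale per block of four values. It is a generic illustration under simplified assumptions (symmetric signed quantization, 4-bit grid, made-up weights), not AIMET's blockwise implementation, and every function name in it is hypothetical.

```python
def symmetric_scale(values, bitwidth):
    """Scale that maps the largest magnitude onto the signed integer maximum."""
    qmax = 2 ** (bitwidth - 1) - 1
    max_abs = max(abs(v) for v in values)
    return max_abs / qmax if max_abs > 0 else 1.0


def quantize_dequantize(values, scale, bitwidth):
    """Round to the signed integer grid, clamp, and map back to floats."""
    qmax = 2 ** (bitwidth - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def blockwise_quantize_dequantize(values, block_size, bitwidth):
    """Compute a separate scale for each consecutive block of values."""
    out = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        out.extend(quantize_dequantize(block, symmetric_scale(block, bitwidth), bitwidth))
    return out


def mean_abs_error(reference, approx):
    return sum(abs(r - a) for r, a in zip(reference, approx)) / len(reference)


# Made-up weights: one block of small values, one block of large values
weights = [0.01, -0.02, 0.03, -0.015, 4.0, -3.5, 2.0, 1.0]

per_tensor = quantize_dequantize(weights, symmetric_scale(weights, 4), 4)
blockwise = blockwise_quantize_dequantize(weights, block_size=4, bitwidth=4)

# With a single scale, the four small weights all round to 0.0
print(per_tensor[:4])
print(mean_abs_error(weights, per_tensor), mean_abs_error(weights, blockwise))
```

With a single scale, the large weights dictate the step size and the small weights collapse to zero. With one scale per block, the small weights get their own, much finer grid, which lowers the overall error in this example.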

To learn more, please refer to the following documentation: