Techniques

This section gives a brief overview of AIMET's quantization and compression techniques and how to apply them.

Post Training Quantization

Quantizes a model using calibration data, at a user-specified parameter and activation precision (bit-width).
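
The typical flow wraps the model in a quantization simulation object, runs calibration data through it to compute encodings, and exports the result. The sketch below assumes the aimet_torch QuantizationSimModel API; argument names and the callback signature can differ between AIMET releases, and the small model and calibration data are placeholders.

```python
import torch
from aimet_torch.quantsim import QuantizationSimModel

# Placeholder pre-trained model and representative calibration input.
model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU()).eval()
dummy_input = torch.randn(1, 3, 32, 32)

# Wrap the model with simulated quantization ops at the chosen bit-widths.
sim = QuantizationSimModel(model,
                           dummy_input=dummy_input,
                           default_param_bw=8,    # parameter (weight) precision
                           default_output_bw=8)   # activation precision

def calibrate(sim_model, _):
    # Run a few batches of calibration data so AIMET can collect the
    # statistics used to compute quantization encodings.
    sim_model(dummy_input)

# Compute scale/offset encodings from calibration data, then export.
sim.compute_encodings(calibrate, None)
sim.export(path='./output', filename_prefix='model_int8', dummy_input=dummy_input)
```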

Quantization Aware Training

Trains the model with quantization simulated in the forward pass, so the weights learn to compensate for quantization noise.
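
QAT builds on the same QuantizationSimModel flow: after encodings are computed, the wrapped model is fine-tuned with an ordinary training loop so the weights adapt to the simulated quantization ops. A minimal sketch, reusing the sim and dummy_input from the example above and assuming a hypothetical train_loader that yields (inputs, labels) batches:

```python
import torch

# `sim` and `dummy_input` come from the post-training quantization sketch above;
# `train_loader` is a hypothetical DataLoader yielding (inputs, labels).
optimizer = torch.optim.SGD(sim.model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

sim.model.train()
for epoch in range(2):                      # short fine-tuning run
    for inputs, labels in train_loader:
        optimizer.zero_grad()
        outputs = sim.model(inputs)         # forward pass includes simulated quantization
        loss = loss_fn(outputs, labels)
        loss.backward()                     # gradients flow through the quantization ops
        optimizer.step()

# Export the fine-tuned quantized model as in the PTQ flow.
sim.export(path='./output', filename_prefix='model_int8_qat', dummy_input=dummy_input)
```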

Blockwise Quantization

Quantizes each tensor in blocks of a configurable size, trading off accuracy against runtime cost.
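
Conceptually, each tensor is split into fixed-size blocks and a separate scale is computed per block, which tracks local value ranges more tightly than a single per-tensor scale. The sketch below illustrates the idea in plain PyTorch; it is not AIMET's blockwise API, and the block size and bit-width are arbitrary choices.

```python
import torch

def blockwise_quantize(weight: torch.Tensor, block_size: int = 64, bits: int = 4):
    """Symmetric per-block quantization of a 2-D weight tensor (illustrative only)."""
    out_features, in_features = weight.shape
    assert in_features % block_size == 0
    # Split the input dimension into blocks and compute one scale per block.
    blocks = weight.reshape(out_features, in_features // block_size, block_size)
    max_abs = blocks.abs().amax(dim=-1, keepdim=True)          # per-block range
    scale = (max_abs / (2 ** (bits - 1) - 1)).clamp(min=1e-8)   # per-block scale
    q = torch.clamp(torch.round(blocks / scale),
                    -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)    # integer codes
    dequant = (q * scale).reshape(out_features, in_features)    # simulated quantization
    return q, scale, dequant

w = torch.randn(16, 256)
q, scale, w_hat = blockwise_quantize(w, block_size=64, bits=4)
print((w - w_hat).abs().max())   # quantization error with per-block scales
```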

Analysis Tools

Automatically identify quantization-sensitive areas and hotspots in your pre-trained model.
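
The underlying idea is per-layer sensitivity analysis: quantize one layer at a time and rank layers by the error they introduce. The sketch below shows that idea in plain PyTorch; it is not AIMET's analysis API (aimet_torch provides tools such as QuantAnalyzer that automate this), and the toy model and error metric are placeholders.

```python
import copy
import torch

def fake_quant_(module: torch.nn.Module, bits: int = 8):
    """Round the module's weight to `bits`-bit integers in place (per-tensor, symmetric)."""
    with torch.no_grad():
        w = module.weight
        scale = w.abs().max() / (2 ** (bits - 1) - 1)
        module.weight.copy_(torch.round(w / scale)
                            .clamp(-(2 ** (bits - 1)), 2 ** (bits - 1) - 1) * scale)

model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU(),
    torch.nn.Conv2d(8, 16, 3), torch.nn.ReLU(),
).eval()
x = torch.randn(4, 3, 32, 32)
baseline = model(x)

# Quantize one layer at a time and measure the output deviation it causes;
# layers with the largest deviation are the "hotspots" an analysis tool would flag.
for name, module in model.named_modules():
    if isinstance(module, torch.nn.Conv2d):
        trial = copy.deepcopy(model)
        fake_quant_(dict(trial.named_modules())[name])
        error = (trial(x) - baseline).abs().mean().item()
        print(f'{name}: mean output error {error:.6f}')
```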

Compression

Reduces a pre-trained model's multiply-accumulate (MAC) and memory costs with a minimal drop in accuracy. AIMET supports compression techniques such as Weight SVD, Spatial SVD, and Channel Pruning.
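
As an illustration of one of these techniques, Weight SVD factorizes a layer's weight matrix into two low-rank factors, replacing one large matrix multiply with two smaller ones. The sketch below shows the idea with a plain PyTorch Linear layer; it is not AIMET's compression API, and the layer sizes and rank are arbitrary.

```python
import torch

def weight_svd(linear: torch.nn.Linear, rank: int) -> torch.nn.Sequential:
    """Replace a Linear layer with two smaller ones via truncated SVD (illustrative only)."""
    W = linear.weight.data                                   # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    first = torch.nn.Linear(linear.in_features, rank, bias=False)
    second = torch.nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
    first.weight.data = Vh[:rank, :]                         # (rank, in_features)
    second.weight.data = U[:, :rank] * S[:rank]              # (out_features, rank)
    if linear.bias is not None:
        second.bias.data = linear.bias.data.clone()
    return torch.nn.Sequential(first, second)

layer = torch.nn.Linear(512, 512)
compressed = weight_svd(layer, rank=64)          # ~4x fewer MACs and parameters for this layer
x = torch.randn(2, 512)
print((layer(x) - compressed(x)).abs().mean())   # approximation error from rank truncation
```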