OmniQuant

Context

OmniQuant is a PTQ technique that improves the accuracy of a quantized model by computing optimal quantization parameters for weights. OmniQuant is based on https://arxiv.org/abs/2308.13137 and comprises two components: Learnable Weight Clipping (LWC) and Learnable Equivalent Transformation (LET).

OmniQuant introduces a trainable scale parameter in the weight quantizers of every supported module and performs blockwise knowledge distillation (BKD) by comparing the quantized output of every supported block with its FP32 equivalent. The trainable scale parameters are learned pairwise in OmniQuant. From the OmniQuant perspective, a block is a non-leaf module that takes one activation input tensor and outputs one activation tensor. OmniQuant also requires blocks to be contiguous to perform optimization.

Warning: This feature is currently experimental. It is currently supported for Llama 3.2, Qwen 2.5, and DeepSeek distill models for Qwen 2.5.
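As a rough sketch of the blockwise optimization described above (not AIMET's internal implementation; names such as quant_block, fp32_block, and cached_block_inputs are hypothetical), each supported block is trained to minimize the reconstruction error between its quantized output and its FP32 counterpart, updating only the learnable quantization parameters:

import torch

def train_block(quant_block, fp32_block, cached_block_inputs, num_iterations=800, lr=1e-3):
    # Only the learnable quantization parameters (e.g. weight-clipping scales) should require grad
    trainable_params = [p for p in quant_block.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable_params, lr=lr)
    for step in range(num_iterations):
        x = cached_block_inputs[step % len(cached_block_inputs)]
        # Blockwise knowledge distillation: match the FP32 block's output
        loss = torch.nn.functional.mse_loss(quant_block(x), fp32_block(x))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()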

Workflow

Prerequisites

To use OmniQuant, you must:

  • Use PyTorch. OmniQuant does not support other frameworks yet.

  • Load a pre-trained model

  • Create a dataloader for the model

  • Choose a model with contiguous blocks, where each block takes one activation input tensor and outputs one activation tensor. Example block: LlamaDecoderLayer in LlamaModel (see the sketch below)
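For instance, in a Hugging Face Llama checkpoint the decoder layers form exactly such a sequence of contiguous blocks. A minimal structural check (assuming model is a LlamaForCausalLM loaded as in the Setup below):

# Rough check: the decoder layers are a contiguous ModuleList of blocks, each taking
# one hidden-state tensor in and producing one hidden-state tensor out
from transformers.models.llama.modeling_llama import LlamaDecoderLayer

blocks = model.model.layers
assert all(isinstance(block, LlamaDecoderLayer) for block in blocks)
print(f"{len(blocks)} contiguous decoder blocks found")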

Procedure

Setup

# Load the model
# General setup that can be changed as needed
from itertools import chain, islice

import torch
from torch.utils.data import DataLoader
from datasets import load_dataset
from transformers import AutoConfig, AutoTokenizer, default_data_collator
from transformers.models.llama import modeling_llama

device = "cuda" if torch.cuda.is_available() else "cpu"
model_id = "meta-llama/Llama-3.2-1B-Instruct"
model_config = AutoConfig.from_pretrained(model_id)
model_config.return_dict = False
model_config.use_cache = False

model = modeling_llama.LlamaForCausalLM.from_pretrained(model_id, config=model_config)
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True, trust_remote_code=True)

def tokenize(examples):
    seq_length = 2048
    examples = tokenizer(examples["text"])
    concatenated_examples = {k: list(chain(*examples[k])) for k in examples.keys()}
    total_length = len(concatenated_examples[list(examples.keys())[0]])
    if total_length >= seq_length:
        total_length = (total_length // seq_length) * seq_length
    result = {
        k: [t[i : i + seq_length] for i in range(0, total_length, seq_length)]
        for k, t in concatenated_examples.items()
    }
    result["labels"] = result["input_ids"].copy()
    return result

train_dataset = load_dataset(path='wikitext', name='wikitext-2-raw-v1', split='train').map(tokenize, batched=True, remove_columns=['text'])
test_dataset = load_dataset(path='wikitext', name='wikitext-2-raw-v1', split='test').map(tokenize, batched=True, remove_columns=['text'])
train_dataloader = DataLoader(train_dataset, shuffle=False, batch_size=1, collate_fn=default_data_collator)
test_dataloader = DataLoader(test_dataset, shuffle=False, batch_size=1, collate_fn=default_data_collator)

# Custom wrapper to expose only a limited number of batches from a dataloader
dataloader_wrapper_len = 40

class LimitedBatchDataLoader(DataLoader):
    def __init__(self, data_loader):
        self.data_loader = data_loader

    def __len__(self):
        return dataloader_wrapper_len

    def __iter__(self):
        # Stop after dataloader_wrapper_len batches instead of iterating the full dataloader
        return islice(iter(self.data_loader), dataloader_wrapper_len)
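If you want downstream steps to see only a fixed number of batches, the wrapper above can be applied to the training dataloader (an optional usage sketch; the plain train_dataloader is used in the steps below):

# Optional: expose only dataloader_wrapper_len batches to downstream steps
limited_train_dataloader = LimitedBatchDataLoader(train_dataloader)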


Step 1

Use AIMET’s quantization simulation to create a QuantSimModel object.

from aimet_common.defs import QuantScheme
from aimet_torch.quantsim import QuantizationSimModel

seq_length = 2048
input_ids = torch.randint(0, model_config.vocab_size, (1, seq_length), device=device)
attention_mask = torch.ones((1, seq_length), dtype=torch.long, device=device)
dummy_input = (input_ids, attention_mask)

model = model.to(device)  # keep the model on the same device as the dummy input
sim = QuantizationSimModel(model,
                           dummy_input=dummy_input,
                           quant_scheme=QuantScheme.training_range_learning_with_tf_init,
                           default_param_bw=4,
                           default_output_bw=16,
                           in_place=True)
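To sanity-check the inserted quantizers and their bitwidths (4-bit parameters and 16-bit outputs here), the sim object can be printed; an optional quick check:

# Optional: printing the sim lists the quantized modules and their configured quantizers
print(sim)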


Step 2

Apply apply_omniquant to compute optimal quantization encodings for the parameters of supported layers. A minimum of 800 iterations is recommended when calling apply_omniquant, regardless of the dataloader batch size. The learned scales are dumped in safetensors format when apply_omniquant runs; these scales can be used for quantizing LoRA adapters, but using the dumped scales is not supported in the current release.

# Find and freeze optimal encoding candidates for weight parameters of supported layers
from aimet_torch.experimental.omniquant import apply_omniquant

apply_omniquant(quant_sim=sim,
                dataloader=train_dataloader,
                forward_fn=lambda model, input: model.forward(**input),
                num_iterations=800)
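The learned scales are written to output_path (by default ./aimet_omniquant_artifact/). A hedged sketch for inspecting them (the exact file name inside the artifact directory is an assumption, so glob for it rather than hard-coding it):

# Inspect the dumped {layer_name: scale} metadata; the file name is assumed, so glob for it
import glob
from safetensors.torch import load_file

artifact_files = glob.glob('./aimet_omniquant_artifact/*.safetensors')
if artifact_files:
    scales = load_file(artifact_files[0])
    print(list(scales.keys())[:5])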


Step 3

Compute encodings for remaining parameters of the model.

def calibration_wrapper(model, dataloader, max_iterations: int):
    model = model.to(device)
    for batch_id, batch in enumerate(dataloader):
        if batch_id < max_iterations:
            # Pass the batch by keyword so input_ids, attention_mask and labels map to the right arguments
            batch = {k: v.to(device) for k, v in batch.items()}
            model(**batch)
        else:
            break

# Compute the Quantization Encodings
# compute encodings for all activations and parameters of uninitialized layer(s)/operation(s)
sim.compute_encodings(calibration_wrapper, dataloader=train_dataloader, max_iterations=40)


Step 4

Evaluate the quantized model.

# Determine simulated quantized accuracy
...
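A minimal perplexity sketch over the test split is shown below (illustrative only; substitute your own evaluation harness as needed). With return_dict=False, the loss is the first element of the output tuple when labels are passed:

import math

sim.model.eval()
total_loss, num_batches = 0.0, 0
with torch.no_grad():
    for batch_id, batch in enumerate(test_dataloader):
        if batch_id >= 40:  # limit batches for a quick check
            break
        batch = {k: v.to(device) for k, v in batch.items()}
        loss = sim.model(**batch)[0]  # loss is first when labels are provided and return_dict=False
        total_loss += loss.item()
        num_batches += 1
print(f"Perplexity: {math.exp(total_loss / num_batches):.2f}")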


Step 5

If the resulting quantized accuracy is satisfactory, export the model.

# Export the model for on-target inference
path = './'
filename = 'dummy_model'
sim.export(path=path, filename_prefix="quantized_" + filename,
           dummy_input=tuple(t.cpu() for t in dummy_input))


API

Top level APIs

aimet_torch.experimental.omniquant.apply_omniquant(quant_sim, dataloader, forward_fn, num_iterations=800, output_path='./aimet_omniquant_artifact/')

Returns the model with OmniQuant-optimized weights and saves metadata in safetensors format to the output path. The metadata safetensors file can be used in update_lora_weights to update LoRA adapter weights for a PEFT LoRA model.

Parameters:
  • quant_sim (QuantizationSimModel) – QuantizationSimModel object to optimize with Omniquant.

  • dataloader – Dataloader used to train the model.

  • forward_fn (Callable) – Model forward function used to cache intermediate data. Expected to take the model and inputs as arguments, e.g. lambda model, inputs: model(*inputs)

  • num_iterations (int) – Number of iterations to train each block with OmniQuant.

  • output_path (str) – Path to save {layer_name: scale} metadata safetensor.

Returns:

Model with Omniquant weights.
