aimet_onnx.quantsim.set_lpbq_for_params

Top-level APIs

aimet_onnx.quantsim.set_lpbq_for_params(sim, bitwidth, block_size, *, op_types=None, nodes_to_exclude=None, nodes_to_include=None, strict=None)

Set weight quantizers of the specified nodes to use low-power blockwise quantization (LPBQ).

This function is overloaded with the following signatures:

aimet_onnx.quantsim.set_lpbq_for_params(sim, bitwidth, block_size, *, nodes_to_include=None)
Parameters:
  • sim (QuantizationSimModel) – Quantsim to set weight quantizers for

  • bitwidth (int) – Compressed bitwidth for LPBQ quantization

  • block_size (int) – Block size for affine quantization. The block size is applied along the weight's input-features dimension, while per-channel quantization is used along the weight's output-features dimension

  • nodes_to_include (Set[str]) – Set of ONNX node names to include for blockwise weight quantization (one way to construct this set is sketched below).
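For instance, nodes_to_include can be built by walking the ONNX graph. A minimal sketch, assuming sim.model.model exposes the underlying onnx.ModelProto (this attribute path is an assumption and may differ across AIMET versions):

>>> matmul_nodes = {
...     node.name for node in sim.model.model.graph.node  # assumed ModelProto access
...     if node.op_type == "MatMul"                       # keep only MatMul nodes
... }
>>> set_lpbq_for_params(sim, bitwidth=4, block_size=64, nodes_to_include=matmul_nodes)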

aimet_onnx.quantsim.set_lpbq_for_params(sim, bitwidth, block_size, *, op_types=None, nodes_to_exclude=None, strict=False)
Parameters:
  • sim (QuantizationSimModel) – Quantsim to set weight quantizers for

  • bitwidth (int) – Compressed bitwidth for LPBQ quantization

  • block_size (int) – Block size for affine quantization. The block size is applied along the weight's input-features dimension, while per-channel quantization is used along the weight's output-features dimension

  • op_types (Union[str, Set[str]]) – Operator types for which to enable grouped blockwise weight quantization

  • nodes_to_exclude (Set[str]) – Set of ONNX node names to exclude from blockwise weight quantization.

  • strict (bool) – If False, enable blockwise quantization only for layers whose dimensions are evenly divisible by block_size, skipping the rest. If True, raise an error for layers with incompatible shapes (a worked example of the block partitioning follows).
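To make the block partitioning concrete, here is a small worked example for a hypothetical Gemm weight of shape (out_features, in_features); the shape and numbers are illustrative only:

>>> out_features, in_features = 4096, 4096  # hypothetical weight shape
>>> block_size = 64
>>> in_features % block_size == 0           # must hold; with strict=False, failing layers are skipped
True
>>> blocks_per_channel = in_features // block_size  # 64 blocks along the input-features dim
>>> out_features * blocks_per_channel               # one scale per (channel, block) pair
262144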

Examples

>>> from aimet_onnx.quantsim import QuantizationSimModel, set_lpbq_for_params
>>> sim = QuantizationSimModel(...)
>>> # Select layers by operator type:
>>> set_lpbq_for_params(sim, bitwidth=4, block_size=64, op_types={"Gemm", "MatMul", "Conv"})
>>> # or select them explicitly by node name:
>>> set_lpbq_for_params(sim, bitwidth=4, block_size=64, nodes_to_include={"/lm_head/MatMul", ...})
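Per the second overload's signature, type-based selection can be combined with exclusions and strict shape checking. A hedged variant (the excluded node name is hypothetical):

>>> set_lpbq_for_params(
...     sim, bitwidth=4, block_size=64,
...     op_types={"Gemm", "MatMul", "Conv"},
...     nodes_to_exclude={"/lm_head/MatMul"},  # hypothetical node name
...     strict=True,                           # raise on shapes not divisible by block_size
... )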