aimet_onnx.quantsim.set_lpbq_for_params
- aimet_onnx.quantsim.set_lpbq_for_params(sim, bitwidth, block_size, *, op_types=None, nodes_to_exclude=None, nodes_to_include=None, strict=None)
Set weight quantizers of specified nodes to use low-power blockwise quantization.
This function is overloaded with the following signatures:
- aimet_onnx.quantsim.set_lpbq_for_params(sim, bitwidth, block_size, *, nodes_to_include=None)
- Parameters:
sim (QuantizationSimModel) – Quantsim to set weight quantizers for
bitwidth (int) – Compressed bitwidth for LPBQ quantization
block_size (int) – Block size for affine quantization. The block size is applied along the weight's input features dimension, while per-channel quantization is used along the weight's output features dimension
nodes_to_include (Set[str]) – Set of ONNX node names to include for blockwise weight quantization.
- aimet_onnx.quantsim.set_lpbq_for_params(sim, bitwidth, block_size, *, op_types=None, nodes_to_exclude=None, strict=False)
- Parameters:
sim (QuantizationSimModel) – Quantsim to set weight quantizers for
bitwidth (int) – Compressed bitwidth for LPBQ quantization
block_size (int) – Block size for affine quantization. The block size is applied along the weight's input features dimension, while per-channel quantization is used along the weight's output features dimension
op_types (Union[str, Set[str]]) – Operator types for which to enable grouped blockwise weight quantization
nodes_to_exclude (Set[str]) – Set of ONNX node names to exclude from blockwise weight quantization.
strict (bool) – If False, enable blockwise quantization only for layers whose dimensions are evenly divisible by block_size. If True, raise an error for layers with incompatible shapes.
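As an illustration of the divisibility rule that strict governs (hypothetical values, not AIMET code): a Gemm weight whose input features dimension is 3072 is compatible with block_size=64, since 3072 divides evenly into 64-element blocks.
>>> 3072 % 64 == 0   # layer is eligible for blockwise quantization
True
>>> 3072 // 64       # number of blocks per output channel
48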
Examples
>>> sim = QuantizationSimModel(...)
>>> set_lpbq_for_params(sim, bitwidth=4, block_size=64, op_types={"Gemm", "MatMul", "Conv"})
>>> # or
>>> set_lpbq_for_params(sim, bitwidth=4, block_size=64, nodes_to_include={"/lm_head/MatMul", ...})
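For context, a minimal end-to-end sketch follows; the model path and the QuantizationSimModel constructor arguments are placeholders, not prescribed by this API.
>>> import onnx
>>> from aimet_onnx.quantsim import QuantizationSimModel, set_lpbq_for_params
>>> model = onnx.load("model.onnx")     # placeholder path for this sketch
>>> sim = QuantizationSimModel(model)   # constructor arguments are illustrative
>>> set_lpbq_for_params(sim, bitwidth=4, block_size=64,
...                     op_types={"Gemm", "MatMul"},
...                     nodes_to_exclude={"/lm_head/MatMul"},  # example exclusion
...                     strict=False)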