AI Model Efficiency Toolkit 1.34.0

AIMET ONNX APIs

  • ONNX Model Quantization API
  • ONNX Debug API

© Copyright 2020, Qualcomm Innovation Center, Inc.