AI Model Efficiency Toolkit
tf-torch-cpu_1.30.0
Index
A | B | C | D | E | F | G | L | M | N | O | P | Q | R | S | T | U | V | W
A
ActivationType (class in aimet_common.defs)
AdaroundParameters (class in aimet_tensorflow.adaround.adaround_weight), [1]
    (class in aimet_torch.adaround.adaround_weight)
analytical_bias_correction_per_layer() (in module aimet_tensorflow.bias_correction.BiasCorrection)
analyze() (aimet_tensorflow.keras.quant_analyzer.QuantAnalyzer method)
    (aimet_tensorflow.quant_analyzer.QuantAnalyzer method)
    (aimet_torch.quant_analyzer.QuantAnalyzer method)
apply() (aimet_tensorflow.auto_quant.AutoQuant method)
    (aimet_torch.auto_quant.AutoQuant method)
apply_adaround() (in module aimet_tensorflow.adaround.adaround_weight.Adaround)
    (in module aimet_torch.adaround.adaround_weight.Adaround)
auto (aimet_tensorflow.defs.ChannelPruningParameters.Mode attribute)
    (aimet_tensorflow.defs.SpatialSvdParameters.Mode attribute), [1]
    (aimet_torch.defs.ChannelPruningParameters.Mode attribute)
    (aimet_torch.defs.SpatialSvdParameters.Mode attribute)
    (aimet_torch.defs.WeightSvdParameters.Mode attribute)
AutoQuant (class in aimet_tensorflow.auto_quant)
    (class in aimet_torch.auto_quant)
    (class in aimet_torch.auto_quant_v2)
B
bias_correction_per_layer() (in module aimet_tensorflow.bias_correction.BiasCorrection)
bias_fold() (in module aimet_tensorflow.cross_layer_equalization.HighBiasFold), [1]
    (in module aimet_tensorflow.keras.cross_layer_equalization.HighBiasFold), [1]
    (in module aimet_torch.cross_layer_equalization.HighBiasFold), [1]
BiasCorrectionParams() (in module aimet_tensorflow.bias_correction)
C
channel_pruning (aimet_common.defs.CompressionScheme attribute), [1]
ChannelPruningParameters (class in aimet_tensorflow.defs)
    (class in aimet_torch.defs)
ChannelPruningParameters.AutoModeParams (class in aimet_tensorflow.defs)
    (class in aimet_torch.defs)
ChannelPruningParameters.ManualModeParams (class in aimet_tensorflow.defs)
    (class in aimet_torch.defs)
ChannelPruningParameters.Mode (class in aimet_tensorflow.defs)
    (class in aimet_torch.defs)
check_model_arch() (in module aimet_torch.arch_checker.arch_checker.ArchChecker)
ClsSetInfo (class in aimet_tensorflow.cross_layer_equalization)
    (class in aimet_tensorflow.keras.cross_layer_equalization)
    (class in aimet_torch.cross_layer_equalization)
ClsSetInfo.ClsSetLayerPairInfo (class in aimet_tensorflow.cross_layer_equalization)
    (class in aimet_tensorflow.keras.cross_layer_equalization)
    (class in aimet_torch.cross_layer_equalization)
compress_model() (aimet_tensorflow.compress.ModelCompressor static method)
    (aimet_tensorflow.keras.compress.ModelCompressor static method)
    (aimet_torch.compress.ModelCompressor static method)
compress_net() (aimet_tensorflow.svd.Svd method)
CompressionScheme (class in aimet_common.defs), [1]
compute_encodings() (aimet_tensorflow.keras.quantsim.QuantizationSimModel method)
    (aimet_tensorflow.quantsim.QuantizationSimModel method)
    (aimet_torch.quantsim.QuantizationSimModel method)
ConvBnInfoType (class in aimet_common.bias_correction)
correct_bias() (in module aimet_tensorflow.bias_correction.BiasCorrection)
    (in module aimet_torch.bias_correction)
CostMetric (class in aimet_common.defs), [1]
D
display_comp_ratio_plot() (aimet_torch.visualize_serialized_data.VisualizeCompression method)
display_eval_scores() (aimet_torch.visualize_serialized_data.VisualizeCompression method)
E
enable_per_layer_mse_loss() (aimet_torch.quant_analyzer.QuantAnalyzer method)
equalize_model() (in module aimet_tensorflow.cross_layer_equalization)
    (in module aimet_tensorflow.keras.cross_layer_equalization)
    (in module aimet_torch.cross_layer_equalization)
export() (aimet_tensorflow.keras.quantsim.QuantizationSimModel method)
    (aimet_tensorflow.quantsim.QuantizationSimModel method)
    (aimet_torch.quantsim.QuantizationSimModel method)
F
fold_all_batch_norms() (in module aimet_tensorflow.batch_norm_fold)
    (in module aimet_tensorflow.keras.batch_norm_fold)
    (in module aimet_torch.batch_norm_fold)
fold_all_batch_norms_to_scale() (in module aimet_tensorflow.batch_norm_fold)
    (in module aimet_tensorflow.keras.batch_norm_fold)
    (in module aimet_torch.batch_norm_fold)
fold_given_batch_norms() (in module aimet_tensorflow.batch_norm_fold)
    (in module aimet_tensorflow.keras.batch_norm_fold)
    (in module aimet_torch.batch_norm_fold)
G
generate_layer_outputs() (aimet_tensorflow.keras.layer_output_utils.LayerOutputUtil method)
    (aimet_tensorflow.layer_output_utils.LayerOutputUtil method)
    (aimet_torch.layer_output_utils.LayerOutputUtil method)
get_quant_scheme_candidates() (aimet_torch.auto_quant_v2.AutoQuant method)
GreedySelectionParameters (class in aimet_common.defs)
L
LayerOutputUtil (class in aimet_tensorflow.keras.layer_output_utils)
    (class in aimet_tensorflow.layer_output_utils)
    (class in aimet_torch.layer_output_utils)
load_checkpoint() (aimet_torch.quantsim method)
load_keras_model_multi_gpu() (in module aimet_tensorflow.utils.convert_tf_sess_to_keras)
load_tf_sess_variables_to_keras_single_gpu() (in module aimet_tensorflow.utils.convert_tf_sess_to_keras)
M
mac (aimet_common.defs.CostMetric attribute), [1]
manual (aimet_tensorflow.defs.ChannelPruningParameters.Mode attribute)
    (aimet_tensorflow.defs.SpatialSvdParameters.Mode attribute), [1]
    (aimet_torch.defs.ChannelPruningParameters.Mode attribute)
    (aimet_torch.defs.SpatialSvdParameters.Mode attribute)
    (aimet_torch.defs.WeightSvdParameters.Mode attribute)
map_cls_sets_to_new_session() (aimet_tensorflow.cross_layer_equalization.ClsSetInfo static method)
memory (aimet_common.defs.CostMetric attribute), [1]
ModelCompressor (class in aimet_tensorflow.compress)
    (class in aimet_tensorflow.keras.compress)
    (class in aimet_torch.compress)
ModuleCompRatioPair (class in aimet_tensorflow.defs), [1]
    (class in aimet_torch.defs)
N
NamingScheme (class in aimet_torch.layer_output_utils)
no_activation (aimet_common.defs.ActivationType attribute)
O
ONNX (aimet_torch.layer_output_utils.NamingScheme attribute)
optimize() (aimet_torch.auto_quant_v2.AutoQuant method)
P
post_training_percentile (aimet_common.defs.QuantScheme attribute), [1], [2], [3]
post_training_tf (aimet_common.defs.QuantScheme attribute), [1], [2], [3]
post_training_tf_enhanced (aimet_common.defs.QuantScheme attribute), [1], [2], [3]
prepare_model() (in module aimet_tensorflow.keras.model_preparer)
    (in module aimet_torch.model_preparer)
PYTORCH (aimet_torch.layer_output_utils.NamingScheme attribute)
Q
QuantAnalyzer (class in aimet_tensorflow.keras.quant_analyzer)
    (class in aimet_tensorflow.quant_analyzer)
    (class in aimet_torch.quant_analyzer)
QuantizationSimModel (class in aimet_tensorflow.keras.quantsim)
    (class in aimet_tensorflow.quantsim)
    (class in aimet_torch.quantsim)
QuantParams (class in aimet_tensorflow.bias_correction)
    (class in aimet_torch.quantsim)
QuantScheme (class in aimet_common.defs), [1], [2], [3]
R
reestimate_bn_stats() (in module aimet_tensorflow.bn_reestimation)
    (in module aimet_tensorflow.keras.bn_reestimation)
    (in module aimet_torch.bn_reestimation)
relu (aimet_common.defs.ActivationType attribute)
relu6 (aimet_common.defs.ActivationType attribute)
run_inference() (aimet_torch.auto_quant_v2.AutoQuant method)
S
save_as_tf_module_multi_gpu() (in module aimet_tensorflow.utils.convert_tf_sess_to_keras)
save_checkpoint() (aimet_torch.quantsim method)
save_tf_session_single_gpu() (in module aimet_tensorflow.utils.convert_tf_sess_to_keras)
scale_cls_sets() (in module aimet_tensorflow.cross_layer_equalization.CrossLayerScaling)
    (in module aimet_tensorflow.keras.cross_layer_equalization.CrossLayerScaling)
    (in module aimet_torch.cross_layer_equalization.CrossLayerScaling)
scale_model() (in module aimet_tensorflow.cross_layer_equalization.CrossLayerScaling)
    (in module aimet_tensorflow.keras.cross_layer_equalization.CrossLayerScaling)
    (in module aimet_torch.cross_layer_equalization.CrossLayerScaling)
set_adaround_params() (aimet_tensorflow.auto_quant.AutoQuant method)
    (aimet_torch.auto_quant.AutoQuant method)
    (aimet_torch.auto_quant_v2.AutoQuant method)
set_export_params() (aimet_torch.auto_quant.AutoQuant method)
    (aimet_torch.auto_quant_v2.AutoQuant method)
set_model_preparer_params() (aimet_torch.auto_quant_v2.AutoQuant method)
set_quant_scheme_candidates() (aimet_torch.auto_quant_v2.AutoQuant method)
spatial_svd (aimet_common.defs.CompressionScheme attribute), [1]
SpatialSvdParameters (class in aimet_tensorflow.defs), [1]
    (class in aimet_torch.defs)
SpatialSvdParameters.AutoModeParams (class in aimet_tensorflow.defs), [1]
    (class in aimet_torch.defs)
SpatialSvdParameters.ManualModeParams (class in aimet_tensorflow.defs), [1]
    (class in aimet_torch.defs)
SpatialSvdParameters.Mode (class in aimet_tensorflow.defs), [1]
    (class in aimet_torch.defs)
Svd (class in aimet_tensorflow.svd)
T
TarRankSelectionParameters (class in aimet_torch.defs)
TORCHSCRIPT (aimet_torch.layer_output_utils.NamingScheme attribute)
training_range_learning_with_tf_enhanced_init (aimet_common.defs.QuantScheme attribute), [1], [2], [3]
training_range_learning_with_tf_init (aimet_common.defs.QuantScheme attribute), [1], [2], [3]
U
update_keras_bn_ops_trainable_flag() (in module aimet_tensorflow.utils.graph)
V
visualize_changes_after_optimization() (in module aimet_torch.visualize_model)
visualize_relative_weight_ranges_single_layer() (in module aimet_tensorflow.plotting_utils)
visualize_relative_weight_ranges_to_identify_problematic_layers() (in module aimet_torch.visualize_model)
visualize_weight_ranges() (in module aimet_torch.visualize_model)
visualize_weight_ranges_single_layer() (in module aimet_tensorflow.plotting_utils)
VisualizeCompression (class in aimet_torch.visualize_serialized_data)
W
weight_svd (aimet_common.defs.CompressionScheme attribute), [1]
WeightSvdParameters (class in aimet_torch.defs)
WeightSvdParameters.AutoModeParams (class in aimet_torch.defs)
WeightSvdParameters.ManualModeParams (class in aimet_torch.defs)
WeightSvdParameters.Mode (class in aimet_torch.defs)