Release notes

2.0.0

  • New Features
    • Common
      • Reorganized the documentation to more clearly explain AIMET procedures

      • Redesigned the documentation using the Furo theme

      • Added post-AIMET procedures describing how to take an AIMET quantized model to Qualcomm® AI Engine Direct and Qualcomm® AI Hub

    • PyTorch
      • BREAKING CHANGE: aimet_torch.v2 has become the default API. All legacy APIs have been migrated to the aimet_torch.v1 subpackage; for example, aimet_torch.qc_quantize_op is now aimet_torch.v1.qc_quantize_op (see the import sketch after this list)

      • Added the Manual Mixed Precision Configurator (Beta) to make it easier to configure a model in mixed precision.
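
        A minimal sketch of the v1 import migration, using the module paths named above:

            # Before (aimet_torch 1.x):
            #   import aimet_torch.qc_quantize_op
            # After (aimet_torch 2): the legacy module lives under the v1 subpackage
            import aimet_torch.v1.qc_quantize_op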

    • ONNX
      • Optimized QuantizationSimModel.__init__() latency

      • Aligned the ConnectedGraph representation with the ONNX graph

  • Bug Fixes
    • ONNX
      • Bug fixes for AdaRound

      • Bug fixes for BN folding

  • Upgrading
    • PyTorch
      • aimet_torch 2 is fully backward compatible with all public APIs of aimet_torch 1.x; a usage sketch of the unchanged public API follows below. If you are using low-level components of QuantizationSimModel, please see Migrate to aimet_torch 2.
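
        A minimal usage sketch of the public API that remains unchanged, assuming the standard QuantizationSimModel workflow (the model and data below are placeholders):

            import os
            import torch
            from aimet_torch.quantsim import QuantizationSimModel

            model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
            dummy_input = torch.randn(1, 3, 32, 32)

            sim = QuantizationSimModel(model, dummy_input=dummy_input)

            # Calibrate quantizer encodings with representative data
            sim.compute_encodings(lambda m, _: m(dummy_input), None)

            # Export the simulated model plus encodings for downstream tools
            os.makedirs('/tmp/export', exist_ok=True)
            sim.export('/tmp/export', 'quantized_model', dummy_input)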

1.35.1

  • PyTorch
    • Fixed package versioning for compatibility with the latest pip version

1.35.0

  • PyTorch
    • Added support for W16A16 in AutoQuant.

  • Deprecation Notice
    • Support for PyTorch 1.13 is deprecated and will be removed in the next release.

  • ONNX
    • Optimized memory usage and runtime speed on CPU.

1.34.0

  • PyTorch
    • Added support for WSL2

    • Upgraded the CUDA version for PyTorch 2.1

    • Extended QuantAnalyzer functionality for LLM range analysis

  • Keras
    • Added support for certain TFOpLambda layers created by TensorFlow functional calls.

  • ONNX
    • Upgraded AIMET to support ONNX version 1.16.1 and ONNX Runtime version 1.18.1.

1.33.5

  • PyTorch
    • Various bug fixes and quality-of-life updates for LoRA

    • Updated minimum scale value and registered additional custom quantized ops with QuantSim 2.0

1.33.0

  • PyTorch
    • Enhanced the export pipeline to optimize GPU memory usage with LLMs.

    • [Experimental] Added support for handling LoRA (via the PEFT API) in AIMET and enabled export of the required artifacts for QNN.

    • Added examples of a training pipeline for distributed KD-QAT.

    • [Experimental] Added support for blockwise quantization (BQ) for the W4FP16 format, and low-power blockwise quantization (LPBQ) for the W4A8 and W4A16 formats. This feature requires QuantSim V2.

1.32.0

  • PyTorch
    • Added multi-GPU support for AdaRound.

    • Upgraded AIMET to support PyTorch version 2.1 as a new variant. AIMET with PyTorch version 1.13 remains the default.

  • Keras
    • For models with SeparableConv2D layers, run the model preparer before applying any quantization API (see the sketch below).
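
      A sketch of the suggested flow; prepare_model is an assumed entry point name for the Keras model preparer:

          from tensorflow import keras
          # Assumed module path for the Keras model preparer
          from aimet_tensorflow.keras.model_preparer import prepare_model

          model = keras.Sequential([
              keras.layers.Input((32, 32, 3)),
              keras.layers.SeparableConv2D(8, 3),
          ])

          # Rewrite SeparableConv2D into quantization-friendly layers first,
          # then hand the prepared model to any quantization API
          prepared_model = prepare_model(model)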

  • Common
    • Upgraded AIMET to support Ubuntu 22.04 and Python 3.10 for all AIMET variants.

1.31.0

  • ONNX
    • Added support for custom ops in QuantSim, CLE, AdaRound and AMP.

    • Added support for QuantAnalyzer.

  • Keras
    • Added support for unrolled quantized LSTM with QuantSim in PTQ mode only.

    • Fixed ReLU encoding min going below 0 during QAT.

    • Fixed input quantizers for TFOpLambda layers (kwargs)

    • Fixed the logic for placing input quantizers

1.30.0

  • ONNX
    • Upgraded AIMET to support ONNX version 1.14 and ONNX Runtime version 1.15.

    • Added support for AutoQuant.

1.29.0

  • Keras
    • Fixed issues with TFOpLambda layers in the QcQuantizeWrapper call.

  • PyTorch
    • [Experimental] Added support for embedding AIMET encodings in the graph using ONNX quantize/dequantize operators. Currently this option is supported only for 8-bit per-tensor quantization (see the export sketch below).
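
      A sketch of how this export might be invoked; the use_embedded_encodings flag is an assumption about this experimental option, not a confirmed signature:

          import os
          import torch
          from aimet_torch.quantsim import QuantizationSimModel

          model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
          dummy_input = torch.randn(1, 3, 32, 32)

          sim = QuantizationSimModel(model, dummy_input=dummy_input)  # 8-bit defaults
          sim.compute_encodings(lambda m, _: m(dummy_input), None)

          # Assumed flag: emit ONNX QuantizeLinear/DequantizeLinear ops in the
          # graph instead of a separate encodings file
          os.makedirs('/tmp/export', exist_ok=True)
          sim.export('/tmp/export', 'model_qdq', dummy_input,
                     use_embedded_encodings=True)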

  • ONNX
    • Added support for AdaRound.

1.28.0

  • Keras
    • Added support for the Spatial SVD compression feature.

    • [Experimental] Added debugging APIs for dumping intermediate tensor outputs. This data can be used with current QNN/SNPE tools for debugging accuracy problems.

  • PyTorch
    • Upgraded the AIMET PyTorch default version to 1.13. AIMET remains compatible with PyTorch version 1.9.

  • ONNX
    • [Experimental] Added debugging APIs for dumping intermediate tensor outputs. This data can be used with current QNN/SNPE tools for debugging accuracy problems.

1.27.0

  • Keras
    • Updated support for TFOpLambda layers in batch norm folding with extra call args/kwargs.

  • PyTorch
    • Upgraded AIMET to support PyTorch version 1.13.0. Only ONNX opset 14 is supported for export.

    • [Experimental] Added debugging APIs for dumping intermediate tensor data. This data can be used with current QNN/SNPE tools for debugging accuracy problems. Known issue: the Layer Output Generation API gives incorrect tensor data for the layer just before ReLU when used on the original FP32 model.

    • [Experimental] Added support for embedding AIMET encodings in the graph using ONNX quantize/dequantize operators. Currently this option is supported only for 8-bit per-tensor quantization.

    • Fixed a bug in AIMET QuantSim for PyTorch models to handle non-contiguous tensors.

  • ONNX
    • Added AIMET support for ONNX 1.11.0. However, op support in QNN/SNPE is currently limited; if the model fails to load, please continue to use opset 11 for export.

  • TensorFlow
    • [Experimental] Added debugging APIs for dumping intermediate tensor outputs. This data can be used with current QNN/SNPE tools for debugging accuracy problems.

1.26.0

  • Keras
    • Added a feature called BN Re-estimation that can improve model accuracy after QAT for INT4 quantization.

    • Updated the AutoQuant feature to automatically choose the optimal calibration scheme and create an HTML report of which optimizations were applied.

    • Updated the Model Preparer to replace separable convolutions with depthwise and pointwise conv layers.

    • Fixed the BN fold implementation to account for a subsequent multi-input layer

    • Fixed a bug where min/max encoding values were not aligned with scale/offset during QAT.

  • PyTorch
    • Several bug fixes

  • TensorFlow
    • Added a feature called BN Re-estimation that can improve model accuracy after QAT for INT4 quantization

    • Updated the AutoQuant feature to automatically choose the optimal calibration scheme and create an HTML report of which optimizations were applied.

    • Fixed a bug where min/max encoding values were not aligned with scale/offset during QAT.

  • Common
    • Documentation updates for taking AIMET models to target.

    • Converted standalone batch norm layers' parameters so that they behave as linear/dense layers.

    • [Experimental] Added new Architecture Checker feature to identify and report model architecture constructs that are not ideal for quantized runtimes. Users can utilize this information to change their model architectures accordingly.

1.25.0

  • Keras
    • Added QuantAnalyzer feature

    • Added batch normalization folding for functional Keras models, which allows the default config files to work for supergrouping (see the sketch after this list).

    • Resolved an issue with quantizer placement in Sequential blocks in subclassed models
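
      A minimal folding sketch; the module path and entry point name are assumptions about the Keras batch norm folding API:

          from tensorflow import keras
          # Assumed module path for Keras batch norm folding
          from aimet_tensorflow.keras.batch_norm_fold import fold_all_batch_norms

          # A small functional model with a Conv -> BN pattern to fold
          inputs = keras.layers.Input((32, 32, 3))
          x = keras.layers.Conv2D(8, 3)(inputs)
          x = keras.layers.BatchNormalization()(x)
          outputs = keras.layers.ReLU()(x)
          model = keras.Model(inputs, outputs)

          # Folds each BatchNormalization layer into its preceding conv
          fold_all_batch_norms(model)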

  • PyTorch
    • Added AutoQuant V2 which includes advanced features such as out-of-the-box inference, model preparer, quant scheme search, improved summary report, etc.

    • Fixes to resolve minor accuracy diffs in the learnedGrid quantizer for per-channel quantization

    • Fixes to improve EfficientNetB4 accuracy with respect to target

    • Fixed rare case where quantizer may calculate incorrect offset when generating QAT 2.0 learned encodings

  • TensorFlow
    • Added QuantAnalyzer feature

    • Fixed an accuracy issue due to rare cases where the incorrect BN epsilon was being used

    • Fixed an accuracy issue due to QuantSim export incorrectly recomputing QAT 2.0 encodings

  • Common
    • Updated the AIMET Python package version format to support the latest pip

    • Fixed an issue where not all inputs might be quantized properly

1.24.0

  • PyTorch
    • Fixes to resolve minor accuracy diffs in the learnedGrid quantizer for per-channel quantization

    • Added support for AMP 2.0 which enables faster automatic mixed precision

    • Added support for QAT for INT4 quantized models, including a feature for performing BN re-estimation after QAT (see the sketch below)
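
      A re-estimation sketch; the module path and signature are assumptions about the BN re-estimation entry point:

          import torch
          from torch.utils.data import DataLoader, TensorDataset
          # Assumed module path for BN re-estimation
          from aimet_torch.bn_reestimation import reestimate_bn_stats

          model = torch.nn.Sequential(
              torch.nn.Conv2d(3, 8, 3), torch.nn.BatchNorm2d(8), torch.nn.ReLU())

          loader = DataLoader(TensorDataset(torch.randn(64, 3, 32, 32)),
                              batch_size=16)

          # After INT4 QAT, re-estimate BN statistics on a few batches of data;
          # forward_fn (assumed parameter) unpacks each batch before the call
          reestimate_bn_stats(model, loader, num_batches=4,
                              forward_fn=lambda m, batch: m(batch[0]))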

  • Keras
    • Added support for AMP 2.0 which enables faster automatic mixed precision

    • Support for basic transformer networks

    • Added support for subclassed models. The current subclassing feature includes support for only a single level of subclassing and does not support lambdas.

    • Added QAT per-channel gradient support

    • Minor updates to the quantization configuration

    • Fixed QuantSim bug where layers using dtypes other than float were incorrectly quantized

  • TensorFlow
    • Added an additional PReLU mapping pattern to ensure proper folding and QuantSim node placement

    • Fixed per-channel encoding representation to align with PyTorch and Keras

  • Common
    • Added export of the QuantSim configuration for configuring downstream target quantization

1.23.0

  • PyTorch
    • Fixed backward pass of the fake-quantize (QcQuantizeWrapper) nodes to handle symmetric mode correctly

    • Per-channel quantization is now enabled on a per-op-type basis (see the config sketch after this list)

    • Support for recursively excluding modules from a root module in QuantSim

    • Support for excluding layers when running model validator and model preparer

    • Reduced memory usage in AdaRound

    • Fixed bugs in AdaRound for per-channel quantization

    • Made ConnectedGraph more robust when identifying custom layers

    • Added Jupyter notebook-based examples for the following features: AutoQuant

    • Added support for sparse conv layers in QuantSim (experimental)
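
      A sketch of how per-op-type per-channel quantization might be expressed in a QuantSim config file; the schema keys are assumptions based on the standard AIMET config format:

          import json

          # Assumed schema: per-channel quantization enabled only for Conv ops
          config = {
              "defaults": {
                  "ops": {"is_output_quantized": "True"},
                  "params": {"is_quantized": "True"},
                  "per_channel_quantization": "False",
              },
              "params": {},
              "op_type": {
                  "Conv": {"per_channel_quantization": "True"},
              },
              "supergroups": [],
              "model_input": {},
              "model_output": {},
          }

          with open("custom_quantsim_config.json", "w") as f:
              json.dump(config, f, indent=4)

          # Pass the file to QuantizationSimModel via its config_file argument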

  • Keras
    • Added support for Keras per-channel quantization

    • Changed the CLE interface to accept a pre-compiled model

    • Added Jupyter notebook-based examples for the following features: Transformer quantization

  • TensorFlow
    • Fix to avoid unnecessary indexing in AdaRound

  • Common
    • The TF-enhanced calibration scheme has been accelerated using a custom CUDA kernel and now runs significantly faster.

    • Installation instructions are now combined with the rest of the documentation (user guide and API docs)

1.22.2

  • TensorFlow
    • Added support for supergroups: MatMul + Add

    • Added support for TF-Slim BN names with backslashes

    • Added support for Depthwise + Conv in CLS

1.22.1

  • PyTorch
    • Added support for QuantizableMultiHeadAttention for PyTorch nn.transformer layers

    • Added support for functional conv2d in the model preparer

    • Enabled QAT with multiple GPUs

    • Optimized the forward pass logic of PyTorch QAT 2.0

    • Fixed functional depthwise conv support in the model preparer

    • Fixed a bug in the model validator to correctly identify functional ops in leaf modules

    • Added support for dynamic functional conv2d in the model preparer

    • Added an updated default runtime config, as well as a per-channel variant

    • Included residing module info in the model validator

  • Keras
    • Support for Keras MultiHeadAttention Layer

1.22.0

  • PyTorch
    • Support for simulation and QAT for PyTorch transformer models (including support for torch.nn mha and encoder layers)

1.21.0

  • PyTorch
    • PyTorch QuantAnalyzer: Visualize per-layer sensitivity and per-quantizer PDF histograms

    • PyTorch QAT with Range Learning: Added support for per-channel quantization

    • PyTorch: Enabled exporting of encodings for multi-output leaf modules

  • TensorFlow
    • New feature: TensorFlow AutoQuant, which automatically applies various AIMET post-training quantization techniques

    • AdaRound: Added the ability to use a configuration file in the API to adapt to a specific runtime target

    • AdaRound: Added per-channel quantization support

    • TensorFlow QuantSim: Added support for FP16 inference and QAT

    • TensorFlow Per-Channel Quantization
      • Fixed speed and accuracy issues

      • Fixed zero accuracy for 16-bit per-channel quantization

      • Added support for the Depthwise Conv2d op

    • Multiple other bug fixes

1.20.0

  • PyTorch
    • Propagated encodings for ONNX Ops that were expanded from a single PyTorch Op

  • TensorFlow
    • Upgraded AIMET to support TensorFlow version 2.4. AIMET remains compatible with TensorFlow version 1.15

  • Common
    • Added Jupyter notebooks for examples

    • Multiple bug fixes

    • Removed version pinning of many dependent software packages

1.19.1

  • PyTorch
    • Added CLE support for Conv1d, ConvTranspose1d and depthwise separable Conv1d layers (see the sketch after this list)

    • Added high-bias fold support for Conv1d layers

    • Modified Elementwise Concat Op to support any number of tensors

    • Minor dependency fixes
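
      An equalization sketch for Conv1d layers, assuming the standard equalize_model entry point:

          import torch
          from aimet_torch.cross_layer_equalization import equalize_model

          # A small Conv1d stack to equalize; input shape is (N, C, L)
          model = torch.nn.Sequential(
              torch.nn.Conv1d(3, 8, 3),
              torch.nn.ReLU(),
              torch.nn.Conv1d(8, 16, 3),
          )

          # Applies batch norm folding, cross-layer scaling, and high-bias folding
          equalize_model(model, input_shapes=(1, 3, 128))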

1.18.0

  • Common
    • Multiple bug fixes

    • Additional feature examples for PyTorch and TensorFlow

1.17.0

  • TensorFlow
    • Added the AdaRound feature for TensorFlow

  • PyTorch
    • Added examples for Torch quantization, and for Channel Pruning and Spatial SVD compression

1.16.2

  • PyTorch
    • Added a new post-training quantization feature called AdaRound, which stands for adaptive rounding (see the sketch after this list)

    • Quantization simulation and QAT now also support recurrent layers (RNN, LSTM, GRU)
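
      An AdaRound usage sketch; the module path, AdaroundParameters, and apply_adaround signature are assumptions based on the aimet_torch AdaRound API:

          import os
          import torch
          from torch.utils.data import DataLoader, TensorDataset
          # Assumed module path for AdaRound
          from aimet_torch.adaround.adaround_weight import Adaround, AdaroundParameters

          model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
          dummy_input = torch.randn(1, 3, 32, 32)

          data_loader = DataLoader(TensorDataset(torch.randn(32, 3, 32, 32)),
                                   batch_size=8)
          params = AdaroundParameters(data_loader=data_loader, num_batches=4)

          # Returns a model with learned weight rounding; the computed encodings
          # are written to `path` for later use with QuantizationSimModel
          os.makedirs('/tmp/adaround', exist_ok=True)
          adarounded_model = Adaround.apply_adaround(
              model, dummy_input, params,
              path='/tmp/adaround', filename_prefix='model',
              default_param_bw=8)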

1.16.1

  • Added separate packages for CPU and GPU models. This allows users with CPU-only hosts to run AIMET.

  • Added separate packages for PyTorch and TensorFlow. This reduces the number of dependencies that users need to install.

1.16.0

  • Ported AIMET PyTorch to work with PyTorch version 1.7.1 with CUDA 11.0

  • AIMET PyTorch and AIMET TensorFlow are now available as separate packages

  • Versions of the AIMET PyTorch and AIMET TensorFlow packages for CPU-only machines are now available

1.13.0

  • PyTorch
    • Added Adaptive Rounding feature (AdaRound) for PyTorch.

    • Various bug fixes.