Introduction: Qualcomm efficient-transformers library
Train anywhere, Infer on Qualcomm Cloud AI with a Developer-centric Toolchain
This library provides reimplemented blocks of LLMs which are used to make the models functional and highly performant on Qualcomm Cloud AI 100. We support a wide range of model architectures for easy, efficient deployment on Cloud AI 100 cards. Users only need to provide a model card from HuggingFace or a path to a local model, and the library takes care of transforming the model into its efficient implementation for Cloud AI 100.
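To make the "model card or local path" input concrete, here is a minimal, illustrative sketch of how such an argument might be classified; `resolve_model_source` is a hypothetical helper written for this explanation, not part of the library's actual API:

```python
import os

def resolve_model_source(name_or_path: str) -> str:
    """Classify user input as a local checkpoint directory or a
    HuggingFace model card ID (hypothetical helper for illustration)."""
    if os.path.isdir(name_or_path):
        # An existing directory is treated as a local model checkpoint.
        return "local"
    # Anything else is assumed to be a model card ID on the Hub,
    # e.g. "gpt2" or "meta-llama/Meta-Llama-3.1-8B".
    return "hub"

print(resolve_model_source("."))     # an existing directory -> "local"
print(resolve_model_source("gpt2"))  # looks like a Hub model card -> "hub"
```

Either way, the same downstream pipeline receives a resolved model to transform for Cloud AI 100.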
For other models, comprehensive documentation describes the changes needed, along with how-to guides.
Typically for LLMs, the library provides:
Reimplemented blocks from Transformers that enable efficient on-device retention of intermediate states (read more here)
Graph transformations to enable execution of key operations in lower precision
Graph transformations to replace some operations with other mathematically equivalent operations that are efficient/supported on the HW backend
Handling of underflow and overflow in lower precision
Patcher modules to map weights of the original model's operations to the updated model's operations
Exporter module to export the model source into an ONNX graph
Sample applications and demo notebooks
Unit test templates.
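As one generic illustration of the kind of rewrite listed above (a mathematically equivalent replacement that also prevents overflow in lower precision), here is the classic max-subtraction softmax rewrite. This is a self-contained sketch of the numerical idea, not code from the library itself:

```python
import math

def naive_softmax(xs):
    # Direct formula: exp(x_i) / sum_j exp(x_j).
    # exp() overflows for large inputs (above ~709 even in float64,
    # and far sooner in fp16), producing inf/NaN or raising an error.
    exps = [math.exp(x) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(xs):
    # Mathematically equivalent rewrite: subtracting max(xs) from every
    # input leaves the result unchanged (the factor exp(-max) cancels in
    # numerator and denominator) but keeps every exponent <= 0, so exp()
    # can never overflow.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# The naive form fails on large logits; the rewritten form does not.
print(stable_softmax([1000.0, 1001.0, 1002.0]))
```

The same pattern, replacing an operation with an equivalent but numerically safer or HW-friendlier form, is what graph transformations apply at the model-graph level.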
Latest news:
[Coming soon] Support for more popular models and the speculative decoding inference optimization technique
[09/2024] Now we support PEFT models
[09/2024] Added support for Meta-Llama-3.1-8B
[09/2024] Added support for Meta-Llama-3.1-8B-Instruct
[09/2024] Added support for Meta-Llama-3.1-70B-Instruct
[09/2024] Added support for granite-20b-code-base
[09/2024] Added support for granite-20b-code-instruct-8k
[09/2024] Added support for Starcoder1-15B
[08/2024] Added support for the continuous batching inference optimization technique
[08/2024] Added support for Jais-adapted-70b
[08/2024] Added support for Jais-adapted-13b-chat
[08/2024] Added support for Jais-adapted-7b
[06/2024] Added support for GPT-J-6B
[06/2024] Added support for Qwen2-1.5B-Instruct
[06/2024] Added support for StarCoder2-15B
[06/2024] Added support for Phi3-Mini-4K-Instruct
[06/2024] Added support for Codestral-22B-v0.1
[06/2024] Added support for Vicuna-v1.5
[05/2024] Added support for Mixtral-8x7B & Mistral-7B-Instruct-v0.1.
[04/2024] Initial release of efficient transformers for seamless inference on pre-trained LLMs.