efficient-transformers

Getting Started

  • Introduction to the Qualcomm efficient-transformers library
  • Validated Models
  • Models Coming Soon

Installation

  • Pre-requisites
  • Linux Installation
  • Sanity Check

Quick start

  • Transformed models and QPC storage
  • Command Line Interface
  • Python API

Command Line Interface (CLI) Usage

  • QEfficient.cloud.infer
  • QEfficient.cloud.execute
  • QEfficient.cloud.compile
  • QEfficient.cloud.export

Python API

  • High Level API
  • Low Level API

Blogs

  • Train anywhere, Infer on Qualcomm Cloud AI 100
  • How to Quadruple LLM Decoding Performance with Speculative Decoding (SpD) and Microscaling (MX) Formats on Qualcomm® Cloud AI 100
  • Power-efficient acceleration for large language models – Qualcomm Cloud AI SDK
  • Qualcomm Cloud AI 100 Accelerates Large Language Model Inference by ~2x Using Microscaling (Mx) Formats
  • Qualcomm Cloud AI Introduces Efficient Transformers: One API, Infinite Possibilities

Reference

  • Qualcomm Cloud AI home
  • Qualcomm Cloud AI SDK download
  • Qualcomm Cloud AI API reference
  • User Guide
  • OCP Microscaling Formats (MX) Specification


© Copyright 2024, Qualcomm.
