
Welcome to Efficient-Transformers Documentation!

Getting Started

  • Introduction to the Qualcomm efficient-transformers library
  • Validated Models
  • Models Coming Soon

Installation

  • Pre-requisites
  • Linux Installation
    • Using SDK
    • Using GitHub Repository
  • Sanity Check

Quick start

  • Transformed models and QPC storage
  • Command Line Interface
    • QEfficient.cloud.infer
    • QEfficient.cloud.execute
    • Multi-Qranium Inference
    • Continuous Batching
  • Python API
    • 1. Model download and Optimize for Cloud AI 100
    • 2. Export and Compile with one API
    • 3. Execute
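The three quick-start steps above (download and optimize, export and compile, execute) can be sketched with the high-level `QEFFAutoModelForCausalLM` API. This is a minimal sketch under stated assumptions: the model name, `num_cores` value, and prompt are illustrative, and running it requires the QEfficient package, its SDK prerequisites, and a Qualcomm Cloud AI 100 device.

```python
# Hedged sketch of the quick-start flow; model name and parameters are
# illustrative assumptions, and a Cloud AI 100 device is required to run it.
from transformers import AutoTokenizer
from QEfficient import QEFFAutoModelForCausalLM

model_name = "gpt2"  # any validated model; "gpt2" is an assumption here

# 1. Model download and optimize for Cloud AI 100
model = QEFFAutoModelForCausalLM.from_pretrained(model_name)

# 2. Export to ONNX and compile into a QPC with one API call
model.compile(num_cores=14)

# 3. Execute on the device
tokenizer = AutoTokenizer.from_pretrained(model_name)
model.generate(prompts=["Hello, my name is"], tokenizer=tokenizer)
```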

Command Line Interface Use (CLI)

  • QEfficient.cloud.infer
  • QEfficient.cloud.execute
  • QEfficient.cloud.compile
  • QEfficient.cloud.export
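As a hedged illustration of the CLI entry points listed above: the invocations below are a sketch, the flag values and model name are assumptions rather than prescriptions, and execution requires the installed SDK plus a Cloud AI 100 device. Consult each command's `--help` output for the authoritative flag set.

```shell
# End-to-end path: download, optimize, compile, and run in one step.
# Model name and flag values below are illustrative assumptions.
python -m QEfficient.cloud.infer \
    --model_name gpt2 \
    --num_cores 16 \
    --prompt "My name is"

# Run an already-compiled QPC; the QPC path placeholder must be filled in.
python -m QEfficient.cloud.execute \
    --model_name gpt2 \
    --qpc_path <path-to-qpc> \
    --prompt "My name is"
```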

Python API

  • High Level API
    • QEFFAutoModelForCausalLM
    • QEffAutoPeftModelForCausalLM
    • export
    • compile
    • Execute
  • Low Level API
    • convert_to_cloud_kvstyle
    • convert_to_cloud_bertstyle
    • utils

Blogs

  • Train anywhere, Infer on Qualcomm Cloud AI 100
  • How to Quadruple LLM Decoding Performance with Speculative Decoding (SpD) and Microscaling (MX) Formats on Qualcomm® Cloud AI 100
  • Power-efficient acceleration for large language models – Qualcomm Cloud AI SDK
  • Qualcomm Cloud AI 100 Accelerates Large Language Model Inference by ~2x Using Microscaling (Mx) Formats
  • Qualcomm Cloud AI Introduces Efficient Transformers: One API, Infinite Possibilities

Reference

  • Qualcomm Cloud AI home
  • Qualcomm Cloud AI SDK download
  • Qualcomm Cloud AI API reference
  • User Guide
  • OCP Microscaling Formats (MX) Specification

© Copyright 2024, Qualcomm.

Version: release/v1.18