Qualcomm® Cloud AI SDK User Guide
APIs
Refer to api.html.
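As a quick orientation before reading the full API reference, below is a minimal sketch of running an inference through the Cloud AI `qaic` Python package. The class name `Session`, the `run()` call, the QPC path, and the tensor names are all illustrative assumptions for this sketch; consult api.html for the authoritative classes and signatures.

```python
# Hypothetical sketch: run a compiled model (QPC) with the qaic Python package.
# All names below (Session, run, the QPC path, and the tensor names) are
# illustrative assumptions -- see api.html for the real API surface.
import numpy as np
import qaic  # Qualcomm Cloud AI Python package

# Load a compiled Qualcomm Program Container (QPC); the path is a placeholder.
sess = qaic.Session(model_path="path/to/programqpc.bin")

# Build an input dictionary keyed by the model's input tensor names.
inputs = {"input": np.zeros((1, 3, 224, 224), dtype=np.float32)}

# Execute one inference; the result maps output tensor names to arrays.
outputs = sess.run(inputs)
print({name: arr.shape for name, arr in outputs.items()})
```

The same workflow is also exposed through the C++ API documented in api.html.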