CPP API
C++ API reference for Qualcomm® Cloud AI:
- InferenceSet IO Example
- Features
- Runtime
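
To orient readers before the individual pages, the sketch below illustrates the typical InferenceSet flow on Cloud AI hardware: load a compiled program container (QPC), create a runtime context on a device, then submit inferences through a pooled set of handles and collect the completed ones. This is a minimal, hypothetical sketch patterned on the InferenceSet IO Example; the umbrella header, class names, and factory signatures shown here are assumptions and should be verified against that example.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

#include "QAicApi.hpp"  // assumed umbrella header for the qaic::rt runtime API

int main() {
  // Path to a QPC produced by the compiler; placeholder, not a real path.
  constexpr const char *qpcPath = "/path/to/programqpc.bin";
  const std::vector<qaic::rt::QID> devices{0};  // run on device 0

  // Load the compiled program container (QPC) from disk.
  auto qpc = qaic::rt::Qpc::Factory(qpcPath);

  // Create a runtime context bound to the selected device(s).
  auto context = qaic::rt::Context::Factory(nullptr, devices);

  // An InferenceSet pre-allocates a pool of inference handles over one
  // or more activations of the program (names assumed; see the example).
  constexpr uint32_t setSize = 4;
  constexpr uint32_t numActivations = 1;
  auto set = qaic::rt::InferenceSet::Factory(
      context, qpc, devices[0], setSize, numActivations);

  // Acquire a free handle, bind input buffers, and submit asynchronously.
  qaic::rt::shInferenceHandle handle;
  set->getAvailable(handle);
  std::vector<qaic::rt::QBuffer> inputs;  // populate with model inputs
  handle->setInputBuffers(inputs);
  set->submit(handle);

  // Block until a submitted inference completes, read its outputs,
  // then return the handle to the pool for reuse.
  qaic::rt::shInferenceHandle completed;
  set->getCompleted(completed);
  completed->waitForCompletion();
  // ... copy results out of the handle's output QBuffers here ...
  set->putCompleted(std::move(completed));
  return 0;
}
```

See the InferenceSet IO Example page for the complete, authoritative version of this flow, including how input and output buffers are sized and bound.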