Qualcomm Cloud AI Documentation
User Guide
Getting started
Quick Start Guide
Installation
Checklist
Prerequisites
Download SDKs
SDK Installation
Platform SDK
Platform SDK Upgrade from Version < 1.19
Apps SDK
Verification
SDK Tools
SDK Version
qaic-version-util
Card health, Resources, and Logs
qaic-util
qaic-log
qaic-monitor-json
QAic Udev Monitor
qaic-trace-ctl
Model Compilation
qaic-compile
Examples:
Note:
Network specialization
Single device partitioning
MDP Load Partition Configuration
Custom I/O
Get intermediate layer outputs
Mixed precision
qaic-qpc
Model Execution
Network execution
QAic runner
qaic-runner argument details:
--aic-batch-json-input JSON Format
Oversubscription
QAic oversubscription
Inference Workflow
Export the Model
Exporting ONNX Model from Different Frameworks
Operator and Datatype support
Introduction to the Model Preparator Tool
Compile the Model
Tune Performance
Model Configurator
Usage
Supported model types
Search space parameters
Brute force
Config optimizer
Optimized search results
Input file options
Quantization options
Other options
Debug issues
Example config files
Example run and img2raw usage for a ResNet-50 model
Execute the QPC
Inference Profiling
ONNX Runtime
Building QAic Execution Provider
Building Docker Image
Model Settings Example
Model Setting Details
Code Samples
Python Sample
C++ Sample
End-to-end examples
Model Serving
vLLM
Run vLLM
Benchmarking
Supported Features
Model Coverage
vLLM supported models
Embedding Networks
Execution, Memory, and Context Management
Compute Context Length (CCL)
Prefix Caching
Decoding, Sampling, and Output Control
Speculative Decoding
On Device Sampling Support with vLLM
Guided Decoding
vLLM Support for Tool Call Parsing
Model Optimization and Adaptation
Quantization
LoRAX Support with vLLM
Serving Architecture and Deployment
Disaggregated Serving Support with vLLM
Multimodality Support with vLLM
vLLM Deployment using KServe
Configuration, Compatibility, and Reference
vLLM arguments for QAIC
Cross-Feature Support Matrix
Build vLLM (Optional)
Build for x86_64
Build for AArch64
Triton Inference Server
Docker Image
Build Triton Image with QAic Backends
Backends
QAic Backend
ONNX Runtime Backend
Python Backend for LLMs
Python Backend for Embeddings
vLLM Backend
Examples
Stable Diffusion
Text Generation Inference (TGI)
Deployment
Docker
Containers by Workflow
LLM Workflows with vLLM
Disaggregated LLM Serving
CV Model Workflows
Production Serving and Management
Custom Cloud AI Docker Images
Build a Custom Image
Launch Container
Test Container
Kubernetes
Hypervisors
AWS
PyTorch Workflow
Model Architecture Support
Large Language Models (LLMs)
Features
Custom Operations (C++)
Model Sharding
Object Detection Postprocessing
QAic Smart NMS
Installation
Usage
qaic-smart-nms
libAICsmartnms.so
smartnms Python package
Structure of user-config file
Model architecture information
I/O information
Thresholds information
Configuration files provided with the SDK
Limitations
QAic QDetect layers
Model preparation
Using QDetect with pre-exported ONNX model
Using QDetect with model source code
Compilation using qaic-compile
System Management
AIC-manager
Starting AICM
Metrics
Feature Overview
AICMI (AICM CLI)
Examples & How To
Architecture
Glossary
API
Python API
qaic package
class Util
class InferenceSet
CPP API
Examples
InferenceSet IO Example
QAic InferenceSet Example
Features
Runtime
FAQ
Frequently Asked Questions
Blogs
Train anywhere, Infer on Qualcomm® Cloud AI 100
Quadruple LLM Decoding Performance with Speculative Decoding (SpD) and Microscaling (MX) Formats
Accelerate Inference of Fully Transparent Open-Source LLMs from LLM360 on Qualcomm® Cloud AI 100 DL2q Instances
Extremely Low-Cost Text Embeddings on Qualcomm® Cloud AI 100 DL2q Instances
Accelerate Large Language Model Inference by ~2x Using Microscaling (Mx) Formats