Search space parameters

    • Brute force
    • Config optimizer
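The two strategies above differ in how they explore the configuration space: brute force evaluates every combination, while the config optimizer searches only a subset within a fixed budget. A minimal sketch of that contrast, assuming a purely illustrative search space and scoring function (the parameter names and `score()` here are hypothetical stand-ins, not actual compiler options):

```python
import itertools
import random

# Hypothetical search space; the parameter names are illustrative only,
# not real compiler options.
search_space = {
    "cores": [1, 2, 4],
    "batch_size": [1, 8, 16],
    "instances": [1, 2],
}

def score(config):
    # Stand-in objective. In practice this would be a measured metric
    # (e.g., throughput) from compiling and profiling each candidate.
    return config["cores"] * config["batch_size"] * config["instances"]

def brute_force(space):
    """Evaluate every combination in the search space; guaranteed best."""
    keys = list(space)
    best = None
    for values in itertools.product(*(space[k] for k in keys)):
        config = dict(zip(keys, values))
        if best is None or score(config) > score(best):
            best = config
    return best

def config_optimizer(space, budget=6, seed=0):
    """Sample a fixed budget of random configs instead of exhaustive search."""
    rng = random.Random(seed)
    best = None
    for _ in range(budget):
        config = {k: rng.choice(v) for k, v in space.items()}
        if best is None or score(config) > score(best):
            best = config
    return best

print(brute_force(search_space))      # exhaustive search over all 18 combinations
print(config_optimizer(search_space)) # best config found within the sampling budget
```

Brute force is exact but its cost grows with the product of the option counts, so an optimizer-style search becomes attractive once the space is large or each evaluation (a full compile-and-profile cycle) is expensive.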
    © Copyright 2025 Qualcomm Innovation Center, Inc.