Search space parameters

    • Brute force
    • Config optimizer
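The two strategies above differ in how they explore the configuration space: brute force evaluates every combination, while the config optimizer searches only a subset within a fixed budget. A minimal sketch of that contrast, assuming a purely illustrative search space and scoring function (the parameter names and `score()` here are hypothetical stand-ins, not actual compiler options):

```python
import itertools
import random

# Hypothetical search space; the parameter names are illustrative only,
# not real compiler options.
search_space = {
    "cores": [1, 2, 4],
    "batch_size": [1, 8, 16],
    "instances": [1, 2],
}

def score(config):
    # Stand-in objective. In practice this would be a measured metric
    # (e.g., throughput) from compiling and profiling each candidate.
    return config["cores"] * config["batch_size"] * config["instances"]

def brute_force(space):
    """Evaluate every combination in the search space; guaranteed best."""
    keys = list(space)
    best = None
    for values in itertools.product(*(space[k] for k in keys)):
        config = dict(zip(keys, values))
        if best is None or score(config) > score(best):
            best = config
    return best

def config_optimizer(space, budget=6, seed=0):
    """Sample a fixed budget of random configs instead of exhaustive search."""
    rng = random.Random(seed)
    best = None
    for _ in range(budget):
        config = {k: rng.choice(v) for k, v in space.items()}
        if best is None or score(config) > score(best):
            best = config
    return best

print(brute_force(search_space))      # exhaustive search over all 18 combinations
print(config_optimizer(search_space)) # best config found within the sampling budget
```

Brute force is exact but its cost grows with the product of the option counts, so an optimizer-style search becomes attractive once the space is large or each evaluation (a full compile-and-profile cycle) is expensive.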
    © Copyright 2025 Qualcomm Innovation Center, Inc.