QAic Smart NMS

Smart NMS (non-max suppression) provides a way to run parts of a network on the AI accelerator and the remaining parts on the host, achieving better overall inference times by exploiting parallelism across the two devices. Object detection models can be partitioned so that the feature-extractor portion runs on the AI 100 while the box-processing and NMS modules run on the host.

    • Installation
    • Usage
    • Structure of user-config file
    • Configuration files provided with the SDK
    • Limitations
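To illustrate the host-side stage of this partitioning, the sketch below implements generic greedy non-max suppression in pure Python. This is not the SDK's `smartnms` package or its API; it only shows the kind of box-filtering work that runs on the host after the accelerator produces candidate boxes and scores (boxes assumed in `[x1, y1, x2, y2]` format):

```python
def iou(a, b):
    # Intersection-over-union of two [x1, y1, x2, y2] boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    # Greedy NMS: repeatedly keep the highest-scoring box and
    # discard any remaining box that overlaps it too much.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep  # indices of the surviving boxes

boxes = [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # the two overlapping boxes collapse to one
```

In the Smart NMS workflow this per-class filtering is handled by the SDK tools described in the following sections; the sketch is only meant to clarify why offloading it to the host, in parallel with accelerator-side feature extraction, can improve end-to-end throughput.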
    © Copyright 2025 Qualcomm Innovation Center, Inc.