# Docker
Docker enables users to build, test, and deploy applications in lightweight software containers. For Qualcomm® Cloud AI, Qualcomm provides pre-built inference Docker images that bundle the Platform SDK, Apps SDK, libraries, and essential system tools.
This integrated environment supports the end-to-end inference workflow, from model compilation to execution and serving, without requiring manual setup or complex dependency management.
## System Prerequisites
Packages: Python 3.10 and Docker v23 or later. See Docker Installation.
Optional: Follow the Docker post-installation steps to run `docker` as a non-root user. Otherwise, the `docker` command and `build_image.py` scripts must be prefaced with `sudo`.
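The optional post-install step amounts to two commands, taken from the standard Docker post-installation docs. They are printed here as a dry run rather than executed, since they modify system state; after running them for real, log out and back in for the group change to take effect.

```shell
# Standard Docker post-install steps for non-root use (from the Docker docs).
# Printed as a dry run so nothing on this machine is changed by the snippet itself.
postinstall_cmds() {
  echo 'sudo groupadd -f docker'
  echo 'sudo usermod -aG docker $USER'
}
postinstall_cmds
```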
## Cloud AI Inference Containers
Qualcomm provides a set of pre-built Cloud AI Inference Docker images that simplify model compilation, inference, and deployment on Qualcomm® Cloud AI accelerators. These images are published in the packages section of the Cloud AI Containers repository.
Depending on the image, these containers support LLM workflows using vLLM, advanced serving scenarios such as disaggregated serving, computer vision workflows, Kubernetes integration, or Triton Inference Server deployments.
Some images provide an interactive shell for flexible development, while others expose a preconfigured entrypoint (for example, vLLM or qaic-disagg) to enable out-of-the-box inference and serving.
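The two launch styles can be sketched with the `docker run` shapes below. The image names are placeholders, not published image references, and the exact flags (device mapping, ports) depend on the image and host.

```shell
# Two launch styles for the Cloud AI images (command strings only; the
# image names here are placeholders, not published image references).
interactive_cmd() { echo "docker run -it --rm $1 /bin/bash"; }  # shell image: override nothing, develop interactively
entrypoint_cmd()  { echo "docker run --rm $1 $2"; }             # entrypoint image: args go straight to e.g. vLLM

interactive_cmd "<sdk-image>"
entrypoint_cmd "<vllm-image>" "--model <model-id>"
```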
### Available Images
The following table summarizes the available Cloud AI Inference container images and their intended use cases.
| Image Name | Description | Entrypoint |
|---|---|---|
| | Ubuntu 22.04-based image for compiling and executing models on Qualcomm Cloud AI using Apps and Platform SDKs. | |
| | Ubuntu 24.04-based general inference image with Apps SDK, Platform SDK, and Python-based tools. | |
| | Ubuntu 24.04-based image with a preconfigured vLLM entrypoint for OpenAI-compatible LLM serving. | |
| | Similar to | |
| | vLLM inference image based on Python 3.12, suitable for gpt-oss models. | |
| | vLLM-based image with qaic-disagg entrypoint for prefill/decode disaggregated serving. | |
| | Disaggregated serving image based on vLLM v0.8.5. | |
| | Disaggregated serving image with Python 3.12 vLLM environment. | |
| | General-purpose inference image based on Red Hat UBI9. Supports model compilation and execution on Qualcomm Cloud AI with Apps and Platform SDKs. | |
| | UBI9-based image with vLLM TGIS adapter installed. Suitable for LLM inference workflows using vLLM. | |
| | Similar to | |
| | UBI9-based image with Python 3.12 vLLM environment, suitable for gpt-oss workloads. | |
| | Image for compiling and running computer vision models on Qualcomm Cloud AI. Includes Apps SDK, Platform SDK, and Python-based tools. Focused on CV workflows rather than QEfficient or vLLM. | |
| | Kubernetes device plugin enabling Qualcomm Cloud AI accelerators to be exposed to containerized workloads. | |
| | Image running the AIC Manager (AICM) application for device management. | |
| | Image running QMonitor for monitoring and diagnostics. | |
| | Inference image that also includes the Triton Inference Server for serving models using Triton. | |
## Choose an Image
Qualcomm provides multiple Cloud AI Inference container images, each optimized for specific workflows such as LLM inference with vLLM, computer vision model development, and production-grade serving. Selecting the appropriate image depends on the type of model you are working with and how you plan to deploy it.
This section helps you:
- Choose the right image for your workflow
- Understand the differences between LLM, CV, and disaggregated serving workflows
- Get started quickly with example `docker run` commands
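As a sketch of such a command, the helper below composes a `docker run` invocation for an entrypoint-style vLLM serving image. The image name, model ID, host port, and device-node path are all placeholder assumptions, not values from this page; substitute real values from the tables.

```shell
# Hedged sketch: build a docker run command for an OpenAI-compatible vLLM
# serving image. <vllm-image>, <model-id>, port 8000, and the device path
# are assumptions -- substitute values appropriate for your image and host.
qaic_vllm_cmd() {
  local image="$1" model="$2"
  echo "docker run --rm -p 8000:8000 --device=/dev/accel/accel0 $image --model $model"
}

qaic_vllm_cmd "<vllm-image>" "<model-id>"
```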
### Recommended Images by Use Case
Use the decision matrix below to quickly identify the best container image for your use case.
| Use Case | Recommended Image | Why This Image |
|---|---|---|
| General experimentation (interactive) | | Full Apps + Platform SDKs with an interactive shell for flexible experimentation. |
| LLM inference with vLLM | | Preconfigured vLLM entrypoint for OpenAI-compatible LLM serving. |
| LLM inference (vLLM v0.8.5) | | Use when vLLM v0.8.5 is required (for feature or compatibility reasons). |
| LLM inference with Python 3.12 | | Required for gpt-oss and other Python 3.12-based workflows. |
| Disaggregated prefill/decode serving | | Preconfigured qaic-disagg entrypoint for distributed LLM serving. |
| CV model compilation and inference | | Includes Python tools and SDKs for CV workflows; no vLLM or QEfficient dependencies. |
| Kubernetes deployments | | Exposes Qualcomm Cloud AI accelerators to Kubernetes workloads. |
| Triton-based serving | | Includes Triton Inference Server and ONNX Runtime QAic Execution Provider for production inference pipelines. |
## Containers by Workflow
The following sections provide container setup examples for different inference workflows.
## Custom Cloud AI Docker Images
Follow these instructions to build, launch, and test custom Cloud AI Docker images.