Kubernetes

AI applications built for the Cloud AI inference accelerator can be containerized with Docker (see Docker) and deployed with Kubernetes.

The following figure shows a sample Kubernetes deployment.

../../../_images/Kubernetes_cluster.png

Tip

Use the cloud_ai_inference_ubuntu24 Docker image to quickly containerize your Cloud AI workload.

QAic K8s Device Plugin

The QAic K8s device plugin exposes Cloud AI device resources to containerized AI applications. The plugin supports:

  • Reporting the number of QAic devices on each cluster node

  • Monitoring QAic device health status

  • QAic device allocation and cleanup

Download the plugin from the Cloud AI Containers repository:

docker pull ghcr.io/quic/cloud_ai_k8s_device_plugin:1.21.2.0

Package Contents

The plugin source is available in the Cloud AI Apps SDK installer: qaic-apps-1.x.y.z/common/tools/k8s-device-plugin.

  • Plugin source

  • Docker image build script

  • Deployment scripts (YAML)

    • Device Plugin Deployment Script (deploys Qaic K8s Device Plugin as DaemonSet)

    • Sample Cloud AI Workload Deployment Script

Device Plugin Configuration

Refer to qaic-device-plugin.yml in the Cloud AI Apps SDK installer: qaic-apps-1.x.y.z/common/tools/k8s-device-plugin for a sample QAic K8s device plugin configuration. The following plugin options are available:

  • Allocate by Card Type (QAIC_SKU_BASED_RESOURCE_ENABLED)

  • Fractional allocation (QAIC_FRACTIONAL_DEVICE_STRATEGY)

Allocate by Card Type

Allocation can be done using either qaic or qaic-<sku> (std | pro | ultra)

  • qaic setting doesn’t look for what type of card SKU is present, it just allocates the available resources.

  • qaic-<sku> setting will help to allocate resources based on card SKU.

In the qaic-device-plugin.yml file, set or unset the QAIC_SKU_BASED_RESOURCE_ENABLED flag to enable qaic-<sku> or qaic resources.

Example:

env:
   - name: QAIC_SKU_BASED_RESOURCE_ENABLED
   value: "1"
securityContext:
   privileged: true
volumeMounts:
   - name: device-plugin

In the deploy-qaic-single.yaml file, specify the supported device resource, for example: qaic, qaic-std, qaic-pro, or qaic-ultra.

Example for qaic-ultra card SKU:

spec:
containers:
- name: qaic
   image: ghcr.io/quic/cloud_ai_inference_ubuntu24:1.21.2.0
   imagePullPolicy: Never
   command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]
   securityContext:
      runAsUser: 1000
      runAsGroup: 995   # Local machine group qaic
   resources:
      limits:
      qualcomm.com/qaic-ultra: 1

Fractional Allocation

Fractional allocation of the device resources is supported with the QAIC_FRACTIONAL_DEVICE_STRATEGY flag.

Allocate half of device resources:

env:
   - name: QAIC_FRACTIONAL_DEVICE_STRATEGY
   value: "Half"

Allocate a quarter of device resources:

env:
   - name: QAIC_FRACTIONAL_DEVICE_STRATEGY
   value: "Quarter"

Deployment Configuration

A Kubernetes deployment is defined using a YAML configuration file. The following example defines a deployment for a single Cloud AI device.

deploy-qaic-single.yaml:

apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
name: qaic-deployment
namespace: kube-system
spec:
selector:
   matchLabels:
      app: qaic
replicas: 1 # tells deployment to run 1 pods matching the template
template:
   metadata:
      labels:
      app: qaic
   spec:
      containers:
      - name: qaic
      image: ghcr.io/quic/cloud_ai_inference_ubuntu24:1.21.2.0
      imagePullPolicy: Never
      command: [ "/bin/bash", "-ce", "tail -f /dev/null" ]
      securityContext:
         runAsUser: 1000
         runAsGroup: 995   # Local machine group qaic
      resources:
         limits:
            qualcomm.com/qaic: 1

Prerequisites for Deployment

  • Platform SDK installed on the Kubernetes worker node (required for QAic Linux kernel drivers and firmware images.)

  • Cloud AI devices available on the worker node

  • QAic K8s Device Plugin Docker image available through a local Docker registry or preloaded on the Kubernetes worker node

  • Cloud AI workload Docker image available through a local Docker registry or preloaded on the Kubernetes worker node

Examples