Kubernetes
Machine learning applications built for the Cloud AI 100 accelerator can be containerized with Docker and deployed with Kubernetes. The following figure shows a sample Kubernetes deployment.
K8s device plugin
The Cloud AI 100 Kubernetes (K8s) device plugin is located at qaic-apps-1.x.y.z/common/tools/k8s-device-plugin in the Cloud AI 100 Apps SDK. The qaic-k8s-device-plugin package has the following directory structure.
├── Apache_License
├── build_image.sh
├── deploy-qaic-single.yaml
├── docker
│   ├── aarch64
│   │   └── Dockerfile.ubuntu
│   └── x86_64
│       └── Dockerfile.ubuntu
├── examples
│   └── pod-example.yml
├── go.mod
├── Gopkg.toml
├── go.sum
├── main.go
├── multi_soc_checks.go
├── multi_soc_checks_test.go
├── Notice.txt
├── qaic-device-plugin.yml
├── qaic.go
├── README.md
├── server.go
├── server_test.go
├── topology.go
├── topology_test.go
└── watcher.go
Contents of the qaic-k8s-device-plugin package:
- QAic K8s Device Plugin
  - Sends the kubelet the list of AI 100 devices it manages.
  - Monitors AI 100 device health.
  - Handles AI 100 device allocation and cleanup.
- QAic K8s Device Plugin Docker image build script
- Deployment scripts (YAML)
  - Device plugin deployment script (deploys the QAic K8s Device Plugin as a DaemonSet)
  - Sample AI 100 workload deployment script
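As a rough illustration of what deploying the plugin as a DaemonSet involves, the sketch below shows a minimal manifest. It is not the shipped qaic-device-plugin.yml: the object name, namespace, labels, and image reference are placeholders. The hostPath mount of /var/lib/kubelet/device-plugins is the standard directory through which Kubernetes device plugins register with the kubelet; any further mounts or privileges the QAic plugin may need are not shown here.

```yaml
# Hypothetical sketch, not the shipped qaic-device-plugin.yml.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: qaic-device-plugin            # placeholder name
  namespace: kube-system              # assumed namespace
spec:
  selector:
    matchLabels:
      name: qaic-device-plugin
  template:
    metadata:
      labels:
        name: qaic-device-plugin
    spec:
      containers:
        - name: qaic-device-plugin
          image: <registry>/qaic-k8s-device-plugin:<tag>   # placeholder image
          volumeMounts:
            # Directory holding the kubelet's device plugin sockets.
            - name: device-plugins
              mountPath: /var/lib/kubelet/device-plugins
      volumes:
        - name: device-plugins
          hostPath:
            path: /var/lib/kubelet/device-plugins
```

Because a DaemonSet schedules one pod per eligible node, every worker node with AI 100 devices gets its own plugin instance.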
Feature:
How to allocate Cloud AI 100 resources:
Resources can be allocated either through the generic qaic resource or through a SKU-specific qaic-<sku> resource, where <sku> is one of std, pro, or ultra.
- The qaic setting does not check which SKU is present; it allocates whatever resources are available.
- The qaic-<sku> setting allocates resources based on the SKU.
In the qaic-device-plugin.yml file, set the QAIC_SKU_BASED_RESOURCE_ENABLED flag to use qaic-<sku> resources, or leave it unset to use qaic resources.
Example:
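A minimal sketch of how the flag might appear in the plugin container's spec inside qaic-device-plugin.yml. Only the name QAIC_SKU_BASED_RESOURCE_ENABLED comes from the text above; treating it as an environment variable, the value "true", the container name, and the image reference are all assumptions, so check the shipped file for the exact form.

```yaml
# Fragment of a DaemonSet pod template (hypothetical values).
containers:
  - name: qaic-device-plugin
    image: <registry>/qaic-k8s-device-plugin:<tag>   # placeholder image
    env:
      # Assumed to be an environment variable with a boolean-like string value.
      # Leave the entry out to use the generic qaic resource instead.
      - name: QAIC_SKU_BASED_RESOURCE_ENABLED
        value: "true"
```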
In the deploy-qaic-single.yaml file, the user specifies which device resource the workload requests: qaic, qaic-std, qaic-pro, or qaic-ultra.
Example:
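A hedged sketch of a workload pod requesting one SKU-specific device. Kubernetes requires device plugin resources to be named with a vendor domain prefix; the qualcomm.com/ prefix used here is an assumption, as are the pod name and image reference. The resource names actually advertised on a node can be confirmed under Allocatable in the output of kubectl describe node <node-name>.

```yaml
# Hypothetical sketch, not the shipped deploy-qaic-single.yaml.
apiVersion: v1
kind: Pod
metadata:
  name: qaic-workload-example         # placeholder name
spec:
  restartPolicy: Never
  containers:
    - name: workload
      image: <registry>/<ai100-workload-image>:<tag>   # placeholder image
      resources:
        limits:
          # Device plugin resources are requested in whole units.
          # Vendor prefix assumed; use qaic, qaic-std, or qaic-pro as needed.
          qualcomm.com/qaic-ultra: 1
```

A SKU-specific request such as this one only makes sense when the plugin has been deployed with SKU-based resources enabled; otherwise the generic qaic resource is the one to request.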
Prerequisites for deployment:
- Platform SDK installed on the Kubernetes worker node
  - Required for the QAic Linux kernel drivers and firmware images.
- QAic K8s Device Plugin Docker image available through the customer's Docker Hub repository or preloaded on the Kubernetes worker node
- AI 100 workload Docker image available through the customer's Docker Hub repository or preloaded on the Kubernetes worker node
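When an image is preloaded on the worker node instead of being pulled from a registry, the container's imagePullPolicy matters: with Always, which is the default for images tagged :latest or given without a tag, the kubelet still contacts a registry and the pod can fail to start. A small sketch with a hypothetical image name:

```yaml
# Fragment of a pod spec; the image name and tag are hypothetical.
containers:
  - name: workload
    image: ai100-workload:1.0         # image already loaded on the node
    imagePullPolicy: IfNotPresent     # use the local copy; Never also works
```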