Introduction¶
Cloud AI SDKs enable developers to optimize trained deep learning models for high-performance inference. The SDKs provide workflows to optimize models for best performance, a runtime for execution, and integration with ONNX Runtime and Triton Inference Server for deployment.
Cloud AI SDKs support:

- High-performance Generative AI, Natural Language Processing, and Computer Vision models
- Optimizing model performance to application requirements (throughput, accuracy, and latency) through various quantization techniques
- Development of inference applications through support for multiple operating systems and Docker containers
- Deployment of inference applications at scale with Triton Inference Server, vLLM, and Text Generation Inference (a serving sketch follows this list)
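As an illustration of the deployment path, the sketch below serves a model with vLLM. It assumes a Cloud AI-enabled vLLM build that exposes the standard vLLM entry points; the model name is illustrative, not a recommendation.

```python
# Minimal vLLM serving sketch (assumes a Cloud AI-enabled vLLM build that
# exposes the standard vLLM API; the model name below is illustrative).
from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-2-7b-chat-hf")
params = SamplingParams(temperature=0.7, max_tokens=128)

# generate() takes a list of prompts and returns one RequestOutput per prompt
outputs = llm.generate(["What is machine learning?"], params)
for out in outputs:
    print(out.outputs[0].text)
```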
Cloud AI SDKs¶
The Cloud AI SDK consists of the Application (Apps) SDK and Platform SDK.
The Application (Apps) SDK is used to convert models and prepare runtime binaries for Cloud AI platforms. It contains model development tools, a sophisticated parallelizing graph compiler, performance and integration tools, and code samples. The Apps SDK is supported on x86-64 Linux-based systems. The Apps SDK is also available in an additional variant: a common SDK shared across other Qualcomm AI products.
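As a rough sketch of the compile step, the Python snippet below shells out to the Apps SDK compiler. The qaic-exec install path and flags are assumptions drawn from typical Cloud AI tutorials; verify them against the installed SDK documentation.

```python
# Sketch: compile an ONNX model into a Cloud AI program binary (QPC).
# The path and flags are assumptions; check the Apps SDK docs for your install.
import subprocess

cmd = [
    "/opt/qti-aic/exec/qaic-exec",  # assumed default Apps SDK install path
    "-m=model.onnx",                # input ONNX model
    "-aic-hw",                      # target Cloud AI hardware
    "-aic-num-cores=4",             # number of AI cores to compile for
    "-convert-to-fp16",             # compile at FP16 precision
    "-aic-binary-dir=./qpc",        # output directory for the compiled binary
    "-compile-only",
]
subprocess.run(cmd, check=True)
```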
The Platform SDK provides driver support for Cloud AI accelerators, APIs and tools for executing and debugging model binaries, and tools for card health, monitoring, and telemetry. The Platform SDK consists of a kernel driver, a userspace runtime with APIs and language bindings, and card firmware. The Platform SDK is supported on x86-64 and ARM64 hosts.
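A minimal sketch of querying card health from Python via the Platform SDK tooling is shown below; the qaic-util path and flag are assumptions based on typical Platform SDK installs.

```python
# Sketch: query device status and telemetry via the Platform SDK's qaic-util.
# The tool path and -q flag are assumptions; verify against your install.
import subprocess

result = subprocess.run(
    ["/opt/qti-aic/tools/qaic-util", "-q"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)  # per-card status, firmware, and health information
```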

Installation¶
The installation guide covers:

- Supported platforms, operating systems, and hypervisors, along with the corresponding prerequisites
- Cloud AI SDK (Platform and Apps SDK) installation
- Docker support
- Inference servers
Inference Workflow¶
Inference Workflow details the Cloud AI SDK workflow and tool support, from onboarding a pre-trained model to deployment on Cloud AI platforms.
Release Notes¶
Cloud AI release notes describe new features, limitations, and modifications in the Platform and Apps SDKs.
SDK Tools¶
SDK Tools details the usage of the tools in the SDKs, covering both the inference workflow and card management.
Tutorials¶
Tutorials, in the form of Jupyter Notebooks, walk developers through the Cloud AI inference workflow and the tools used in the process. Tutorials are divided into CV and NLP tracks to provide a better developer experience, even though the inference workflows are quite similar.
Model Recipes¶
Model recipes provide developers with the most performant and efficient way to run popular models across categories. Each recipe starts with the public model, exports it to ONNX, applies patches if required, then compiles and executes it for best performance. Developers can use the recipe to integrate the compiled binary into their inference application.
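As a minimal sketch of the first recipe step, the snippet below exports a public PyTorch model to ONNX; the model choice and tensor names are illustrative.

```python
# Sketch: export a public PyTorch model to ONNX (first step of a recipe).
# The model and input/output names are illustrative choices.
import torch
from torchvision.models import resnet50, ResNet50_Weights

model = resnet50(weights=ResNet50_Weights.DEFAULT).eval()
dummy = torch.randn(1, 3, 224, 224)  # NCHW input expected by ResNet-50

torch.onnx.export(
    model, dummy, "resnet50.onnx",
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},  # dynamic batch
)
```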
Sample Code¶
Sample code helps developers become familiar with the Python and C++ APIs for inference on Cloud AI platforms.
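For orientation, the sketch below is modeled on the SDK's Python bindings; the module, class, and attribute names are assumptions here, so treat the sample code pages as the authoritative reference.

```python
# Sketch: run inference on a compiled binary through the Python API.
# qaic.Session and the names below are assumptions modeled on the SDK's
# Python bindings; consult the sample code for the exact interface.
import numpy as np
import qaic

sess = qaic.Session(model_path="qpc/programqpc.bin")
input_shape, input_type = sess.model_input_shape_dict["input"]

x = np.random.rand(*input_shape).astype(input_type)
outputs = sess.run({"input": x})  # mapping of output name -> numpy array
print(outputs)
```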
System Management¶
System Management details device and card management for Cloud AI platforms.
Architecture¶
Architecture provides insights into the Cloud AI SoC and its AI compute cores.