# Runtime
This document describes the design and implementation of the Qualcomm AIC 100 user-space Linux Runtime (LRT) classes.
## QPC Elements

### QPC
Class `Qpc` is the main QPC API class type. It provides functionality to allocate a QPC from a filename or a buffer, and also provides an API to query information related to the loaded QPC. The class is neither copyable nor movable.

Class `Qpc` has two `Factory` functions that create a `std::shared_ptr<Qpc>`. There are two ways to create a `Qpc` object:

- Creating from a buffer and size.
- Creating from a given filename (base path plus filename).
If `Factory` instance creation is successful, the functions return an instance of `std::shared_ptr<Qpc>`; otherwise an appropriate exception is thrown.
Important APIs in class type `Qpc` include the following (a short usage sketch follows the list):

- `getInfo()` - returns `QpcInfo`. See below for more info.
- `getBufferMappings()` - returns a container of `BufferMappings`. Each `BufferMapping` instance in the container includes information on the buffers as obtained from the QPC file: the name, size, direction and index of each buffer.
- `getBufferMappingsDma()` - same return type as `getBufferMappings()`, but for DMA-allocated buffers instead of user heap-allocated ones.
- `getIoDescriptor()` - returns a pointer to `QData`, which holds the buffer and length of the QPC buffer.
- `get()` - returns a const pointer to `QAicQpcObj`, which is an opaque data structure for the internal program container. Users have no visibility into the API of `QAicQpcObj` since it is opaque in the C layer.
- `buffer()` - returns the QPC buffer as a const pointer to `uint8_t`.
- `size()` - returns the size of the QPC buffer.
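The following is a minimal sketch of creating a `Qpc` from a file on disk and inspecting its buffer mappings. The exact `Factory` signature is an assumption based on the description above, not taken verbatim from the SDK headers.

```cpp
// Sketch only: the Factory signature is assumed from the description above;
// consult the SDK headers for the exact prototype.
#include <iostream>
#include <memory>

void inspectQpc() {
  // Create from a base path plus filename; an exception is thrown on failure.
  std::shared_ptr<Qpc> qpc = Qpc::Factory("/path/to/network", "programqpc.bin");

  std::cout << "QPC size: " << qpc->size() << " bytes\n";

  // Each BufferMapping describes one IO buffer recorded in the QPC file.
  for (const auto &m : qpc->getBufferMappings()) {
    std::cout << m.bufferName << " index=" << m.index
              << " size=" << m.size << " bytes\n";
  }
}
```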
### QpcFile
Class `QpcFile` encapsulates the QPC file base path and filename, as well as a `DataBuffer<QData>` data member that holds the buffer of the QPC file.

`QpcFile` takes the base path and the filename of the QPC; `programqpc.bin` is the default filename provided in the constructor.

`QpcFile` is a non-movable, non-copyable type.

It has a `load()` function that loads the content of the QPC into the `DataBuffer<>`, which internally holds the `QData` buffer representation.

The `QpcFile` class type provides the following APIs:
- `getBuffer()` - returns a const reference to the `QData` buffer.
- `data()` - returns a `const uint8_t` pointer to the buffer.
- `size()` - returns the size of the loaded QPC file buffer.
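A short sketch of loading the raw QPC bytes through `QpcFile`; the constructor arguments and return conventions are assumptions based on the summary above.

```cpp
// Sketch only: constructor arguments and return types are assumed from the
// summary above.
#include <cstddef>
#include <cstdint>

void loadQpcFile() {
  QpcFile qpcFile("/path/to/network");    // filename defaults to "programqpc.bin"
  qpcFile.load();                         // reads the file into DataBuffer<QData>

  const uint8_t *bytes = qpcFile.data();  // raw pointer to the loaded buffer
  std::size_t numBytes = qpcFile.size();  // size of the loaded QPC file buffer
  (void)bytes;
  (void)numBytes;
}
```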
### QpcInfo
Struct `QpcInfo` is a simple struct type that aggregates a collection of `QpcProgramInfo` (also referred to as "program") and a corresponding collection of `QpcConstantsInfo` (also referred to as "constants").
### QpcProgramInfo
Struct `QpcProgramInfo` is a simple struct aggregating information related to the content of the loaded QPC. For example:
- BufferMappings, user allocated or DMA.
- Name identifying a segment in the QPC file.
- Number of cores required to run the program.
- Program index in the QPC.
- Size of the program.
- Batch size.
- Number of semaphores and number of MC (multicast) IDs.
- Total memory required to run the program.
### QpcConstantsInfo
Struct `QpcConstantsInfo` defines the constants info obtained from the QPC file.

It has the following attributes:

- `name`
- `index`
- `size`
### BufferMappings
`BufferMappings` is a vector of `BufferMapping`.

`BufferMappings` is created from the QPC. It is used by the API to store the input/output buffer information that is also used to create an `InferenceVector`.
### BufferMapping
Struct `BufferMapping` is a simple struct type that describes the information of a buffer.

Struct `BufferMapping` has two constructors:

- Creating by providing all of the data members.
- A default constructor that creates an uninitialized `BufferMapping` instance.
Struct `BufferMapping` has the following data members:

- `bufferName` - string name identifying the buffer.
- `index` - an unsigned int that represents the index in an array of buffers.
- `ioType` - defines the direction of a buffer from the user's perspective. An input buffer goes from the user to the device; an output buffer goes from the device to the user.
- `size` - buffer size in bytes.
- `isPartialBufferAllowed` - partial buffer is a feature that allows a buffer to have an actual size smaller than what is specified in the IO descriptor. `isPartialBufferAllowed` is set by the IO descriptor. When `isPartialBufferAllowed` is true, this buffer accepts a user buffer that is smaller than what is specified by `size`.
- `dataType` - defines the format of the buffer. The supported formats are defined in `QAicBufferDataTypeEnum`.
### QAicBufferDataTypeEnum
Struct `QAicBufferDataTypeEnum` is a simple struct type that defines the data type of the `BufferMapping`.

`QAicBufferDataTypeEnum` defines the following data types:
- `BUFFER_DATA_TYPE_FLOAT` - 32-bit float type (float)
- `BUFFER_DATA_TYPE_FLOAT16` - 16-bit float type (half, fp16)
- `BUFFER_DATA_TYPE_INT8Q` - 8-bit quantized type (int8_t)
- `BUFFER_DATA_TYPE_UINT8Q` - unsigned 8-bit quantized type (uint8_t)
- `BUFFER_DATA_TYPE_INT16Q` - 16-bit quantized type (int16_t)
- `BUFFER_DATA_TYPE_INT32Q` - 32-bit quantized type (int32_t)
- `BUFFER_DATA_TYPE_INT32I` - 32-bit index type (int32_t)
- `BUFFER_DATA_TYPE_INT64I` - 64-bit index type (int64_t)
- `BUFFER_DATA_TYPE_INT8` - 8-bit type (int8_t)
- `BUFFER_DATA_TYPE_INVAL` - invalid type
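Because each data type implies a fixed element width, a small helper such as the following can translate the enumerators above into byte sizes. This is a sketch; it assumes the enumerators are visible under the exact names listed above.

```cpp
#include <cstddef>

// Sketch: maps the data-type enumerators listed above to element widths in
// bytes. Assumes the enumerators are visible under these exact names.
std::size_t elementSizeBytes(QAicBufferDataTypeEnum type) {
  switch (type) {
  case BUFFER_DATA_TYPE_FLOAT:   return 4;  // 32-bit float
  case BUFFER_DATA_TYPE_FLOAT16: return 2;  // half / fp16
  case BUFFER_DATA_TYPE_INT8Q:
  case BUFFER_DATA_TYPE_UINT8Q:
  case BUFFER_DATA_TYPE_INT8:    return 1;  // 8-bit types
  case BUFFER_DATA_TYPE_INT16Q:  return 2;  // 16-bit quantized
  case BUFFER_DATA_TYPE_INT32Q:
  case BUFFER_DATA_TYPE_INT32I:  return 4;  // 32-bit quantized / index
  case BUFFER_DATA_TYPE_INT64I:  return 8;  // 64-bit index
  case BUFFER_DATA_TYPE_INVAL:
  default:                       return 0;  // invalid / unknown
  }
}
```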
## Context Elements

### Context
There are various Linux Runtime core components, such as `qpc`, `program`, `execObj` and `queue`, that are needed to run inference and enhance performance/usability.

Class `Context` is the primary class that links all LRT core components. The `Context` object should be created first. An application creates a context to obtain access to the other API functions; the context is then passed in other API calls.

The caller can also register logging and error callbacks. A context ID is passed to the error handler to uniquely identify the `Context` object.
Class `Context` has `Factory` functions to create a `std::shared_ptr<Context>`.

A `Context` object is created from context properties, the list of devices used by this context, a logging callback function, user data to be included in the log callback, an error handler to call in case of critical errors, and user data to be included in the error handler callback. If a logging callback and error handler are not provided, the default `defaultLogger` and `defaultErrorHandler` are used.

If `Factory` instance creation is successful, the functions return an instance of `std::shared_ptr<Context>`; otherwise an appropriate exception is thrown.
Important APIs in class type `Context` include the following:

- `findDevice()` - returns a suitable device for the network and checks that the selected device has enough resources.
- `setLogLevel()` - sets a new logging level to control the logging information produced while running the program. See below for more details about `QLogLevel`.
- `getLogLevel()` - returns the current logging level for the given `Context`.
- `get()` - returns a const pointer to `QAicContext`. Users have no visibility into the API of `QAicContext` since it is opaque in the C layer.
- `getId()` - returns an unsigned int that represents the ID of the `Context`. This ID is returned in error reports to refer to a specific created context.
- `objName()` - returns a `const std::string` which is the name of the object. For a context object the name is `Context`.
- `objNameCstr()` - returns a pointer to an array that contains a null-terminated sequence of characters representing the name of the object.
### QLogLevel
There are different logging levels for seeing different kinds of logs:

- `QL_DEBUG`: set to this level to see debug logs
- `QL_INFO`: set to this level to see informative logs
- `QL_WARN`: set to this level to see warning logs
- `QL_ERROR`: set to this level to see error logs
- `LogCallback` - a logging callback lambda function.
- `ErrorHandler` - an error handler lambda function to call in case of critical errors.
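Below is a sketch of creating a `Context` with user-supplied callbacks and raising the log level. The `Factory` argument order, the callback signatures, and the `QID` device-ID type are assumptions based on the description above.

```cpp
// Sketch only: argument order, callback signatures, and the QID type are
// assumptions; consult the SDK headers for the exact prototypes.
#include <cstdint>
#include <iostream>
#include <vector>

void createContext() {
  // Hypothetical logging callback: one call per runtime log message.
  auto logCb = [](QLogLevel level, const char *msg) {
    std::cerr << "[LRT log " << static_cast<int>(level) << "] " << msg << "\n";
  };
  // Hypothetical error handler: the context ID identifies the Context object.
  auto errCb = [](uint32_t contextId, const char *errInfo) {
    std::cerr << "Context " << contextId << " error: " << errInfo << "\n";
  };

  std::vector<QID> devices{0};  // run on device 0
  auto context = Context::Factory(nullptr /* properties */, devices,
                                  logCb, nullptr /* log user data */,
                                  errCb, nullptr /* error user data */);

  context->setLogLevel(QL_DEBUG);  // see QLogLevel above
  std::cout << "Created context id " << context->getId() << "\n";
}
```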
### Device Log capture
When a `Context` is created with the `QAIC_CONTEXT_COLLECT_DEVICE_LOGS` property bitmask set, it configures devices to start streaming logs to the host. Logs are passed to the user application through the `QAicLogCallback` registered while creating the context. There is a time delay between the actual log event on the device and the host receiving it; because of this, host logs are buffered for a few seconds and dumped chronologically along with the device logs.

The `Context` object keeps track of all programs that are associated with it. Logs from all devices and NSPs used within a context object are captured. This is done in a separate thread and does not impact the data-path flow.
#### Prerequisite
`qaic-monitor-grpc-server` should be running in the background. The commands below can be used.

- `systemd-run --unit=qmonitor-proxy /opt/qti-aic/tools/qaic-monitor-grpc-server` # starts in background
- `systemctl stop qmonitor-proxy` # stops background service
#### Limitation
QSM logs from different inference sessions cannot be filtered. If multiple inference sessions are running on a device, the QSM logs captured by a context will contain logs from all of them.
## Profiling Elements
For an overview of the profiling feature, refer to Profiling Support in Runtime.
### ProfilingHandle
`ProfilingHandle` provides an interface for num-iter based profiling. Refer to Num-iter based profiling for more details on the num-iter based profiling feature.

A `ProfilingHandle` object should be created using the `Factory` method. The user needs to specify the `Program` that should be profiled, the number of samples to collect, the callback to call to deliver the report, and the type of profiling output expected.
Note: The profiling type parameter has a default value of the Latency type.
Important APIs in class type `ProfilingHandle` include the following:

- `start()` - Start profiling. After the API call, profiling data from all inferences for the specified `Program` is collected until either the user calls `stop()` or the number of requested samples has been collected.
- `stop()` - Stop profiling. Stops profiling even if the num-samples requirement has not been met. This API call triggers the user-specified callback with a profiling report of all collected samples.
Note: If `stop()` is called without any inferences having completed for the specified `Program`, the callback will not be triggered.
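The fragment below sketches num-iter based profiling. The `Factory` parameter order and the report-callback signature are assumptions based on the description above; `program` stands for an already-created `Program` object.

```cpp
// Sketch only: Factory parameter order and the callback signature are
// assumptions; `program` is an already-created Program object.
auto onReport = [](const std::string &report) {
  std::cout << report << std::endl;   // consume the profiling report
};

// Collect up to 1000 inference samples; the profiling-type argument is omitted
// here, so it defaults to the Latency type.
auto profiling = ProfilingHandle::Factory(program, 1000, onReport);

profiling->start();
// ... submit inferences for `program` ...
profiling->stop();  // delivers the report if at least one inference completed
```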
### StreamProfilingHandle
`StreamProfilingHandle` provides an interface for duration based profiling. Refer to Duration based profiling for more details on the duration based profiling feature.

A `StreamProfilingHandle` object should be created using the `Factory` method. The user needs to specify the sampling rate, the reporting rate, the callback to call to deliver the report, and the profiling output format expected. The user may optionally specify a name for the handle and a regex to automatically add/remove programs.
Note: The profiling type is specified using the ProfilingProperties field. ProfilingProperties has a default value of nullptr, which results in the profiling type being the Latency type.
Important APIs in class type `StreamProfilingHandle` include the following (a short sketch follows the list):

- `start()` - Start profiling. After the API call, the user should expect a callback at every reporting-rate boundary containing information on the inferences profiled during that duration. Note: the user will get a callback even if no samples were collected.
- `stop()` - Stop profiling. A final report is delivered to the user when profiling is stopped, containing the profiling data of samples collected from the last report up to the point `stop()` is called.
- `addProgram()` - Add a program to the list of programs being profiled. Note: adding a program while profiling is active can cause a spurious or delayed report callback.
- `removeProgram()` - Remove a program from the list of programs being profiled. Note: removing a program while profiling is active can cause a spurious or delayed report callback.
- `flushReports()` - After profiling is stopped using the `stop()` API, the user should make sure that all the reports on the profiling infrastructure's queue are delivered to the user as callbacks before the application exits. The `flushReports()` API returns only after there are no more reports left to be delivered to the user, thus ensuring a clean application exit.
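The fragment below sketches duration based profiling. The `Factory` parameter names, their order, and the output-format argument are assumptions based on the description above; `program` stands for an already-created `Program` object.

```cpp
// Sketch only: parameter names/order and the format selector are assumptions.
auto onReport = [](const std::string &report) {
  std::cout << report << std::endl;      // one report per reporting interval
};

auto streamProfiling = StreamProfilingHandle::Factory(
    /*samplingRate=*/100, /*reportingRate=*/1, onReport,
    /*outputFormat=*/0 /* hypothetical format selector */);

streamProfiling->addProgram(program);    // begin profiling this program
streamProfiling->start();
// ... run inferences; a callback arrives at every reporting-rate boundary ...
streamProfiling->stop();                 // final report for the last interval
streamProfiling->flushReports();         // drain queued reports before exit
```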
## Inferencing Elements

### QBuffer
`QBuffer` is a struct that contains a pointer to the buffer and its size. It can hold an input or output buffer address from heap or DMA memory. `handle`, `offset` and `type` are considered only when the type is `QBUFFER_TYPE_DMABUF` or `QBUFFER_TYPE_PMEM`. It has the following members (a short sketch follows the list):

- `size` - total size of the memory pointed to by the `buf` pointer or `handle`.
- `buf` - buffer pointer; must be valid in the case of a heap buffer.
- `handle` - buffer handle; must be valid in the case of a DMA (or PMEM) buffer.
- `offset` - offset within the handle.
- `type` - type of the buffer: heap, DMA or PMEM.
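A minimal sketch of filling a `QBuffer` for a heap-allocated buffer, using the member names listed above; the heap buffer-type enumerator is not set explicitly here.

```cpp
// Sketch only: member names are taken from the list above.
#include <cstdint>
#include <vector>

std::vector<uint8_t> storage(4096);  // user-owned heap memory for one buffer

QBuffer qbuf{};
qbuf.buf  = storage.data();          // pointer must be valid for heap buffers
qbuf.size = storage.size();          // total size in bytes
// handle, offset and type are only consulted for DMA (QBUFFER_TYPE_DMABUF)
// or PMEM (QBUFFER_TYPE_PMEM) buffers, so they are left at their defaults.
```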
### InferenceVector
`InferenceVector` contains a vector of `QBuffer`. The vector of `QBuffer` contains both input and output buffers.

The user can create an `InferenceVector` from multiple sources, such as files on disk, or create `QBuffer` objects with data available to the user and set them in the `InferenceVector` with the `setBuffers()` API of this class.

The input buffers are used for the inference, and the result of the inference is stored in the output buffers. The user needs to keep a reference to the `InferenceVector` until the inference is complete, and can read the output buffers from the inference vector after the inference completes.
`InferenceVector` APIs (a short sketch follows the list):

- `getVector()`: Returns a vector of `QBuffer`. The vector contains both input and output buffers. `QBuffer` is a struct that contains a pointer to the buffer and its size. The user can call this after the inference has completed to read the output buffers.
- `setBuffers()`: Sets the input and output buffers of this `InferenceVector`.
- `Factory()`: Instantiates an `InferenceVector`.
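A fragment sketching how user-prepared `QBuffer`s flow into an `InferenceVector`. The no-argument `Factory()` overload and the helper `buildQBuffers()` (one `QBuffer` per `BufferMapping`) are hypothetical, used only for illustration.

```cpp
// Sketch only: buildQBuffers() is a hypothetical helper, and the Factory
// arguments are assumed.
std::vector<QBuffer> buffers = buildQBuffers(qpc->getBufferMappings());

auto inferenceVector = InferenceVector::Factory();
inferenceVector->setBuffers(buffers);   // inputs and outputs in one vector

// ... submit the inference and wait for it to complete ...

// After completion, read the outputs back through the same vector.
auto result = inferenceVector->getVector();
```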
### InferenceHandle
`InferenceHandle` contains an `InferenceVector` and the ID given at the time of submission of the inference. An `InferenceHandle` cannot be created directly by the user; the user can get an available `InferenceHandle` by calling the `getAvailable()` API of `InferenceSet`.

`InferenceHandle` is a container that holds the data needed for an inference, stored in an `InferenceVector`. The number of `InferenceHandle` objects created depends on the set_size and num_activations parameters passed during instantiation of `InferenceSet`. The number of `InferenceHandle` objects and the number of `ExecObj` objects created are the same.
#### LifeCycle of InferenceHandle
- `InferenceHandle` objects are created when `InferenceSet` is instantiated, and all objects are moved to the availableList vector, from which the user can retrieve them by calling the `getAvailable()` API of `InferenceSet`.
- When the user calls `getAvailable()`, if the availableList vector has an `InferenceHandle`, it is popped from the availableList and returned to the user; otherwise the call blocks until the user returns a used `InferenceHandle` with the `putCompleted()` API.
- The user sets buffers in the `InferenceHandle` it got using the `setBuffers()` API.
- The user submits the `InferenceHandle` using the `submit()` API of `InferenceSet`.
- To get a completed `InferenceHandle`, the user can call `getCompleted()` or `getCompletedId()` and extract/read the output of the inference from the `InferenceHandle`.
- After processing the output of the inference, the user needs to call the `putCompleted()` API of `InferenceSet` to put the completed `InferenceHandle` back into the availableList vector; otherwise the `getAvailable()` call will block.
### InferenceSet
`InferenceSet` is a C++ class that is used to submit inferences. It abstracts out lower-level classes like Queue, Program and ExecObj, and provides an easier way of handling multiple activations in a single group when submitting inferences.
List of APIs of `InferenceSet`:

- `submit()`: Submit an inference through an `InferenceVector`. The submission blocks until an `InferenceHandle` is available.
- `submit()`: Submit an inference through an `InferenceHandle`.
- `getCompleted()`: Returns a completed `InferenceHandle` object. The user can access the output of the inference using this `InferenceHandle` object by calling its `getBuffers()` method.
- `getCompletedId()`: Returns a completed `InferenceHandle` object with the specified ID.
- `putCompleted()`: Moves a completed `InferenceHandle` back into the availableList vector.
- `getAvailable()`: Returns an available `InferenceHandle` object from the availableList.
- `waitForCompletion()`: Waits for all submitted inferences to complete on all activations.
- `Factory()`: Instantiates an `InferenceSet`.
### NumActivations and SetSize
`NumActivations` and `SetSize` are arguments of the `InferenceSet::Factory` API.

- `NumActivations`: `InferenceSet` creates this many network instances inside the device. The user can decide `NumActivations` based on the number of cores required to run the network and the number of available cores.
- `SetSize`: For each network instance, the user application can simultaneously enqueue this many sets of input/output buffers to run inferences. The recommended value is between 2 and 10. The user should find an optimal value to achieve the desired throughput (inferences/sec) and latency.
### Inference Flow
The inference flow using `InferenceSet` would be as follows (a condensed sketch follows the list):

- Instantiate `InferenceSet` using the Factory method.
- Acquire one of the available `InferenceHandle` objects.
- Set input and output buffers in that `InferenceHandle`.
- Submit the `InferenceHandle` to `InferenceSet` to run the inference on the device.
- Call the `getCompletedId` API to wait for the inference to complete. Inference results are available in the output buffers once this API returns.
- Once the application reads the output data, `putCompleted` must be called to return the `InferenceHandle` back to the available list of handles.
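The fragment below condenses the steps above. The `Factory` argument list, the handle's `getId()` accessor, and the `buildQBuffers()` helper are assumptions used only for illustration, and error handling is omitted.

```cpp
// Sketch only: Factory arguments, getId(), and buildQBuffers() are assumed.
auto inferenceSet = InferenceSet::Factory(context, qpc, /*device=*/0,
                                          /*setSize=*/4, /*numActivations=*/1);

auto handle = inferenceSet->getAvailable();            // acquire a free handle
handle->setBuffers(buildQBuffers(qpc->getBufferMappings()));  // set IO buffers
inferenceSet->submit(handle);                          // run on the device

auto completed = inferenceSet->getCompletedId(handle->getId());  // wait
// ... read the inference outputs from completed->getBuffers() ...
inferenceSet->putCompleted(completed);                 // recycle the handle

inferenceSet->waitForCompletion();  // optional: drain all outstanding inferences
```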
### InferenceSetProperties
`InferenceSetProperties` defines properties to be consumed by `InferenceSet`.

List of members of `InferenceSetProperties` (a short sketch follows the list):

- `programProperties`: The user can set different program properties, which are consumed internally by the `Program` object. Notable programProperties are:
    - `SubmitRetryTimeoutMs`: After submission of an inference, the runtime waits for this timeout period in milliseconds; if the inference is not complete within this period, an error is returned.
    - `SubmitNumRetries`: Number of times the submission should be retried when the above timeout occurs.
    - `devMapping`: Specifies the physical devices to be used by the program; valid only for networks that need multiple devices to run.
- `queueProperties`: The user can set queue properties, which are consumed internally by the `Queue` object. Notable queueProperties are:
    - `numThreadsPerQueue`: Number of threads spawned to process elements in the queue. Default is 4.
- `name`: Defines the name of the InferenceSet object.
- `id`: Defines the ID of the InferenceSet object.
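A fragment sketching how these properties might be populated before being passed to `InferenceSet::Factory`. The member layout (nested `programProperties`/`queueProperties` structs) is assumed from the field names above and is not taken from the SDK headers.

```cpp
// Sketch only: the member layout is assumed from the field names listed above.
InferenceSetProperties props{};
props.name = "resnet50-session";                       // label for this set
props.id   = 7;                                        // user-chosen identifier
props.programProperties.SubmitRetryTimeoutMs = 5000;   // wait 5 s per attempt
props.programProperties.SubmitNumRetries     = 3;      // retry up to 3 times
props.queueProperties.numThreadsPerQueue     = 4;      // default is 4
// `props` is then passed to InferenceSet::Factory (argument position assumed).
```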