class InferenceSet¶
Help on class InferenceSet in module qaicrt:
InferenceSet(Logger)¶
InferenceSet(context, qpc, devId, setSize, numActivations, properties, enableProfiling) -> inferenceSet
InferenceSet(context, qpc, devId, setSize, numActivations, infDataVectorDmaBuf, properties, enableProfiling) -> inferenceSet
Defines a set of Activations that are scheduled as a single group to submit inferences.
The inference flow consists of:
- A user application thread submitting inferences through the submit API.
- A user application thread calling getCompletedId to retrieve completed inferences.
- The completed inferences are returned through InferenceHandle (shInferenceHandle).
- The InferenceHandle contains the ExecObj and the ID given at submission time, such that the thread can correlate submission to completions. Note: due to high-performance multi-threading, in-order completion is not guaranteed; inferences may complete out of order.
- Once the data is read and processed, putCompleted must be called to return the InferenceHandle back to the processing queue.
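The flow above can be sketched as a small helper. This is a minimal sketch under stated assumptions, not reference usage from the library: `inf_set` is assumed to be an already constructed InferenceSet, `status` is assumed to be `qaicrt.QStatus`, and `process` is a hypothetical caller-supplied callback that reads the outputs of a completed handle. It also assumes that the IDs are unique and that their count does not exceed the number of handles the set can have in flight, because a handle only returns to the queue after putCompleted.

```python
def run_inferences(inf_set, status, ids, process):
    """Submit one inference per ID, then collect each completion by its ID.

    inf_set -- an already constructed InferenceSet (assumption)
    status  -- the status enum, assumed to be qaicrt.QStatus
    ids     -- unique caller-chosen IDs, one per inference
    process -- hypothetical callback that reads results from a handle
    """
    for inf_id in ids:
        # The signature returns (status, handle), in that order.
        st, handle = inf_set.getAvailable()
        if st != status.QS_SUCCESS:
            raise RuntimeError(f"getAvailable failed: {st}")
        # Filling the input buffers of the handle's ExecObj is not shown here.
        st = inf_set.submit(handle, inf_id)
        if st != status.QS_SUCCESS:
            raise RuntimeError(f"submit failed for id {inf_id}: {st}")

    results = {}
    for inf_id in ids:
        # Completion order is not guaranteed, so each result is fetched by ID.
        st, handle = inf_set.getCompletedId(inf_id)
        if st != status.QS_SUCCESS:
            raise RuntimeError(f"getCompletedId failed for id {inf_id}: {st}")
        try:
            results[inf_id] = process(handle)
        finally:
            # Always hand the handle back so it can be reused.
            inf_set.putCompleted(handle)
    return results
```

With the real module this would be called as `run_inferences(inferenceSet, qaicrt.QStatus, ids, process)`.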
The InferenceSet may be constructed for USER buffers:
USER Input/Output Buffer Type: User data is copied into DMA buffers at each inference.
Parameters
Methods defined here:
__init__¶
__init__(*args, **kwargs)
Overloaded function.
1. __init__(self: qaicrt.InferenceSet, context: qaicrt.Context, qpc: qaicrt.Qpc,
devId: Optional[int] = None, setSize: int, numActivations: int,
properties: qaicrt.InferenceSetProperties = None,
enableProfiling: bool = False) -> None
2. __init__(self: qaicrt.InferenceSet, context: qaicrt.Context, qpc: qaicrt.Qpc,
devId: int, setSize: int, numActivations: int,
infDataVectorDmaBuf: list[list[qaicrt.QBuffer]],
properties: qaicrt.InferenceSetProperties = None,
enableProfiling: bool = False) -> None
getAvailable¶
getAvailable(self: qaicrt.InferenceSet, timeoutUs: int = 0) -> tuple[qaicrt.QStatus, qaic::rt::InferenceHandle]
Description
Retrieve an available InferenceHandle. This call will block until an available
InferenceHandle is ready for use. The InferenceHandle will include the ExecObj
and the ID provided in submission.
Parameters
| Parameter | Description |
|---|---|
| timeoutUs | [optional] If an |
Returns
Tuple of operational status and infHandle, in the order shown in the signature.
infHandle: The available InferenceHandle.
Operational status:
- qaicrt.QStatus.QS_SUCCESS: Successful completion.
- qaicrt.QStatus.QS_TIMEDOUT: Timed out before acquiring an available inference handle.
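A caller that prefers not to wait indefinitely might wrap getAvailable in a bounded retry loop. The helper below is a sketch only: `acquire_handle` and its `attempts` parameter are hypothetical names, `status` is assumed to be `qaicrt.QStatus`, and the timeout is assumed to be expressed in microseconds, as the name `timeoutUs` suggests.

```python
def acquire_handle(inf_set, status, timeout_us, attempts=3):
    """Return an available handle, or None if every attempt timed out.

    acquire_handle and attempts are illustrative names, not part of qaicrt.
    """
    for _ in range(attempts):
        # Unpack in signature order: status first, then the handle.
        st, handle = inf_set.getAvailable(timeoutUs=timeout_us)
        if st == status.QS_SUCCESS:
            return handle
        if st != status.QS_TIMEDOUT:
            # Any status other than the two documented ones is unexpected.
            raise RuntimeError(f"getAvailable failed: {st}")
    return None
```

Returning `None` on timeout leaves the decision to the caller, for example to collect pending completions first and then try again.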
getCompletedId¶
getCompletedId(self: qaicrt.InferenceSet, id: int, timeoutUs: int = 0) -> tuple[qaicrt.QStatus, qaic::rt::InferenceHandle]
Description
Retrieve a specific completed inference by ID. This call will block until a completed
InferenceHandle is available. The InferenceHandle will include the ExecObj
and the ID provided in submission.
Parameters
| Parameter | Description |
|---|---|
| id | The ID of the inference to retrieve. The ID does not need to be unique: inference results are stored in a multi-map hash table. If multiple inferences are submitted with the same ID, this method will retrieve any of the completed inferences with that ID but will not guarantee in-order completion. If the caller requires in-order completion, unique IDs should be provided for each submission. |
| timeoutUs | [optional] If an |
Returns
Tuple of operational status and infHandle, in the order shown in the signature.
infHandle: The completed InferenceHandle.
Operational status:
- qaicrt.QStatus.QS_SUCCESS: Successful completion.
- qaicrt.QStatus.QS_TIMEDOUT: Timed out before acquiring a completed inference handle with the given ID.
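The effect of reusing an ID can be illustrated with a toy model. The class below is not the runtime's implementation; `CompletionTable` is a hypothetical stand-in that only mimics the multi-map behaviour described for the `id` parameter, to show why unique IDs let the caller recover submission order.

```python
from collections import defaultdict
from itertools import count


class CompletionTable:
    """Toy multi-map: one ID may hold several completed results."""

    def __init__(self):
        self._by_id = defaultdict(list)

    def complete(self, inf_id, result):
        self._by_id[inf_id].append(result)

    def take(self, inf_id):
        # Any entry stored under the ID may come back; the caller cannot
        # tell which submission produced it.
        return self._by_id[inf_id].pop()


# Reused ID: the two completions are indistinguishable at retrieval time.
table = CompletionTable()
table.complete(7, "second frame")  # happens to complete first
table.complete(7, "first frame")
ambiguous = table.take(7)

# Unique IDs: retrieving by ID restores submission order even though the
# completions arrived in reverse.
next_id = count()
ids = [next(next_id) for _ in range(3)]
table = CompletionTable()
for inf_id in reversed(ids):
    table.complete(inf_id, f"result-{inf_id}")
ordered = [table.take(inf_id) for inf_id in ids]
```

A monotonically increasing counter, as used here, is one simple way to generate unique IDs for submit.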
getInferenceHandle¶
getInferenceHandle(self: qaicrt.InferenceSet, qbufferDma: qaicrt.QBuffer) -> tuple[qaicrt.QStatus, qaic::rt::InferenceHandle]
Description
Find an InferenceHandle associated with a previously provided input/output DMABuf in
a DMA InferenceSet factory. DMABufs passed in infDataVectorDmaBuf are
distributed among all the InferenceHandles owned by the InferenceSet. Use this API
to find the InferenceHandle linked with a specific DMABuf and submit inferences using it.
Parameters
| Parameter | Description |
|---|---|
| qbufferDma | One of the DMABufs that was previously passed in the DMA |
Returns
Tuple of operational status and infHandle, in the order shown in the signature.
infHandle: The InferenceHandle associated with qbufferDma.
Operational status:
- qaicrt.QStatus.QS_SUCCESS: InferenceHandle found associated with DMABuf.
- qaicrt.QStatus.QS_ERROR: InferenceHandle not found.
- qaicrt.QStatus.QS_INVAL: Invalid buffer info passed in qbufferDma.
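Because this method distinguishes three outcomes, a thin wrapper can translate them into Python exceptions. This is a sketch under assumptions: `handle_for_buffer` is a hypothetical helper name, `status` is assumed to be `qaicrt.QStatus`, and the choice of exception types is purely illustrative.

```python
def handle_for_buffer(inf_set, status, qbuffer_dma):
    """Look up the handle tied to a DMABuf, raising on the documented errors."""
    # Unpack in signature order: status first, then the handle.
    st, handle = inf_set.getInferenceHandle(qbuffer_dma)
    if st == status.QS_SUCCESS:
        return handle
    if st == status.QS_INVAL:
        raise ValueError("invalid buffer info passed in qbufferDma")
    if st == status.QS_ERROR:
        raise LookupError("no InferenceHandle is associated with this DMABuf")
    raise RuntimeError(f"unexpected status from getInferenceHandle: {st}")
```

The returned handle can then be passed to submit in the same way as one obtained from getAvailable.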
putCompleted¶
putCompleted(self: qaicrt.InferenceSet, arg0: qaic::rt::InferenceHandle) -> qaicrt.QStatus
Description
Release a completed InferenceHandle back into the queue for processing.
Returns
qaicrt.QStatus.QS_SUCCESS: Successful completion.
submit¶
submit(self: qaicrt.InferenceSet, infHandle: qaic::rt::InferenceHandle,
id: int = 0) -> qaicrt.QStatus
Description
Submit an inference through an InferenceHandle obtained from getAvailable.
Parameters
| Parameter | Description |
|---|---|
| infHandle | An |
| id | [optional] User-defined ID for the inference. This will be returned in the |
Returns
- qaicrt.QStatus.QS_SUCCESS: Successful completion.
- qaicrt.QStatus.QS_INVAL: Invalid param infHandle.
- qaicrt.QStatus.QS_ERROR: Failed to submit inference due to internal error.
waitForCompletion¶
waitForCompletion(self: qaicrt.InferenceSet, timeoutUs: int = 0) -> qaicrt.QStatus
Description
Wait for all submitted inferences to be completed on all activations. This is a convenience API to ensure that all pending inferences are completed; the results are discarded. The total time waited will depend on the number of pending inferences and the number of activations.
Parameters
Returns
- qaicrt.QStatus.QS_SUCCESS: Successful completion.
- qaicrt.QStatus.QS_TIMEDOUT: Timed out before all submitted inferences are completed.
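One place this call could fit is a shutdown path, where results are no longer needed but in-flight work should finish. The sketch below makes several assumptions: `drain` is a hypothetical helper name, `status` is assumed to be `qaicrt.QStatus`, and `timeoutUs` is assumed to be in microseconds, so the helper converts from seconds.

```python
def drain(inf_set, status, timeout_s):
    """Wait for pending inferences; True if all finished, False on timeout."""
    # Convert seconds to microseconds (assumed unit of timeoutUs).
    timeout_us = round(timeout_s * 1_000_000)
    st = inf_set.waitForCompletion(timeoutUs=timeout_us)
    if st == status.QS_SUCCESS:
        return True
    if st == status.QS_TIMEDOUT:
        return False
    raise RuntimeError(f"unexpected status from waitForCompletion: {st}")
```

Since the results are discarded, this suits teardown rather than normal result collection, which goes through getCompletedId and putCompleted.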