Features¶
Profiling Support in Runtime¶
When running inferences on AIC100, we may want deeper insight into the performance of the AIC100 stack, either to triage low performance or for general monitoring.
The current AIC100 stack provides a mechanism to collect performance metrics at key milestones throughout the inference cycle.
Profiling data can be broadly classified into two components:
Host metrics: An inference passes through several software layers on the host before it reaches the network on the device. By examining host-side performance, we can determine whether the network pre/post-processing stages or host-side multi-threading knobs need tuning.
Network/device metrics: Network metrics provide a wealth of information, from whole-network execution time down to detailed operator-level performance. This information can be used to get the most out of an existing network or to optimize the network itself. Note: network perf collection is not baked into the network by default. It must be enabled during network compilation itself, and the amount of network perf detail available depends on the parameters passed to the AIC compiler.
Profiling Report Types¶
Using the AIC100 software stack, profiling information can be requested in the following types (layouts):
Latency type¶
Latency type is a CSV-style table of key metrics from the AIC100 stack. It contains both host- and device-side information.
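As an illustration, a latency report could look like the following CSV fragment. The column names and values here are hypothetical, chosen only to show the layout; they are not the actual fields emitted by the AIC100 stack:

```csv
inference_id,host_queue_us,host_submit_us,device_exec_us,total_us
0,12,35,410,457
1,10,33,402,445
```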
Trace type¶
Trace type is JSON-formatted Chrome trace data. It can be viewed in any interface that consumes Chrome traces and contains both host- and device-side information.
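Chrome's trace format is a JSON document with a `traceEvents` array. A minimal example of the format is shown below; the event names here are illustrative, not the actual events emitted by the AIC100 stack (`"ph": "X"` denotes a complete event, with `ts` and `dur` in microseconds):

```json
{
  "traceEvents": [
    {"name": "host_submit", "cat": "host", "ph": "X",
     "ts": 1000, "dur": 35, "pid": 1, "tid": 1},
    {"name": "device_exec", "cat": "device", "ph": "X",
     "ts": 1040, "dur": 410, "pid": 1, "tid": 2}
  ]
}
```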
Profiling Report Collection methods¶
The above information can be requested from the AIC100 stack using the following mechanisms:
Num-iter based profiling¶
Alias: Legacy profiling
When a user creates a profiling handle for num-iter based profiling, they need to specify:

- the program to profile,
- the number of samples to collect,
- a profiling callback, and
- the profiling output type, i.e. latency or trace.
The fundamental idea is that, during creation of the profiling handle, the user specifies the number of inferences that need to be sampled. After the user starts profiling, profiling stops and the user-provided callback is invoked when either:

- the number of samples requested by the user has been collected, or
- the user explicitly stops profiling; in this case, the number of samples collected might be less than the number requested at profiling-handle creation.
After profiling is stopped, the user can call start profiling again using the same handle; the infrastructure behaves as if the handle were being started for the first time.
Refer to the ProfilingHandle_ section for the HPP API interface.
Note that, at profiling-handle creation, the user must already know which program is to be profiled. A given profiling handle can profile only one program; to profile multiple programs, the user must create multiple profiling handles, one for each program.