Metrics

AICM currently serves the metrics listed here (also stored in docs/metrics.csv). In addition to a set of core metrics, AICM provides Reliability, Availability, and Serviceability (RAS) error statuses.

| Model | Field Name | Description |
| --- | --- | --- |
| HealthDataModel | dev_status | Status of the device |
| HealthDataModel | mhi_id | MHI ID |
| HealthDataModel | pci_address | PCI address of the device |
| HealthDataModel | pci_info | PCI info |
| HealthDataModel | max_link_speed | Max link speed |
| HealthDataModel | max_link_width | Max link width |
| HealthDataModel | current_link_speed | Current link speed |
| HealthDataModel | current_link_width | Current link width |
| HealthDataModel | dev_link | Dev link name |
| HealthDataModel | hw_version | Hardware version |
| HealthDataModel | hw_serial_string | Hardware serial number |
| HealthDataModel | fw_version | Firmware version |
| HealthDataModel | fw_qc_image_version | Qualcomm firmware identification string |
| HealthDataModel | fw_oem_image_version | OEM custom firmware identification string |
| HealthDataModel | fw_image_variant | Firmware image variant, e.g. debug or release |
| HealthDataModel | device_capabilities | Device firmware features |
| HealthDataModel | current_boot_interface | Boot interface |
| HealthDataModel | nsp_version | NSP version |
| HealthDataModel | nsp_qc_image_version | NSP image string |
| HealthDataModel | nsp_oem_image_version | Image string provided by OEM |
| HealthDataModel | nsp_image_variant | NSP image variant, e.g. debug or release |
| HealthDataModel | dram_total_kb | Total RAM in system in KB |
| HealthDataModel | dram_free_kb | Amount of RAM free in KB |
| HealthDataModel | dram_fragmentation_percentage | Percentage of DRAM fragmentation |
| HealthDataModel | vc_total | Total number of virtual channels on the system |
| HealthDataModel | vc_free | Number of available virtual channels |
| HealthDataModel | pc_total | Total number of physical channels |
| HealthDataModel | pc_reserved | Number of reserved physical channels |
| HealthDataModel | nsp_total | Number of neural processors on the system |
| HealthDataModel | nsp_free | Number of available neural processors |
| HealthDataModel | dram_bw_KBps | DRAM bandwidth in Kbytes/second, averaged over the last ~100 ms |
| HealthDataModel | mcid_total | Total number of multicast IDs available on the system |
| HealthDataModel | mcid_free | Number of available multicast IDs |
| HealthDataModel | semaphore_total | Total number of semaphores available on the system |
| HealthDataModel | semaphore_free | Number of available semaphores |
| HealthDataModel | num_constant_loaded | Number of constants loaded; each load of constants increments the count by 1 |
| HealthDataModel | num_constant_in_use | Number of loaded constants that are actively used by networks running on the system |
| HealthDataModel | num_networks_loaded | Number of neural networks loaded in memory on the system |
| HealthDataModel | num_networks_active | Number of neural networks currently computing on the system |
| HealthDataModel | neural_processor_frequency_Mhz | Nominal operating frequency of the neural processors; all processors have the same max clock |
| HealthDataModel | ddr_frequency_Mhz | Nominal operating frequency of DDR memory |
| HealthDataModel | compute_noc_frequency_Mhz | Nominal operating frequency of the compute network on chip |
| HealthDataModel | memory_noc_frequency_Mhz | Nominal operating frequency of the memory network on chip |
| HealthDataModel | system_noc_frequency_Mhz | Nominal operating frequency of the system network on chip |
| HealthDataModel | metadata_version | Metadata version |
| HealthDataModel | nnc_protocol_version | NNC protocol version |
| HealthDataModel | sbl_image | SBL image string |
| HealthDataModel | pvs_image_version | PVS image version |
| HealthDataModel | nsp_defective_pg_mask | Defective NSP mask |
| HealthDataModel | num_retired_ddr_pages | Number of retired DDR pages |
| HealthDataModel | need_reset_to_retire_pages | Reset required to retire pending pages |
| HealthDataModel | board_serial | Board serial |
| HealthDataModel | soc_temparature_degree_C | SoC temperature in degrees Celsius |
| HealthDataModel | board_power_watts | Board power in watts |
| HealthDataModel | tdp_cap_watts | Thermal Design Power cap in watts |
| HealthDataModel | sku_type | SKU type |
| HealthDataModel | complex_id | Complex ID |
| HealthDataModel | soc_power_watts | SoC power in watts |
| HealthDataModel | soc_tdp_cap_watts | SoC Thermal Design Power cap in watts |
| PciDataModel | byte_count_rx | Bytes received on PCIe |
| PciDataModel | byte_count_tx | Bytes sent on PCIe |
| DdrBwDataModel | byte_count_total | Sum of the individual NSP byte counts |
| DdrBwDataModel_NspDdrBwDataModel | byte_count | DDR byte count for this NSP |
| RasErrorsDataModel | ras_ddr_correctable_error_count | Count of correctable errors received from ras_ddr |
| RasErrorsDataModel | ras_ddr_uncorrectable_error_count | Count of uncorrectable errors received from ras_ddr |
| RasErrorsDataModel | ras_mcw_correctable_error_count | Count of correctable errors received from ras_mcw |
| RasErrorsDataModel | ras_mcw_uncorrectable_error_count | Count of uncorrectable errors received from ras_mcw |
| RasErrorsDataModel | ras_imem_correctable_error_count | Count of correctable errors received from ras_imem |
| RasErrorsDataModel | ras_imem_uncorrectable_error_count | Count of uncorrectable errors received from ras_imem |
| RasErrorsDataModel | ras_nsp_correctable_error_count | Count of correctable errors received from ras_nsp |
| RasErrorsDataModel | ras_nsp_uncorrectable_error_count | Count of uncorrectable errors received from ras_nsp |
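
As a rough illustration of how a few of the fields listed above might be combined, the sketch below derives DRAM utilization, the number of NSPs in use, and the total uncorrectable RAS error count. It assumes the metrics have already been retrieved and flattened into a plain Python dict keyed by field name; that layout, the helper names, and all sample values are hypothetical and not part of AICM. How the metrics are fetched from AICM is not shown.

```python
# Hypothetical sample: field names come from the metrics listed above,
# but the values and the flat-dict layout are illustrative assumptions.
sample = {
    "dram_total_kb": 16_000_000,
    "dram_free_kb": 12_000_000,
    "nsp_total": 16,
    "nsp_free": 4,
    "ras_ddr_uncorrectable_error_count": 0,
    "ras_mcw_uncorrectable_error_count": 0,
    "ras_imem_uncorrectable_error_count": 1,
    "ras_nsp_uncorrectable_error_count": 0,
}

# RAS sources that report error counts (ddr, mcw, imem, nsp).
RAS_SOURCES = ("ddr", "mcw", "imem", "nsp")


def dram_used_percent(metrics):
    """Percentage of DRAM in use, from dram_total_kb and dram_free_kb."""
    total = metrics["dram_total_kb"]
    if total <= 0:
        return 0.0  # avoid dividing by zero on an empty or invalid reading
    return 100.0 * (total - metrics["dram_free_kb"]) / total


def nsp_in_use(metrics):
    """Neural processors currently allocated: nsp_total minus nsp_free."""
    return metrics["nsp_total"] - metrics["nsp_free"]


def total_uncorrectable(metrics):
    """Sum of uncorrectable error counts across all RAS sources.

    Missing fields are treated as zero.
    """
    return sum(
        metrics.get(f"ras_{src}_uncorrectable_error_count", 0)
        for src in RAS_SOURCES
    )


if __name__ == "__main__":
    print("DRAM used (%):", dram_used_percent(sample))
    print("NSPs in use:", nsp_in_use(sample))
    print("Uncorrectable RAS errors:", total_uncorrectable(sample))
```

The same pattern extends to the correctable counts by swapping the field suffix. Any alerting threshold applied to these derived values would be a deployment-specific choice.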