User guide

Starting AICM

Install Dependencies

AICM files are located in:

/opt/qti-aic/tools/aic-manager

AICM requires several Python packages to run.
We recommend installing the dependencies in a virtual environment:

python -m venv <path-to-the-virtual-environment>
source <path-to-the-virtual-environment>/bin/activate
pip install -r requirements.txt

Start Agent

After activating the virtual environment, AICM can be started using the following command:

python aicm_agent.py

optional arguments:
  -h, --help                      Show this help message and exit
  --config_file                   Path to the Config file.
  --dump-default-config,          Dump the default config to the specified folder
    --dump_default_config_users   and exit.
  --dump-default-users,           Dump the default users file to the specified
    --dump_default_users          folder and exit.
  --ip                            IP address to bind to
  --port                          Port number to listen on
  --log                           Path where to store logs
  --max-log-size, --max_log_size  Max size of logs in bytes
  --ssl-key, --ssl_key            Path to the SSL key file. Needs to be provided
                                  for HTTPS.
  --ssl-cert, --ssl_cert          Path to the SSL certificate file. Needs to be
                                  provided for HTTPS.
  --qmonitor-ip,                  IP of QMonitor GRPC Server
    --qmonitor_ip
  --qmonitor-port,                Port of QMonitor GRPC Server
    --qmonitor_port
  -u, --users                     Full path to the users credentials file (.yaml)
  -v, --verbose                   Increase output verbosity

These settings can also be supplied in a configuration file, which is passed with the --config_file option.
If a setting is given both in the configuration file and on the command line, the command-line value takes priority.
For example:

# AICM Configuration

# IP to bind AICM
ip = 127.0.0.1

# Port to bind AICM
port = 9000

# Full path to users credentials file
# users =

# SSL Key path
# ssl_key =

# SSL Certificate path
# ssl_cert =

# Path to directory where to store logs
# log =

# IP of QMonitor GRPC Server
qmonitor_ip = localhost

# Port of QMonitor GRPC Server
qmonitor_port = 62472

# Max sizes of logs in bytes
max_log_size = 100000000

# Verbosity of AICM agent
verbose = 0
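
The precedence rule described above (command-line values override configuration file values) can be sketched as follows. This is only an illustration of the merge logic, not AICM's actual implementation: the helper names parse_config_text and resolve_settings are hypothetical, and only the --ip and --port options are modeled.

```python
import argparse


def parse_config_text(text):
    """Parse 'key = value' lines, skipping blank lines and '#' comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def resolve_settings(config_text, argv):
    """Merge config-file values with CLI values; the CLI wins when both are set."""
    parser = argparse.ArgumentParser()
    # default=None distinguishes "not given on the command line" from a real value.
    parser.add_argument("--ip", default=None)
    parser.add_argument("--port", default=None)
    cli = {k: v for k, v in vars(parser.parse_args(argv)).items() if v is not None}
    # Later dict entries override earlier ones, so CLI values take priority.
    return {**parse_config_text(config_text), **cli}


config = """
# IP to bind AICM
ip = 127.0.0.1
port = 9000
# users =
"""

print(resolve_settings(config, ["--port", "9100"]))
```

Here the port comes from the command line while the IP falls back to the configuration file. Values are kept as strings in this sketch; a real loader would convert them to the expected types.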

AICM can also be started in a service-like manner using the provided script:

sudo bash scripts/start_aicm_agent.sh

Once the agent is running, the APIs can be tested through the Swagger UI at <ip>:<port>/docs.

Stop Agent

If the agent was started with scripts/start_aicm_agent.sh, stop it with the following command:

sudo bash scripts/stop_aicm_agent.sh

Alternatively, if AICM was started with the python aicm_agent.py command, press Ctrl+C to stop it.

Setup Basic Auth

AICM uses Basic Auth as its authentication method, so every request to the API must be authenticated.
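
As an illustration of what authenticating a request involves on the client side, the sketch below builds a Basic Auth header using only the Python standard library. The address 127.0.0.1:9000 matches the default configuration, while the password "secret" and the helper names are placeholders chosen for this example. No request is actually sent.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Return the value of the Authorization header for Basic Auth."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def build_request(url, username, password):
    """Create a request object carrying the Basic Auth header (nothing is sent)."""
    request = urllib.request.Request(url)
    request.add_header("Authorization", basic_auth_header(username, password))
    return request


# Placeholder credentials; use the ones configured for your agent.
request = build_request("http://127.0.0.1:9000/docs", "admin", "secret")
print(request.get_header("Authorization"))
```

Passing the request to urllib.request.urlopen would send it to a running agent. Most HTTP clients offer an equivalent shortcut for Basic Auth.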

The accepted credentials are stored in the .users.yaml file.

You can add/modify the credentials using the following syntax:

credentials:
  - username: admin
    hash: $2b$12$rEQTKF4IVHKPyeX6miseJ.xOjhmI5OFqlLuwE2OB4CuEIvHC2IFP6
    note: "Example of credential"

These credentials are required for every request made to the HTTP REST endpoints. For security purposes, the password is stored as a bcrypt hash. A script for generating the hash is provided at scripts/hash_password.py.

Replace <password> with the password and run this command:

python ./scripts/hash_password.py <password>
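
Before pasting a hash into the users file, it can help to check that it has the shape of a bcrypt hash. The sketch below does this with a regular expression and formats an entry in the syntax shown above. It does not verify passwords, and the function names bcrypt_cost and credential_entry are hypothetical helpers, not part of AICM.

```python
import re

# A bcrypt hash is "$2a$", "$2b$" or "$2y$", a two-digit cost, "$",
# then 53 characters (22 of salt, 31 of digest) from bcrypt's base64 alphabet.
BCRYPT_PATTERN = re.compile(r"\$2[aby]\$(\d{2})\$[./A-Za-z0-9]{53}")


def bcrypt_cost(hash_value):
    """Return the cost factor if the string is shaped like a bcrypt hash, else None."""
    match = BCRYPT_PATTERN.fullmatch(hash_value)
    return int(match.group(1)) if match else None


def credential_entry(username, hash_value, note):
    """Format one list item for the credentials section of the users file."""
    if bcrypt_cost(hash_value) is None:
        raise ValueError("hash does not look like a bcrypt hash")
    return (
        f"  - username: {username}\n"
        f"    hash: {hash_value}\n"
        f'    note: "{note}"\n'
    )


example_hash = "$2b$12$rEQTKF4IVHKPyeX6miseJ.xOjhmI5OFqlLuwE2OB4CuEIvHC2IFP6"
print(bcrypt_cost(example_hash))
print(credential_entry("admin", example_hash, "Example of credential"))
```

A plain-text password pasted by mistake fails the shape check, which catches the most common editing error in the users file.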

HTTPS

Basic Auth is only a simple authentication mechanism: on its own it sends credentials base64-encoded, not encrypted. Running AICM over HTTPS is therefore recommended, which requires a certificate and key to be provided at startup. Pass them with the following arguments:

  --ssl-key SSL_KEY    Path to the SSL key file. Needs to be provided for
                       HTTPS
  --ssl-cert SSL_CERT  Path to the SSL certificate file. Needs to be provided
                       for HTTPS
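
A launcher script might assemble the start command as sketched below. Since both files are needed for HTTPS, the sketch refuses a key without a certificate (or the reverse). The helper names agent_command and base_url and the file names key.pem and cert.pem are placeholders for this example; how AICM itself reacts to a single SSL argument is not covered here.

```python
def agent_command(ssl_key=None, ssl_cert=None):
    """Build the argument list for starting aicm_agent.py, with or without HTTPS."""
    if (ssl_key is None) != (ssl_cert is None):
        raise ValueError("provide both --ssl-key and --ssl-cert, or neither")
    command = ["python", "aicm_agent.py"]
    if ssl_key is not None:
        command += ["--ssl-key", ssl_key, "--ssl-cert", ssl_cert]
    return command


def base_url(ip, port, use_https):
    """Return the URL prefix a client would use to reach the agent."""
    scheme = "https" if use_https else "http"
    return f"{scheme}://{ip}:{port}"


# Placeholder file names; point these at your own key and certificate.
print(agent_command(ssl_key="key.pem", ssl_cert="cert.pem"))
print(base_url("127.0.0.1", 9000, use_https=True))
```

The resulting list can be passed to subprocess.run from the AICM directory with the virtual environment active.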

Metrics

AICM currently serves the metrics listed below (also stored in docs/metrics.csv). In addition to a set of core metrics, AICM provides Reliability, Availability, Serviceability (RAS) error statuses.

Model | Field Name | Description
HealthDataModel | dev_status | Status of the device
HealthDataModel | mhi_id | MHI ID
HealthDataModel | pci_address | PCI address of the device
HealthDataModel | pci_info | PCI info
HealthDataModel | max_link_speed | Max link speed
HealthDataModel | max_link_width | Max link width
HealthDataModel | current_link_speed | Current link speed
HealthDataModel | current_link_width | Current link width
HealthDataModel | dev_link | Dev link name
HealthDataModel | hw_version | Hardware version
HealthDataModel | hw_serial_string | Hardware serial number
HealthDataModel | fw_version | Firmware version
HealthDataModel | fw_qc_image_version | Qualcomm firmware identification string
HealthDataModel | fw_oem_image_version | OEM custom firmware identification string
HealthDataModel | fw_image_variant | Firmware image variant, e.g. debug or release
HealthDataModel | device_capabilities | Device firmware features
HealthDataModel | current_boot_interface | Boot interface
HealthDataModel | nsp_version | NSP version
HealthDataModel | nsp_qc_image_version | NSP image string
HealthDataModel | nsp_oem_image_version | Image string provided by OEM
HealthDataModel | nsp_image_variant | NSP image variant, e.g. debug or release
HealthDataModel | dram_total_kb | Total RAM in the system, in KB
HealthDataModel | dram_free_kb | Amount of free RAM, in KB
HealthDataModel | dram_fragmentation_percentage | Percentage of DRAM fragmentation
HealthDataModel | vc_total | Total number of virtual channels on the system
HealthDataModel | vc_free | Number of available virtual channels
HealthDataModel | pc_total | Total number of physical channels
HealthDataModel | pc_reserved | Number of reserved physical channels
HealthDataModel | nsp_total | Number of neural processors on the system
HealthDataModel | nsp_free | Number of available neural processors
HealthDataModel | dram_bw_KBps | DRAM bandwidth in KB/s, averaged over the last ~100 ms
HealthDataModel | mcid_total | Total number of multicast IDs available on the system
HealthDataModel | mcid_free | Number of available multicast IDs
HealthDataModel | semaphore_total | Total number of semaphores available on the system
HealthDataModel | semaphore_free | Number of available semaphores
HealthDataModel | num_constant_loaded | Number of constants loaded; each load of constants increments the count by 1
HealthDataModel | num_constant_in_use | Number of loaded constants that are actively used by networks running on the system
HealthDataModel | num_networks_loaded | Number of neural networks loaded in memory on the system
HealthDataModel | num_networks_active | Number of neural networks currently computing on the system
HealthDataModel | neural_processor_frequency_Mhz | Nominal operating frequency of the neural processors in MHz; all processors have the same maximum clock
HealthDataModel | ddr_frequency_Mhz | Nominal operating frequency of DDR memory in MHz
HealthDataModel | compute_noc_frequency_Mhz | Nominal operating frequency of the compute network on chip in MHz
HealthDataModel | memory_noc_frequency_Mhz | Nominal operating frequency of the memory network on chip in MHz
HealthDataModel | system_noc_frequency_Mhz | Nominal operating frequency of the system network on chip in MHz
HealthDataModel | metadata_version | Metadata version
HealthDataModel | nnc_protocol_version | NNC protocol version
HealthDataModel | sbl_image | SBL image string
HealthDataModel | pvs_image_version | PVS image version
HealthDataModel | nsp_defective_pg_mask | Defective NSP mask
HealthDataModel | num_retired_ddr_pages | Number of retired DDR pages
HealthDataModel | need_reset_to_retire_pages | Reset required to retire pending pages
HealthDataModel | board_serial | Board serial
HealthDataModel | soc_temparature_degree_C | SoC temperature in degrees Celsius
HealthDataModel | board_power_watts | Board power in watts
HealthDataModel | tdp_cap_watts | Thermal Design Power cap in watts
HealthDataModel | sku_type | SKU type
HealthDataModel | complex_id | Complex ID
HealthDataModel | soc_power_watts | SoC power in watts
HealthDataModel | soc_tdp_cap_watts | SoC Thermal Design Power cap in watts
PciDataModel | byte_count_rx | Bytes received on PCIe
PciDataModel | byte_count_tx | Bytes sent on PCIe
DdrBwDataModel | byte_count_total | Sum of the individual NSP byte counts
DdrBwDataModel_NspDdrBwDataModel | byte_count | DDR byte count for this NSP
RasErrorsDataModel | ras_ddr_correctable_error_count | Count of correctable errors received from ras_ddr
RasErrorsDataModel | ras_ddr_uncorrectable_error_count | Count of uncorrectable errors received from ras_ddr
RasErrorsDataModel | ras_mcw_correctable_error_count | Count of correctable errors received from ras_mcw
RasErrorsDataModel | ras_mcw_uncorrectable_error_count | Count of uncorrectable errors received from ras_mcw
RasErrorsDataModel | ras_imem_correctable_error_count | Count of correctable errors received from ras_imem
RasErrorsDataModel | ras_imem_uncorrectable_error_count | Count of uncorrectable errors received from ras_imem
RasErrorsDataModel | ras_nsp_correctable_error_count | Count of correctable errors received from ras_nsp
RasErrorsDataModel | ras_nsp_uncorrectable_error_count | Count of uncorrectable errors received from ras_nsp
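
As an example of how these fields might be consumed, the sketch below derives DRAM utilization from dram_total_kb and dram_free_kb and picks out non-zero uncorrectable RAS counters. It assumes the metrics have already been fetched and decoded into flat dictionaries keyed by field name. The actual response layout is not described here, and the sample values are made up for illustration.

```python
def dram_used_percent(health):
    """Percentage of DRAM in use, from the dram_total_kb and dram_free_kb fields."""
    total = health["dram_total_kb"]
    free = health["dram_free_kb"]
    if total <= 0:
        return None  # avoid dividing by zero on an empty or invalid reading
    return 100.0 * (total - free) / total


def uncorrectable_ras_errors(ras):
    """Return only the uncorrectable error counters that are above zero."""
    return {
        name: count
        for name, count in ras.items()
        if name.endswith("_uncorrectable_error_count") and count > 0
    }


# Made-up sample readings using field names from the table above.
sample_health = {"dram_total_kb": 16_000_000, "dram_free_kb": 12_000_000}
sample_ras = {
    "ras_ddr_correctable_error_count": 3,
    "ras_ddr_uncorrectable_error_count": 0,
    "ras_nsp_uncorrectable_error_count": 1,
}

print(dram_used_percent(sample_health))
print(uncorrectable_ras_errors(sample_ras))
```

Correctable errors are left out of the second helper on the assumption that uncorrectable ones are what a monitoring rule would alert on; adjust the filter to suit your own policy.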