Command Line Interface Use (CLI)


Use a bash terminal. If you are using a ZSH terminal, device_group should be in single quotes, e.g. '--device_group [0]'.
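As an illustration of the quoting rule above, Python's standard shlex module can build a shell-safe argument string. This is only a minimal sketch: it quotes just the bracketed value, which is an assumption about how the shell splits the arguments, and it does not invoke the CLI itself.

```python
import shlex

# Arguments as they should reach the CLI, one list element per token.
args = ["--device_group", "[0]"]

# shlex.join quotes any token containing shell-special characters such as
# '[' and ']', which ZSH would otherwise try to expand as a glob pattern.
quoted = shlex.join(args)
print(quoted)
```

The resulting string can be pasted after the command in either bash or ZSH.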

  1. Check if compiled qpc for given config already exists; if it does, jump to execute, else
  2. Check if exported ONNX file already exists; if true, jump to compilation -> execution, else
  3. Check if HF model exists in cache; if true, start transform -> export -> compilation -> execution, else
  4. Download HF model -> transform -> export -> compile -> execute

Mandatory Args:
model_name (str):

Hugging Face Model Card name, Example: gpt2.

num_cores (int):

Number of cores to compile model on.

Optional Args:
device_group (List[int]):

Device Ids to be used for compilation. If len(device_group) > 1, multiple Card setup is enabled. Defaults to None.

prompt (str):

Sample prompt for the model text generation. Defaults to None.

prompts_txt_file_path (str):

Path to txt file for multiple input prompts. Defaults to None.

aic_enable_depth_first (bool):

Enables DFS with default memory size. Defaults to False.

mos (int):

Effort level to reduce the on-chip memory. Defaults to 1.

batch_size (int):

Batch size to compile the model for. Defaults to 1.

full_batch_size (int):

Set full batch size to enable continuous batching mode. Defaults to None.

prompt_len (int):

Prompt length for the model to compile. Defaults to 32.

ctx_len (int):

Maximum context length to compile the model. Defaults to 128.

generation_len (int):

Number of tokens to be generated. Defaults to None.

mxfp6 (bool):

Enable compilation for MXFP6 precision. Defaults to False.

mxint8 (bool):

Compress Present/Past KV to MXINT8 using CustomIO config. Defaults to False.

local_model_dir (str):

Path to custom model weights and config files. Defaults to None.

cache_dir (str):

Cache dir where downloaded HuggingFace files are stored. Defaults to None.

hf_token (str):

HuggingFace login token to access private repos. Defaults to None.

allow_mxint8_mdp_io (bool):

Allows MXINT8 compression of MDP IO traffic. Defaults to False.

enable_qnn (bool):

Enables QNN Compilation. Defaults to False.

qnn_config (str):

Path of QNN Config parameters file. Defaults to None.
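The four-step lookup described at the top of this section can be sketched as a small decision function. This is a hedged illustration only: plan_stages and its three boolean inputs are hypothetical names, not part of the library, and the real checks (file paths, cache layout) are not modelled here.

```python
from typing import List


def plan_stages(qpc_exists: bool, onnx_exists: bool, hf_cached: bool) -> List[str]:
    """Hypothetical helper: return the stages that still need to run."""
    if qpc_exists:
        # Step 1: a compiled qpc is already available, so only execute.
        return ["execute"]
    if onnx_exists:
        # Step 2: an exported ONNX file exists, so compile and execute.
        return ["compile", "execute"]
    # Steps 3 and 4: the full pipeline, with a download only on a cache miss.
    stages = ["transform", "export", "compile", "execute"]
    if not hf_cached:
        stages.insert(0, "download")
    return stages


print(plan_stages(qpc_exists=False, onnx_exists=True, hf_cached=True))
```

Each earlier artifact short-circuits the stages before it, which is why a second run with the same configuration can skip straight to execution.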

python -m OPTIONS

Helper function used by execute CLI app to run the Model on Cloud AI 100 Platform.

Mandatory Args:
model_name (str):

Hugging Face Model Card name, Example: gpt2.

qpc_path (str):

Path to the generated binary after compilation.

Optional Args:
device_group (List[int]):

Device Ids to be used for compilation. If len(device_group) > 1, multiple Card setup is enabled. Defaults to None.

local_model_dir (str):

Path to custom model weights and config files. Defaults to None.

prompt (str):

Sample prompt for the model text generation. Defaults to None.

prompts_txt_file_path (str):

Path to txt file for multiple input prompts. Defaults to None.

generation_len (int):

Number of tokens to be generated. Defaults to None.

cache_dir (str):

Cache dir where downloaded HuggingFace files are stored. Defaults to Constants.CACHE_DIR.

hf_token (str):

HuggingFace login token to access private repos. Defaults to None.

full_batch_size (int):

Set full batch size to enable continuous batching mode. Defaults to None.
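The prompt and prompts_txt_file_path arguments above both supply input text. A minimal sketch of how the two might be resolved into one list follows. load_prompts is a hypothetical helper, and both the one-prompt-per-line file format and the file taking precedence over prompt are assumptions, not documented behaviour.

```python
from pathlib import Path
from typing import List, Optional


def load_prompts(prompt: Optional[str] = None,
                 prompts_txt_file_path: Optional[str] = None) -> List[str]:
    """Hypothetical helper: return the prompts to run as a list."""
    if prompts_txt_file_path is not None:
        # Assumption: one prompt per line; blank lines are skipped.
        text = Path(prompts_txt_file_path).read_text(encoding="utf-8")
        return [line.strip() for line in text.splitlines() if line.strip()]
    if prompt is not None:
        return [prompt]
    # Both arguments default to None, so at least one must be given.
    raise ValueError("Provide either prompt or prompts_txt_file_path")
```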

python -m OPTIONS

Compiles the given ONNX model using Cloud AI 100 platform SDK compiler and saves the compiled qpc package at qpc_path. Generates tensor-slicing configuration if multiple devices are passed in device_group.

This function will be deprecated soon and will be replaced by QEFFAutoModelForCausalLM.compile.

Mandatory Args:
onnx_path (str):

Generated ONNX Model Path.

qpc_path (str):

Path for saving compiled qpc binaries.

num_cores (int):

Number of cores to compile the model on.

Optional Args:
device_group (List[int]):

Used for finding the number of devices to compile for. Defaults to None.

aic_enable_depth_first (bool):

Enables DFS with default memory size. Defaults to False.

mos (int):

Effort level to reduce the on-chip memory. Defaults to -1.

batch_size (int):

Batch size to compile the model for. Defaults to 1.

full_batch_size (int):

Set full batch size to enable continuous batching mode. Defaults to None.

prompt_len (int):

Prompt length for the model to compile. Defaults to 32.

ctx_len (int):

Maximum context length to compile the model. Defaults to 128.

mxfp6 (bool):

Enable compilation for MXFP6 precision. Defaults to True.

mxint8 (bool):

Compress Present/Past KV to MXINT8 using CustomIO config. Defaults to False.

custom_io_file_path (str):

Path to customIO file (formatted as a string). Defaults to None.

allow_mxint8_mdp_io (bool):

Allows MXINT8 compression of MDP IO traffic. Defaults to False.

enable_qnn (bool):

Enables QNN Compilation. Defaults to False.

qnn_config (str):

Path of QNN Config parameters file. Defaults to None.


Returns:

Path to compiled qpc package.
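The description above says device_group is used to find the number of devices and that a tensor-slicing configuration is generated when multiple devices are passed. A minimal sketch of that decision follows. device_count and needs_tensor_slicing are hypothetical names, and treating None as a single device is an assumption.

```python
from typing import List, Optional


def device_count(device_group: Optional[List[int]]) -> int:
    """Hypothetical helper: number of devices implied by device_group."""
    # Assumption: None (the default) or an empty list means one device.
    return len(device_group) if device_group else 1


def needs_tensor_slicing(device_group: Optional[List[int]]) -> bool:
    """True when more than one device id is passed."""
    return device_count(device_group) > 1


print(needs_tensor_slicing([0, 1]))
```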

python -m OPTIONS

Helper function used by export CLI app for exporting to ONNX Model.

Mandatory Args:
model_name (str):

Hugging Face Model Card name, Example: gpt2.

Optional Args:
cache_dir (str):

Cache dir where downloaded HuggingFace files are stored. Defaults to None.

hf_token (str):

HuggingFace login token to access private repos. Defaults to None.

local_model_dir (str):

Path to custom model weights and config files. Defaults to None.

full_batch_size (int):

Set full batch size to enable continuous batching mode. Defaults to None.

python -m OPTIONS

Helper function to finetune the model on QAic.

python -m OPTIONS