Custom I/O¶
The custom I/O feature allows a user to specify the desired layout and precision for a network's inputs and outputs while loading the network. Instead of compiling the network for the inputs and outputs specified in the model, the network is compiled for the inputs and outputs described in the custom configuration. This feature is useful when the user intends to preprocess the input data elsewhere (on a GPU, CDSP, or by any other method) or offline (as allowed by MLCommons, for example) and thereby skip some steps of the input processing. A user who knows the input preprocessing steps can avoid redundant transposes and data-type conversions. Similarly, on the postprocessing side, if the model output is fed to the next stage in a pipeline, the format and type desired by that stage can be configured as the output of the current stage.
For layout, the user can choose either ‘NHWC’ or ‘NCHW’ if the rank of the tensor is four.
For precision, the user can choose the ‘float’, ‘float16’, or ‘int8’ datatype. For the ‘int8’ datatype, quantization parameters (scale and offset) must also be provided.
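As a rough illustration of what an ‘int8’ precision with scale and offset implies, the sketch below assumes the common affine mapping real ≈ scale × (q − offset). The toolchain's exact quantization convention is not stated in this section, so verify it before preprocessing real data this way.

```python
import numpy as np

# Illustrative affine int8 quantization. The mapping assumed here is
# real ~= scale * (q - offset); confirm the convention actually used
# by the toolchain before relying on this.

def quantize_int8(x, scale, offset):
    """Quantize float32 data to int8 with the assumed affine mapping."""
    q = np.round(x / scale) + offset
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_int8(q, scale, offset):
    """Recover approximate float32 values from int8 data."""
    return scale * (q.astype(np.float32) - offset)

x = np.array([0.0, 0.12, -0.24], dtype=np.float32)
q = quantize_int8(x, scale=0.12, offset=3)
x_hat = dequantize_int8(q, scale=0.12, offset=3)
```

With these illustrative parameters, each representable step is one scale-width apart, which is why the scale and offset must match the data's actual range for the round trip to be accurate.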
This feature is compatible with other features such as mixed precision, external quantization, and quantization profile generation. In this section, the term “model I/O” refers to the input and output datatypes and formats of the original model. The term “custom I/O” refers to the input and output datatypes and formats desired by the user.
Custom I/O configuration file¶
YAML schema
Custom I/O can be applied using a configuration YAML file that contains the following fields for each input and output that needs to be modified:
IOName: Name of the input or output that needs to be loaded as per the custom requirement.
Layout: NCHW and NHWC layouts are supported. This field is optional and can be skipped for an input or output if rank is not equal to four or if layout customization is not required.
Precision: float, float16, and int8 datatypes are supported. This field is optional and can be skipped for an input or output if datatype customization is not required.
Scale: scale value. This field is mandatory if the ‘Precision’ is specified as ‘int8’. This field is optional for other datatypes, and it will be ignored even if provided.
Offset: offset value. This field is mandatory if the ‘Precision’ is specified as ‘int8’. This field is optional for other datatypes, and it will be ignored even if provided.
Custom I/O configuration example
Consider an ONNX model with three inputs and three outputs, with the original model I/O and the desired custom I/O configuration as shown in the following tables.

Custom I/O configuration: Inputs

| I/O Name | Model I/O  | Custom I/O |
|----------|------------|------------|
| input_0  | float NCHW | int8 NHWC  |
| input_1  | float NCHW | float NHWC |
| input_2  | int64 NCHW | int64 NHWC |

Custom I/O configuration: Outputs

| I/O Name | Model I/O         | Custom I/O                 |
|----------|-------------------|----------------------------|
| output_0 | float NCHW        | float16 NHWC               |
| output_1 | float (rank != 4) | float16 (no layout change) |
| output_2 | int64 (rank != 4) | no change                  |
Then, the content of the custom I/O configuration YAML file that should be provided is:

- IOName: input_0
  Layout: NHWC
  Precision: int8
  Scale: 0.12
  Offset: 3
- IOName: input_1
  Layout: NHWC
- IOName: input_2
  Layout: NHWC
- IOName: output_0
  Layout: NHWC
  Precision: float16
- IOName: output_1
  Precision: float16
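For instance, the host-side preprocessing implied by the input_0 entry above (float NCHW in the model, int8 NHWC custom) could be sketched as follows. This is an illustration rather than part of the toolchain, and the affine convention real ≈ scale × (q − offset) is an assumption.

```python
import numpy as np

def preprocess_input_0(x_nchw, scale=0.12, offset=3):
    """Convert a float32 NCHW tensor into the int8 NHWC form requested
    for input_0 in the custom I/O configuration above.

    Assumes the affine mapping real ~= scale * (q - offset); verify
    against the convention your toolchain uses.
    """
    x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))  # NCHW -> NHWC
    q = np.round(x_nhwc / scale) + offset        # affine quantize
    return np.clip(q, -128, 127).astype(np.int8)
```

The raw bytes of the returned array would then be written to the files referenced by the custom-I/O input list, so the compiled network receives data already in its expected layout and datatype.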
Notes:

- If no change is required for an input or output, it can be skipped in the configuration file.
- This feature currently does not support layouts other than NCHW and NHWC. For other layouts, the ‘Layout’ field should be skipped in the configuration file.
- Precision can be modified using the custom I/O feature only if the model input or output datatype is float, float16, or int8. For other datatypes, the ‘Precision’ field should be skipped in the configuration file.
- For the int8 datatype, quantization parameters must be provided. They can be obtained for a desired activations quantization schema using the min-max quantization method. With the SQNR, KL-divergence-minimization, or percentile calibration methods, the min and max values will differ from those of the float32 inputs.
Usage with QAic compile¶
The QAic compile option ‘-custom-IO-list-file’ can be used to provide
the custom I/O configuration file as follows:
$ /opt/qti-aic/exec/qaic-compile -model=<path_to_model> -input-list-file=<input_files_list_custom_IO> -custom-IO-list-file=custom_IO_config.yaml
This feature is compatible with other QAic compile options such
as “-aic-preproc”. In case all transformations need to be pushed to
the device, the “-aic-preproc” option can be passed to qaic-compile
along with custom I/O.
To obtain a custom I/O configuration template file, use the
‘-dump-custom-IO-config-template’ option with QAic compile as
follows:
$ /opt/qti-aic/exec/qaic-compile -model=<path_to_model> -dump-custom-IO-config-template=custom_IO_config.yaml
To have the scale and offset fields filled in with proper values in the
template file, use the option ‘-dump-custom-IO-config-template’ with
QAic compile along with the quantization profile generated with
model I/O, the desired activations quantization schema, and the
calibration method, as follows:
$ /opt/qti-aic/exec/qaic-compile -model=<path_to_model> -dump-custom-IO-config-template=custom_IO_config.yaml,pgq_profile_model_IO.yaml -quantization-schema-activations=<desired_schema> -quantization-calibration=<desired_calibration>
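The dumped template follows the same schema as the configuration file shown earlier. An entry with scale and offset filled in might look like the following (the values and the exact template contents are illustrative and depend on the model and toolchain version):

- IOName: input_0
  Layout: NCHW
  Precision: float
  Scale: 0.0078
  Offset: 0

The user then edits the Layout and Precision fields to the desired custom values, keeping the computed Scale and Offset for any I/O whose precision is set to int8.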
Custom I/O with -convert-to-fp16

1. Obtain the custom I/O configuration template.

$ /opt/qti-aic/exec/qaic-compile -model=<path_to_model> -dump-custom-IO-config-template=custom_IO_config.yaml

2. Edit the template file ‘custom_IO_config.yaml’ obtained in step 1 and provide the desired layout and precision fields for the inputs and outputs.

3. Apply custom I/O along with the -convert-to-fp16 option.

$ /opt/qti-aic/exec/qaic-compile -model=<path_to_model> -input-list-file=<input_files_list_custom_IO> -custom-IO-list-file=custom_IO_config.yaml -convert-to-fp16
Custom I/O with external quantization

1. Obtain the custom I/O configuration template.

$ /opt/qti-aic/exec/qaic-compile -model=<path_to_model> -dump-custom-IO-config-template=custom_IO_config.yaml

2. Edit the template file ‘custom_IO_config.yaml’ obtained in step 1 and provide the desired layout and precision fields for the inputs and outputs. The scale and offset fields will be ignored even if provided in the custom I/O configuration file. If precision is set to int8, scale and offset will be obtained from the external quantization profile file.

3. Apply custom I/O along with the external quantization feature.

$ /opt/qti-aic/exec/qaic-compile -model=<path_to_model> -input-list-file=<input_files_list_custom_IO> -custom-IO-list-file=custom_IO_config.yaml -external-quantization=external_quantization_profile.yaml
Custom I/O with profile guided quantization

With profile guided quantization, the same custom I/O configuration file should be used while dumping the profile and while loading the profile.

If scale and offset of I/O are known to the user:

1. Obtain the custom I/O configuration template.

$ /opt/qti-aic/exec/qaic-compile -model=<path_to_model> -dump-custom-IO-config-template=custom_IO_config.yaml

2. Edit the template file ‘custom_IO_config.yaml’ obtained in step 1 and provide the desired layout and precision fields for the inputs and outputs. Fill the scale and offset fields with the proper values known to the user if the corresponding I/O precision is set to int8.

3. Apply custom I/O and perform profiling using input data in the format given by the custom I/O configuration.

$ /opt/qti-aic/exec/qaic-compile -model=<path_to_model> -input-list-file=<input_files_list_custom_IO> -custom-IO-list-file=custom_IO_config.yaml -dump-profile=pgq_profile_custom_IO.yaml

4. Apply custom I/O and load the quantization profile file generated in step 3.

$ /opt/qti-aic/exec/qaic-compile -model=<path_to_model> -input-list-file=<input_files_list_custom_IO> -custom-IO-list-file=custom_IO_config.yaml -load-profile=pgq_profile_custom_IO.yaml -quantization-schema-activations=<desired_schema> -quantization-calibration=<desired_calibration> -quantization-schema-constants=<desired_schema>
If scale and offset of I/O are not known to the user:

If the user does not have the scale and offset to use for the I/O in the custom I/O configuration file, an external tool such as AIMET can be used to obtain them, after which the steps in the previous section apply.

Alternatively, the scale and offset can be derived by performing PGQ on the model I/O. That scale and offset are then used to process the data into the custom I/O datatype and format, and profiling is performed with the preprocessed input data. The steps are as follows:

1. Perform profiling with input data as per model I/O, without using a custom I/O configuration. This step can be skipped if a profile file with model I/O is already available.

$ /opt/qti-aic/exec/qaic-compile -model=<path_to_model> -input-list-file=<input_files_list_model_IO> -dump-profile=pgq_profile_model_IO.yaml

2. Using the profile file dumped in step 1, obtain the scale and offset in the template file using the option ‘-dump-custom-IO-config-template’ and by providing the desired activations quantization schema and calibration method.

$ /opt/qti-aic/exec/qaic-compile -model=<path_to_model> -dump-custom-IO-config-template=custom_IO_config.yaml,pgq_profile_model_IO.yaml -quantization-schema-activations=<desired_schema> -quantization-calibration=<desired_calibration>

3. Edit the template file ‘custom_IO_config.yaml’ obtained in step 2 and provide the desired layout and precision fields for the inputs and outputs. Note that the scale and offset fields are already filled with proper values based on the given quantization schema and calibration. They are considered if the corresponding I/O precision is set to int8 and ignored otherwise.

4. Perform profiling with input data as per the custom I/O configuration file.

$ /opt/qti-aic/exec/qaic-compile -model=<path_to_model> -input-list-file=<input_files_list_custom_IO> -dump-profile=pgq_profile_custom_IO.yaml -custom-IO-list-file=custom_IO_config.yaml

5. Load the model using the -load-profile option. Note that the activations quantization schema and calibration should be the same as those used in step 2 to compute the scale and offset.

$ /opt/qti-aic/exec/qaic-compile -model=<path_to_model> -input-list-file=<input_files_list_custom_IO> -load-profile=pgq_profile_custom_IO.yaml -custom-IO-list-file=custom_IO_config.yaml -quantization-schema-activations=<desired_schema> -quantization-calibration=<desired_calibration> -quantization-schema-constants=<desired_schema>
Notes:

- The uint8 datatype is currently not supported with the custom I/O feature. Conversion from uint8 to int8 should be done by LRT or on the device.
- The int64, int32, and int16 datatypes are currently not supported with the custom I/O feature.
- When a low-precision scale (fewer decimal places) and offset are used to preprocess input data into custom I/O, and are then used by dequantization during quantization profile generation for custom I/O, there will be a loss in accuracy.
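The last note can be illustrated numerically: with a scale truncated to fewer decimal places, values near the top of the calibration range land in the wrong quantization bins or get clipped, so the round-trip error grows. A standalone sketch, assuming the affine mapping real ≈ scale × (q − offset):

```python
import numpy as np

# Compare round-trip error for a full-precision scale versus the same
# scale truncated to two decimal places (assumed affine mapping
# real ~= scale * (q - offset), with offset = 0).

def roundtrip(x, scale, offset=0):
    q = np.clip(np.round(x / scale) + offset, -128, 127)
    return scale * (q - offset)

x = np.linspace(-100.0, 100.0, 9)   # data in calibration range [-100, 100]
full_scale = 100.0 / 127.0          # min-max scale at full precision
low_scale = 0.78                    # same scale truncated to 2 decimals

err_full = np.abs(roundtrip(x, full_scale) - x).max()
err_low = np.abs(roundtrip(x, low_scale) - x).max()
# With the truncated scale, 100 / 0.78 rounds past 127 and is clipped,
# so the largest inputs dequantize to about 99.06 instead of 100.
```

The more decimal places are dropped from the scale, the larger this mismatch becomes, which is why the feature recommends deriving scale and offset from a proper quantization profile rather than hand-rounded values.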
LLM custom-io file¶
This file defines explicit model inputs and outputs for Key-Value (KV) cache state management in an autoregressive LLM. It is used when:

- The model is exported in KV style (prefill and decode separated),
- Inference runs token by token,
- The KV cache must be retained across decode iterations instead of being recomputed every time.
- IOName: past_key.0
  Precision: mxint8
- IOName: past_value.0
  Precision: mxint8
- IOName: past_key.0_RetainedState
  Precision: mxint8
- IOName: past_value.0_RetainedState
  Precision: mxint8
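Conceptually, each *_RetainedState output is fed back as the corresponding past_* input of the next decode iteration, so the KV cache never has to be recomputed. A minimal host-side sketch of that loop, with a hypothetical `run_decode_step` standing in for the real runtime invocation:

```python
import numpy as np

def run_decode_step(token, past_key, past_value):
    """Stand-in for one device decode invocation: consumes the current
    token plus the retained KV cache and returns next-token output along
    with the updated cache (the *_RetainedState outputs).

    Hypothetical logic: append one dummy entry per step to mimic cache
    growth; a real model would produce attention key/value tensors.
    """
    new_key = np.concatenate([past_key, np.full((1,), token, np.int8)])
    new_value = np.concatenate([past_value, np.full((1,), token, np.int8)])
    return token + 1, new_key, new_value

past_key = np.zeros((0,), dtype=np.int8)    # past_key.0 starts empty
past_value = np.zeros((0,), dtype=np.int8)  # past_value.0 starts empty
token = 1
for _ in range(4):
    # *_RetainedState outputs become the next iteration's past_* inputs.
    token, past_key, past_value = run_decode_step(token, past_key, past_value)
```

After four decode steps the cache holds four entries and was never rebuilt from scratch, which is the saving the retained-state I/O pairs exist to provide.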