Finetune Infra

This repository provides the infrastructure for finetuning models using different hardware accelerators such as QAIC. The same CLI can be used to run finetuning on GPU by setting the device flag (for finetuning on GPU, install the CUDA-specific build of torch).

Installation

Same as QEfficient, along with QAIC PyTorch eager mode (torch_qaic).

For the QEfficient library: https://github.com/quic/efficient-transformers

For torch_qaic, assuming QEfficient is already installed:

pip install /opt/qti-aic/integrations/torch_qaic/py310/torch_qaic-0.1.0-cp310-cp310-linux_x86_64.whl
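
A quick import check can confirm the wheel is usable (a minimal sketch, assuming the package installs under the module name torch_qaic):

python -c "import torch_qaic; print('torch_qaic imported successfully')"  # module name assumed from the wheel name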

Finetuning

Export this environment variable to enable downloading and using private datasets (it tells Hugging Face datasets to trust remote dataset code):

export HF_DATASETS_TRUST_REMOTE_CODE=True

Export these environment variables to collect device and HW traces and debugging logs:

export QAIC_DEVICE_LOG_LEVEL=0 # For Device level logs
export QAIC_DEBUG=1 # To understand the CPU fallback ops
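
These variables can also be set inline for a single run; the sketch below combines them with the finetune entry point from the Usage section, using only the flags documented there:

QAIC_DEVICE_LOG_LEVEL=0 QAIC_DEBUG=1 python -m QEfficient.cloud.finetune --device qaic:0 --model_name "meta-llama/Llama-3.2-1B"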

Dataset Details

To download the Alpaca dataset, use the command below (or fetch it from the same URL manually) and place it under the dataset directory. Make sure to update the training configuration accordingly.

wget -c https://raw.githubusercontent.com/tatsu-lab/stanford_alpaca/refs/heads/main/alpaca_data.json -P dataset/
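
Updating the training configuration means pointing the dataset settings in QEfficient/finetune/configs/training.py (and the corresponding dataset config) at the downloaded file. A hedged example, assuming the CLI accepts a --dataset override and that the Alpaca config is registered as alpaca_dataset:

python -m QEfficient.cloud.finetune --device qaic:0 --model_name "meta-llama/Llama-3.2-1B" --dataset alpaca_dataset  # --dataset flag and alpaca_dataset name are assumptions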

To download the grammar dataset, visit this link, then place the downloaded files under the datasets_grammar directory and update the training configuration accordingly.

Usage

Single SOC finetuning on QAIC

python -m QEfficient.cloud.finetune --device qaic:0 --model_name "meta-llama/Llama-3.2-1B"

You can also configure various training parameters; for more details, check QEfficient/finetune/configs/training.py. Below is an example command line:

python -m QEfficient.cloud.finetune --device qaic:0 --use-peft --output_dir ./meta-sam --num_epochs 2 --context_length 256 
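
As noted in the introduction, the same CLI can target a GPU by changing the device flag; a sketch, assuming a CUDA-enabled torch build is installed and that the device string follows torch's cuda:<idx> convention:

python -m QEfficient.cloud.finetune --device cuda:0 --model_name "meta-llama/Llama-3.2-1B"  # cuda:0 device string is an assumption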

Distributed training (DDP) on QAIC

QAIC_VISIBLE_DEVICES=0,1,2,3 torchrun --nproc-per-node 4 -m QEfficient.cloud.finetune --device qaic --enable_ddp --dist_backend qccl --num_epochs 2  --model_name "meta-llama/Llama-3.2-1B"

Note: nproc-per-node is the number of workers (QAIC devices) running locally.
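
The QAIC_VISIBLE_DEVICES list and the worker count should match; for example, a two-device run uses the same flags with only the device list and nproc-per-node changed:

QAIC_VISIBLE_DEVICES=0,1 torchrun --nproc-per-node 2 -m QEfficient.cloud.finetune --device qaic --enable_ddp --dist_backend qccl --num_epochs 2 --model_name "meta-llama/Llama-3.2-1B"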

Visualization

TensorBoard logs are generated inside the runs/ directory with a date and time stamp. To visualize the data:

tensorboard --logdir runs/<file> --bind_all