Python Backend for Embeddings
The Triton qaic Python backend for embedding models supports running QPCs for BERT-style models. For the list of supported models, see models/language_processing/encoder/README.md in the quic/cloud-ai-sdk repository on GitHub.
Embedding models are served through the Triton Python backend using the compiled binary generated by qaic-compile.
Sample client scripts are included for models that produce sentence embeddings as outputs.
Generate Embedding Model Repository
The generate_embedding_model_repo.py script is available at:
/opt/qti-aic/integrations/triton/release-artifacts/embedding-models/
This script uses a template to auto-generate the config for custom models. Pass the required parameters (model_name, aic_binary_dir, hf_model_name) as command-line options to generate_embedding_model_repo.py.
The script creates a model folder, named after the provided model_name, in the required format under embedding_model_dir at /opt/qti-aic/integrations/triton/release-artifacts/embedding-models/. This folder includes a config.pbtxt file.
python generate_embedding_model_repo.py -h
usage: generate_embedding_model_repo.py [-h] --model_name MODEL_NAME --aic_binary_dir AIC_BINARY_DIR
                                        [--python_backend_dir PYTHON_BACKEND_DIR] --hf_model_name
                                        HF_MODEL_NAME [--max_prompt_length MAX_PROMPT_LENGTH]
                                        [--max_batch_size MAX_BATCH_SIZE] [--num_instances NUM_INSTANCES]

options:
  -h, --help            show this help message and exit
  --model_name MODEL_NAME
                        Name of the model to generate model repo(bert-base-cased)
  --aic_binary_dir AIC_BINARY_DIR
                        Path to QPC(programqpc.bin) directory
  --python_backend_dir PYTHON_BACKEND_DIR
                        Path to QAic Python Backend Directory for Embedding models
  --hf_model_name HF_MODEL_NAME
                        Name of the model as identified on Hugging Face (google-bert/bert-base-cased)
  --max_prompt_length MAX_PROMPT_LENGTH
                        Set maximum prompt length that tokenizer should support
  --max_batch_size MAX_BATCH_SIZE
                        Set maximum number of samples that should be allowed to process at same time.
                        Configure as a value less than or equal to batch size of compiled binary
  --num_instances NUM_INSTANCES
                        Set instance count. Each instance uses 1 activation on Cloud AI device. Max
                        supported instance count is limited by NSP available.
Optional: Copy the model folder to /path/to/workspace (mapped to a host path) to reuse the generated model repository in future runs.
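The options listed in the help output can be assembled programmatically. The following is a minimal sketch, not part of the release artifacts: the helper name build_repo_command, the compiled_batch_size argument, and the /path/to/qpc_dir placeholder are assumptions made for illustration. It only builds the argument list (using the flags shown in the help text) and checks the documented rule that max_batch_size must not exceed the batch size of the compiled binary.

```python
import shlex

SCRIPT = "generate_embedding_model_repo.py"


def build_repo_command(model_name, aic_binary_dir, hf_model_name,
                       python_backend_dir=None, max_prompt_length=None,
                       max_batch_size=None, num_instances=None,
                       compiled_batch_size=None):
    """Return the argv list for generate_embedding_model_repo.py.

    compiled_batch_size is a hypothetical helper argument (not a script
    flag): if given, it is used to validate max_batch_size locally.
    """
    if max_batch_size is not None and compiled_batch_size is not None:
        if max_batch_size > compiled_batch_size:
            raise ValueError(
                "max_batch_size must be <= batch size of the compiled binary")

    # Required options, as marked in the usage line.
    argv = ["python", SCRIPT,
            "--model_name", model_name,
            "--aic_binary_dir", aic_binary_dir,
            "--hf_model_name", hf_model_name]

    # Optional options are appended only when a value was supplied.
    optional = {
        "--python_backend_dir": python_backend_dir,
        "--max_prompt_length": max_prompt_length,
        "--max_batch_size": max_batch_size,
        "--num_instances": num_instances,
    }
    for flag, value in optional.items():
        if value is not None:
            argv += [flag, str(value)]
    return argv


# Placeholder path; replace with the directory that holds programqpc.bin.
cmd = build_repo_command("bert-base-cased", "/path/to/qpc_dir",
                         "google-bert/bert-base-cased",
                         max_batch_size=8, num_instances=7,
                         compiled_batch_size=8)
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be passed to subprocess.run from the directory that contains the script.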
Launch Triton and Load Embeddings
Prerequisite: You may need to request access to the required models on Hugging Face and log in with a Hugging Face token using huggingface-cli login before launching the server.
Launch the Triton server inside the Triton container, pointing it at embedding_model_dir:
/opt/tritonserver/bin/tritonserver --model-repository=<path/to/embedding_model_dir>
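Before running clients it can help to wait until the server reports ready. The sketch below is an illustration rather than part of the release artifacts: it polls Triton's standard HTTP readiness endpoint (/v2/health/ready) and assumes the server listens on localhost:8000, the port used by the HTTP API example in this document. The function names are hypothetical.

```python
import time
import urllib.error
import urllib.request


def ready_url(host="localhost", port=8000):
    """URL of Triton's HTTP readiness endpoint (host and port are assumptions)."""
    return f"http://{host}:{port}/v2/health/ready"


def http_probe(url, timeout_s=2.0):
    """Return True if the endpoint answers 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx status (HTTPError).
        return False


def wait_until_ready(probe, attempts=30, interval_s=1.0, sleep=time.sleep):
    """Call probe() up to `attempts` times, sleeping between failed tries."""
    for attempt in range(attempts):
        if probe():
            return True
        if attempt < attempts - 1:
            sleep(interval_s)
    return False
```

A typical call would be wait_until_ready(lambda: http_probe(ready_url())), which returns True once the models have loaded or False after roughly 30 seconds.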
Start Client Container
docker run -it --rm -v /path/to/unzipped/apps-sdk/common/integrations/triton/release-artifacts/embedding-models/tests:/embedding-models/tests --net=host nvcr.io/nvidia/tritonserver:25.02-py3-sdk bash
Run client examples
Once the server has started, use the provided example Triton client tests (http_client_example.py, grpc_client_example.py, http_api_example.py) to run inference against the loaded models.
# httpclient example with bert-base-cased model loaded on server
python /embedding-models/tests/http_client_example.py --prompt "Earthquakes in this region are uncommon but not unexpected. It's likely people near the epicenter are going to feel aftershocks for this earthquake in the magnitude 2-3 range, and there's a small chance there can be an earthquake as large or larger, following an earthquake like this, Paul Earle, a seismologist at the USGS Earthquake Hazards Program told reporters. In terms of our operations, this is a routine earthquake. Immediately we knew this would be of high interest and important to people who don't feel earthquakes a lot." --model_name bert-base-cased
# QPC compiled for batch_size>=2: two prompts in one --prompt argument, separated by "|"
python /embedding-models/tests/http_client_example.py --prompt "Earthquakes in this region are uncommon but not unexpected. It's likely people near the epicenter are going to feel aftershocks for this earthquake in the magnitude 2-3 range, and there's a small chance there can be an earthquake as large or larger, following an earthquake like this, Paul Earle, a seismologist at the USGS Earthquake Hazards Program told reporters. In terms of our operations, this is a routine earthquake. Immediately we knew this would be of high interest and important to people who don't feel earthquakes a lot.|Earthquakes in this region are uncommon but not unexpected. It's likely people near the epicenter are going to feel aftershocks for this earthquake in the magnitude 2-3 range, and there's a small chance there can be an earthquake as large or larger, following an earthquake like this, Paul Earle, a seismologist at the USGS Earthquake Hazards Program told reporters. In terms of our operations, this is a routine earthquake. Immediately we knew this would be of high interest and important to people who don't feel earthquakes a lot." --model_name bert-base-cased
# http api example with bert-base-cased model loaded on server
python /embedding-models/tests/http_api_example.py --prompt "Earthquakes in this region are uncommon but not unexpected. It's likely people near the epicenter are going to feel aftershocks for this earthquake in the magnitude 2-3 range, and there's a small chance there can be an earthquake as large or larger, following an earthquake like this, Paul Earle, a seismologist at the USGS Earthquake Hazards Program told reporters. In terms of our operations, this is a routine earthquake. Immediately we knew this would be of high interest and important to people who don't feel earthquakes a lot." -m bert-base-cased -u http://localhost:8000/v2/models/bert-base-cased/infer
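For readers who want to see roughly what goes over the wire, here is a hedged sketch of a request body for the /v2/models/<model>/infer endpoint, following the KServe v2 JSON inference protocol that Triton implements. Several details are assumptions inferred from the perf_analyzer command in this document rather than confirmed by the example scripts: the input tensor is named prompt, each sample has shape [1], the datatype is BYTES, and batching adds a leading batch dimension. Output tensor names are not documented here, so none are requested. The actual http_api_example.py may build its request differently.

```python
import json


def infer_url(model_name, host="localhost", port=8000):
    """Inference endpoint in the form used by the HTTP API example above."""
    return f"http://{host}:{port}/v2/models/{model_name}/infer"


def build_infer_request(prompts, input_name="prompt"):
    """Build a KServe v2 JSON request body for a batch of string prompts.

    Assumptions: input tensor name "prompt", per-sample shape [1],
    BYTES datatype, leading batch dimension.
    """
    prompts = list(prompts)
    if not prompts:
        raise ValueError("at least one prompt is required")
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(prompts), 1],  # [batch, 1]
                "datatype": "BYTES",
                "data": prompts,  # BYTES elements are sent as JSON strings
            }
        ]
    }


body = build_infer_request(
    ["Earthquakes in this region are uncommon but not unexpected."])
print(infer_url("bert-base-cased"))
print(json.dumps(body, indent=2))
```

The body can be POSTed with any HTTP client using a JSON content type. The number of prompts per request should stay within the max_batch_size configured for the model.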
Benchmarking
Use perf_analyzer (triton-inference-server/perf_analyzer on GitHub) for benchmarking.
# perf analyzer example with bert-base-cased model
# binary compiled for batch size=8, cores=2
# num_instances/instance count set to 7 in config.pbtxt.
perf_analyzer -m bert-base-cased --string-data "Earthquakes in this region are uncommon but not unexpected. It's likely people near the epicenter are going to feel aftershocks for this earthquake in the magnitude 2-3 range, and there's a small chance there can be an earthquake as large or larger, following an earthquake like this, Paul Earle, a seismologist at the USGS Earthquake Hazards Program told reporters. In terms of our operations, this is a routine earthquake. Immediately we knew this would be of high interest and important to people who don't feel earthquakes a lot." -b 8 --shape prompt:1 -p 10000 --concurrency 7:35:7
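The concurrency argument above is a start:end:step range. As a small sketch, assuming perf_analyzer's documented range semantics with an inclusive end, the helper below (a hypothetical name, not part of perf_analyzer) expands 7:35:7 into the concurrency levels that get measured, and relates each level to the instance count of 7 and batch size of 8 from the comments above.

```python
def expand_concurrency(spec):
    """Expand a "start[:end[:step]]" string into a list of concurrency levels.

    The end value is treated as inclusive. Special values such as end=0
    are not handled in this sketch.
    """
    parts = [int(p) for p in spec.split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError("expected start[:end[:step]]")
    start = parts[0]
    end = parts[1] if len(parts) > 1 else start
    step = parts[2] if len(parts) > 2 else 1
    if start < 1 or step < 1 or end < start:
        raise ValueError("start and step must be >= 1 and end >= start")
    return list(range(start, end + 1, step))


NUM_INSTANCES = 7  # instance count from config.pbtxt in the example
BATCH_SIZE = 8     # -b 8 in the example command

levels = expand_concurrency("7:35:7")
print(levels)  # [7, 14, 21, 28, 35]
for level in levels:
    # Outstanding requests per model instance, and samples in flight.
    print(level, level // NUM_INSTANCES, level * BATCH_SIZE)
```

Choosing a start and step equal to the instance count means every measured level is a whole number of outstanding requests per instance, which makes the sweep easier to interpret.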