vLLM supported models

Text-only Language Models

| Architecture | Model Family | Representative Models | vLLM Support |
| --- | --- | --- | --- |
| MolmoForCausalLM | Molmo | allenai/Molmo-7B-D-0924 | |
| Olmo2ForCausalLM | OLMo-2 | allenai/OLMo-2-0425-1B | ✔️ |
| FalconForCausalLM | Falcon | tiiuae/falcon-40b | ✔️ |
| Qwen3MoeForCausalLM | Qwen3Moe | Qwen/Qwen3-30B-A3B-Instruct-2507 | ✔️ |
| GemmaForCausalLM | CodeGemma / Gemma | google/codegemma-2b, google/gemma-7b | ✔️ |
| GptOssForCausalLM | GPT-OSS | openai/gpt-oss-20b | ✔️ |
| GPTBigCodeForCausalLM | StarCoder | bigcode/starcoder | ✔️ |
| GPTJForCausalLM | GPT-J | EleutherAI/gpt-j-6b | ✔️ |
| GraniteForCausalLM | Granite | ibm-granite/granite-3.1-8b-instruct | ✔️ |
| LlamaForCausalLM | LLaMA | meta-llama/Llama-3.1-8B | ✔️ |
| MistralForCausalLM | Mistral | mistralai/Mistral-7B-Instruct-v0.1 | ✔️ |
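
Any validated checkpoint in the table above can be served through vLLM's OpenAI-compatible server. A minimal sketch (assumes vLLM is installed and the hardware backend is configured; the model name is taken from the table, and the port is an arbitrary choice):

```shell
# Launch an OpenAI-compatible server for a validated model from the table.
vllm serve meta-llama/Llama-3.1-8B --port 8000

# In another shell, send a text completion request to the server.
curl http://localhost:8000/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B",
        "prompt": "Hello, my name is",
        "max_tokens": 32
      }'
```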

Embedding Models

| Architecture | Model Family | Representative Models | vLLM Support |
| --- | --- | --- | --- |
| BertModel | BERT-based | BAAI/bge-base-en-v1.5 | ✔️ |
| MPNetForMaskedLM | MPNet | sentence-transformers/multi-qa-mpnet-base-cos-v1 | ✔️ |
| NomicBertModel | NomicBERT | nomic-ai/nomic-embed-text-v1.5 | |
| RobertaModel | RoBERTa | ibm-granite/granite-embedding-125m-english | ✔️ |
| XLMRobertaModel | XLM-RoBERTa | ibm-granite/granite-embedding-278m-multilingual | ✔️ |
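
Embedding models from the table are served the same way and queried through the OpenAI-compatible `/v1/embeddings` endpoint. A sketch (assumptions: depending on the vLLM version, an explicit task flag such as `--task embed` may be required at launch):

```shell
# Serve an embedding model listed in the table above.
vllm serve BAAI/bge-base-en-v1.5 --port 8000

# Request embeddings for one or more input strings.
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
        "model": "BAAI/bge-base-en-v1.5",
        "input": ["vLLM supports BERT-style embedding models."]
      }'
```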

Vision-Language Models

| Architecture | Model Family | Representative Models | vLLM Single QPC | vLLM Dual QPC |
| --- | --- | --- | --- | --- |
| LlavaForConditionalGeneration | LLaVA-1.5 | llava-hf/llava-1.5-7b-hf | ✔️ | ✔️ |
| MllamaForConditionalGeneration | LLaMA-3.2-Vision | meta-llama/Llama-3.2-11B-Vision-Instruct | ✔️ | ✔️ |
| LlavaNextForConditionalGeneration | Granite Vision | ibm-granite/granite-vision-3.2-2b | ✔️ | |
| Qwen2_5_VLForConditionalGeneration | Qwen2.5-VL | Qwen/Qwen2.5-VL-3B-Instruct | ✔️ | |
| Gemma3ForConditionalGeneration | Gemma-3 | google/gemma-3-4b-it | | |
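
Vision-language models from the table accept images through the OpenAI-compatible multimodal chat format. A sketch (assumptions: the image URL is a placeholder to replace with a reachable image; QPC deployment mode is selected at compile/launch time and is not shown here):

```shell
# Serve a vision-language model listed in the table above.
vllm serve llava-hf/llava-1.5-7b-hf --port 8000

# Send a chat request mixing text and an image URL.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llava-hf/llava-1.5-7b-hf",
        "messages": [{
          "role": "user",
          "content": [
            {"type": "text", "text": "What is in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image.jpg"}}
          ]
        }]
      }'
```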

Notes

  • A check mark (✔️) indicates validated vLLM support.

  • A cross (✕) indicates that vLLM support is not validated.

  • Single QPC and Dual QPC indicate different Qualcomm Program Container (QPC) deployment modes.

Audio Models (Speech-to-Text)

| Architecture | Model Family | Representative Models | vLLM Support |
| --- | --- | --- | --- |
| WhisperForConditionalGeneration | Whisper | openai/whisper-tiny | ✔️ |
| WhisperForConditionalGeneration | Whisper | openai/whisper-base | ✔️ |
| WhisperForConditionalGeneration | Whisper | openai/whisper-small | ✔️ |
| WhisperForConditionalGeneration | Whisper | openai/whisper-medium | ✔️ |
| WhisperForConditionalGeneration | Whisper | openai/whisper-large-v2 | ✔️ |
| WhisperForConditionalGeneration | Whisper | openai/whisper-large-v3-turbo | ✔️ |
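
Recent vLLM versions expose Whisper through the OpenAI-compatible transcription endpoint. A sketch (assumptions: `sample.wav` is a placeholder audio file, and endpoint availability depends on the vLLM version in use):

```shell
# Serve a Whisper checkpoint listed in the table above.
vllm serve openai/whisper-large-v3-turbo --port 8000

# Transcribe a local audio file via multipart form upload.
curl http://localhost:8000/v1/audio/transcriptions \
  -F file=@sample.wav \
  -F model=openai/whisper-large-v3-turbo
```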

Notes

  • A check mark (✔️) indicates validated vLLM support.

  • A cross (✕) indicates that vLLM support is not validated.

  • Audio models are supported under the Speech-to-Text task using the Whisper architecture.

For more details, see: Validated Models and vLLM Support — Efficient Transformers Documentation