Validated Models

Text-only Language Models

Text Generation Task

QEff Auto Class: QEFFAutoModelForCausalLM

| Architecture | Model Family | Representative Models | vLLM Support |
|---|---|---|---|
| MolmoForCausalLM | Molmo① | allenai/Molmo-7B-D-0924 | |
| Olmo2ForCausalLM | OLMo-2 | allenai/OLMo-2-0425-1B | ✔️ |
| FalconForCausalLM | Falcon② | tiiuae/falcon-40b | ✔️ |
| Qwen3MoeForCausalLM | Qwen3Moe | Qwen/Qwen3-30B-A3B-Instruct-2507 | ✔️ |
| GemmaForCausalLM | CodeGemma | google/codegemma-2b, google/codegemma-7b | ✔️ |
| GemmaForCausalLM | Gemma③ | google/gemma-2b, google/gemma-7b, google/gemma-2-2b, google/gemma-2-9b, google/gemma-2-27b | ✔️ |
| GptOssForCausalLM | GPT-OSS | openai/gpt-oss-20b | ✔️ |
| GPTBigCodeForCausalLM | Starcoder1.5 | bigcode/starcoder | ✔️ |
| GPTBigCodeForCausalLM | Starcoder2 | bigcode/starcoder2-15b | ✔️ |
| GPTJForCausalLM | GPT-J | EleutherAI/gpt-j-6b | ✔️ |
| GPT2LMHeadModel | GPT-2 | openai-community/gpt2 | ✔️ |
| GraniteForCausalLM | Granite 3.1 | ibm-granite/granite-3.1-8b-instruct, ibm-granite/granite-guardian-3.1-8b | ✔️ |
| GraniteForCausalLM | Granite 20B | ibm-granite/granite-20b-code-base-8k, ibm-granite/granite-20b-code-instruct-8k | ✔️ |
| InternVLChatModel | Intern-VL① | OpenGVLab/InternVL2_5-1B, OpenGVLab/InternVL3_5-1B | ✔️ |
| LlamaForCausalLM | CodeLlama | codellama/CodeLlama-7b-hf, codellama/CodeLlama-13b-hf, codellama/CodeLlama-34b-hf | ✔️ |
| LlamaForCausalLM | DeepSeek-R1-Distill-Llama | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | ✔️ |
| LlamaForCausalLM | InceptionAI-Adapted | inceptionai/jais-adapted-7b, inceptionai/jais-adapted-13b-chat, inceptionai/jais-adapted-70b | ✔️ |
| LlamaForCausalLM | Llama 3.3 | meta-llama/Llama-3.3-70B-Instruct | ✔️ |
| LlamaForCausalLM | Llama 3.2 | meta-llama/Llama-3.2-1B, meta-llama/Llama-3.2-3B | ✔️ |
| LlamaForCausalLM | Llama 3.1 | meta-llama/Llama-3.1-8B, meta-llama/Llama-3.1-70B | ✔️ |
| LlamaForCausalLM | Llama 3 | meta-llama/Meta-Llama-3-8B, meta-llama/Meta-Llama-3-70B | ✔️ |
| LlamaForCausalLM | Llama 2 | meta-llama/Llama-2-7b-chat-hf, meta-llama/Llama-2-13b-chat-hf, meta-llama/Llama-2-70b-chat-hf | ✔️ |
| LlamaForCausalLM | Vicuna | lmsys/vicuna-13b-delta-v0, lmsys/vicuna-13b-v1.3, lmsys/vicuna-13b-v1.5 | ✔️ |
| MistralForCausalLM | Mistral | mistralai/Mistral-7B-Instruct-v0.1 | ✔️ |
| MixtralForCausalLM | Codestral, Mixtral | mistralai/Codestral-22B-v0.1, mistralai/Mixtral-8x7B-v0.1 | ✔️ |
| Phi3ForCausalLM | Phi-3②, Phi-3.5② | microsoft/Phi-3-mini-4k-instruct | ✔️ |
| Qwen2ForCausalLM | DeepSeek-R1-Distill-Qwen | deepseek-ai/DeepSeek-R1-Distill-Qwen-32B | ✔️ |
| Qwen2ForCausalLM | Qwen2, Qwen2.5 | Qwen/Qwen2-1.5B-Instruct | ✔️ |
| LlamaSwiftKVForCausalLM | swiftkv | Snowflake/Llama-3.1-SwiftKV-8B-Instruct | ✔️ |
| Grok1ModelForCausalLM | grok-1② | hpcai-tech/grok-1 | |


Embedding Models

Text Embedding Task

QEff Auto Class: QEFFAutoModel

| Architecture | Model Family | Representative Models | vLLM Support |
|---|---|---|---|
| BertModel | BERT-based | BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5, BAAI/bge-small-en-v1.5, intfloat/e5-large-v2 | ✔️ |
| MPNetForMaskedLM | MPNet | sentence-transformers/multi-qa-mpnet-base-cos-v1 | ✔️ |
| NomicBertModel | NomicBERT② | nomic-ai/nomic-embed-text-v1.5 | |
| RobertaModel | RoBERTa | ibm-granite/granite-embedding-30m-english, ibm-granite/granite-embedding-125m-english | ✔️ |
| XLMRobertaForSequenceClassification | XLM-RoBERTa | BAAI/bge-reranker-v2-m3 | ✔️ |
| XLMRobertaModel | XLM-RoBERTa | ibm-granite/granite-embedding-107m-multilingual, ibm-granite/granite-embedding-278m-multilingual, intfloat/multilingual-e5-large | ✔️ |


Multimodal Language Models

Vision-Language Models (Text + Image Generation)

QEff Auto Class: QEFFAutoModelForImageTextToText

Architecture

Model Family

Representative Models

QEff Single QPC

QEff Dual QPC

vLLM Single QPC

vLLM Dual QPC

LlavaForConditionalGeneration

LLaVA-1.5

llava-hf/llava-1.5-7b-hf

✔️

✔️

✔️

✔️

MllamaForConditionalGeneration

Llama 3.2

meta-llama/Llama-3.2-11B-Vision-Instruct
meta-llama/Llama-3.2-90B-Vision-Instruct

✔️

✔️

✔️

✔️

LlavaNextForConditionalGeneration

Granite Vision

ibm-granite/granite-vision-3.2-2b

✔️

✔️

Llama4ForConditionalGeneration

Llama-4-Scout

meta-llama/Llama-4-Scout-17B-16E-Instruct

✔️

✔️

✔️

✔️

Gemma3ForConditionalGeneration

Gemma3③

google/gemma-3-4b-it

✔️

✔️

Qwen2_5_VLForConditionalGeneration

Qwen2.5-VL

Qwen/Qwen2.5-VL-3B-Instruct

✔️

✔️

✔️

Mistral3ForConditionalGeneration

Mistral3

mistralai/Mistral-Small-3.1-24B-Instruct-2503

✔️

Dual QPC: In the dual QPC (Qualcomm Program Container) setup, the model is split across two QPCs:

  • The Vision Encoder runs in one QPC.

  • The Language Model (responsible for output generation) runs in a separate QPC.

  • The outputs from the Vision Encoder are transferred to the Language Model.

  • The dual QPC approach introduces the flexibility to run the vision and language components independently.

Single QPC: In the single QPC setup, the entire model, including both image encoding and text generation, runs within a single QPC. There is no model splitting, and all components operate within the same execution environment.

Note

The choice between single and dual QPC is made at model instantiation through the kv_offload setting: kv_offload=True runs the model in dual QPC mode, and kv_offload=False runs it in single QPC mode.
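The relationship between kv_offload and the two QPC setups can be sketched as follows. This is a minimal illustration, not the library's implementation: QpcLayout, qpc_layout, and load_vlm are hypothetical helper names, and load_vlm assumes that the QEfficient package is installed and that QEFFAutoModelForImageTextToText.from_pretrained accepts kv_offload as a keyword argument, as the note above describes.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class QpcLayout:
    """Hypothetical description of how a vision-language model is laid out."""
    mode: str                     # "single" or "dual"
    qpc_count: int                # number of QPCs the model runs in
    components: Tuple[str, ...]   # what each QPC contains


def qpc_layout(kv_offload: bool) -> QpcLayout:
    """Map the kv_offload setting to the layout described in the text."""
    if kv_offload:
        # Dual QPC: vision encoder and language model run in separate QPCs.
        return QpcLayout("dual", 2, ("vision_encoder", "language_model"))
    # Single QPC: image encoding and text generation share one QPC.
    return QpcLayout("single", 1, ("vision_encoder+language_model",))


def load_vlm(model_id: str, kv_offload: bool):
    """Instantiate a vision-language model in single or dual QPC mode.

    Assumption: QEfficient is installed and from_pretrained accepts
    kv_offload. The import is kept local so the helpers above can be
    used without the package.
    """
    from QEfficient import QEFFAutoModelForImageTextToText

    return QEFFAutoModelForImageTextToText.from_pretrained(
        model_id, kv_offload=kv_offload
    )
```

For example, qpc_layout(True) describes the dual setup with two QPCs, while load_vlm("llava-hf/llava-1.5-7b-hf", kv_offload=False) would request the single QPC setup for a model listed in the table above.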

Audio Models

Automatic Speech Recognition (Transcription Task)

QEff Auto Class: QEFFAutoModelForSpeechSeq2Seq

| Architecture | Model Family | Representative Models | vLLM Support |
|---|---|---|---|
| Whisper | Whisper | openai/whisper-tiny, openai/whisper-base, openai/whisper-small, openai/whisper-medium, openai/whisper-large, openai/whisper-large-v3-turbo | ✔️ |
| Wav2Vec2 | Wav2Vec2 | facebook/wav2vec2-base, facebook/wav2vec2-large | |


Diffusion Models

Image Generation Models

QEff Auto Class: QEffFluxPipeline

| Architecture | Model Family | Representative Models | vLLM Support |
|---|---|---|---|
| FluxPipeline | FLUX.1 | black-forest-labs/FLUX.1-schnell | |

Video Generation Models

QEff Auto Class: QEffWanPipeline

| Architecture | Model Family | Representative Models | vLLM Support |
|---|---|---|---|
| WanPipeline | Wan2.2 | Wan-AI/Wan2.2-T2V-A14B-Diffusers | |
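The "QEff Auto Class" lines in the sections above can be collected into one lookup. The sketch below only restates the class names given on this page. The task keys (such as "text-generation") and the function qeff_class_for are hypothetical names chosen for illustration, not identifiers from the library.

```python
# Class names as listed on this page, keyed by a hypothetical task label.
QEFF_CLASS_BY_TASK = {
    "text-generation": "QEFFAutoModelForCausalLM",
    "text-embedding": "QEFFAutoModel",
    "image-text-to-text": "QEFFAutoModelForImageTextToText",
    "speech-recognition": "QEFFAutoModelForSpeechSeq2Seq",
    "image-generation": "QEffFluxPipeline",
    "video-generation": "QEffWanPipeline",
}


def qeff_class_for(task: str) -> str:
    """Return the QEff class name documented for a task label.

    The label is normalized so "Text Generation", "text_generation",
    and "text-generation" all resolve to the same entry.
    """
    key = task.strip().lower().replace(" ", "-").replace("_", "-")
    try:
        return QEFF_CLASS_BY_TASK[key]
    except KeyError:
        known = ", ".join(sorted(QEFF_CLASS_BY_TASK))
        raise ValueError(
            f"unknown task {task!r}; expected one of: {known}"
        ) from None
```

Note that Intern-VL and Molmo are exceptions to a purely task-based lookup: per note ①, they are Vision-Language Models that use QEFFAutoModelForCausalLM.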


Note

① Intern-VL and Molmo models are Vision-Language Models but use QEFFAutoModelForCausalLM for inference to stay compatible with HuggingFace Transformers.

② Set trust_remote_code=True for end-to-end inference with vLLM.

③ For some models in this family, pass disable_sliding_window when using vLLM.
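Notes ② and ③ can be read as a small mapping from model family to vLLM engine options. The sketch below is illustrative: the family names and markers are copied from the tables on this page, while FAMILY_FOOTNOTES, FOOTNOTE_TO_VLLM_ARG, and vllm_engine_kwargs are hypothetical names. It also assumes that both options are passed as boolean keyword arguments when constructing a vLLM engine.

```python
# Footnote markers attached to model families in the tables on this page.
FAMILY_FOOTNOTES = {
    "Falcon": {"②"},
    "Phi-3": {"②"},
    "Phi-3.5": {"②"},
    "grok-1": {"②"},
    "NomicBERT": {"②"},
    "Gemma": {"③"},
    "Gemma3": {"③"},
}

# Engine option that each footnote asks for (assumed to be boolean kwargs).
FOOTNOTE_TO_VLLM_ARG = {
    "②": ("trust_remote_code", True),
    "③": ("disable_sliding_window", True),
}


def vllm_engine_kwargs(family: str) -> dict:
    """Collect the vLLM options suggested by the footnotes for a family.

    Note ③ applies only to some models in a family, so treat
    disable_sliding_window as a candidate setting to verify per model.
    """
    kwargs = {}
    for marker in sorted(FAMILY_FOOTNOTES.get(family, ())):
        name, value = FOOTNOTE_TO_VLLM_ARG[marker]
        kwargs[name] = value
    return kwargs
```

With vLLM installed, the result could be unpacked into the engine constructor, for example LLM(model="tiiuae/falcon-40b", **vllm_engine_kwargs("Falcon")). Families without a footnote yield an empty dict.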


Models Coming Soon

| Architecture | Model Family | Representative Models |
|---|---|---|
| NemotronHForCausalLM | NVIDIA Nemotron v3 | NVIDIA Nemotron v3 |
| Sam3Model | facebook/sam3 | facebook/sam3 |
| StableDiffusionModel | HiDream-ai | HiDream-ai/HiDream-I1-Full |
| MistralLarge3Model | Mistral Large 3 | mistralai/mistral-large-3 |