Validated Models

Text-only Language Models

Text Generation Task

QEff Auto Class: QEFFAutoModelForCausalLM

| Architecture | Model Family | Representative Models | CB Support |
|---|---|---|---|
| FalconForCausalLM | Falcon | tiiuae/falcon-40b | ✔️ |
| Qwen3MoeForCausalLM | Qwen3Moe | Qwen/Qwen3-30B-A3B-Instruct-2507 | ✔️ |
| GemmaForCausalLM | CodeGemma | google/codegemma-2b, google/codegemma-7b | ✔️ |
| | Gemma | google/gemma-2b, google/gemma-7b, google/gemma-2-2b, google/gemma-2-9b, google/gemma-2-27b | ✔️ |
| GPTBigCodeForCausalLM | Starcoder1.5 | bigcode/starcoder | ✔️ |
| | Starcoder2 | bigcode/starcoder2-15b | ✔️ |
| GPTJForCausalLM | GPT-J | EleutherAI/gpt-j-6b | ✔️ |
| GPT2LMHeadModel | GPT-2 | openai-community/gpt2 | ✔️ |
| GraniteForCausalLM | Granite 3.1 | ibm-granite/granite-3.1-8b-instruct, ibm-granite/granite-guardian-3.1-8b | ✔️ |
| | Granite 20B | ibm-granite/granite-20b-code-base-8k, ibm-granite/granite-20b-code-instruct-8k | ✔️ |
| InternVLChatModel | Intern-VL | OpenGVLab/InternVL2_5-1B | |
| LlamaForCausalLM | CodeLlama | codellama/CodeLlama-7b-hf, codellama/CodeLlama-13b-hf, codellama/CodeLlama-34b-hf | ✔️ |
| | DeepSeek-R1-Distill-Llama | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | ✔️ |
| | InceptionAI-Adapted | inceptionai/jais-adapted-7b, inceptionai/jais-adapted-13b-chat, inceptionai/jais-adapted-70b | ✔️ |
| | Llama 3.3 | meta-llama/Llama-3.3-70B-Instruct | ✔️ |
| | Llama 3.2 | meta-llama/Llama-3.2-1B, meta-llama/Llama-3.2-3B | ✔️ |
| | Llama 3.1 | meta-llama/Llama-3.1-8B, meta-llama/Llama-3.1-70B | ✔️ |
| | Llama 3 | meta-llama/Meta-Llama-3-8B, meta-llama/Meta-Llama-3-70B | ✔️ |
| | Llama 2 | meta-llama/Llama-2-7b-chat-hf, meta-llama/Llama-2-13b-chat-hf, meta-llama/Llama-2-70b-chat-hf | ✔️ |
| | Vicuna | lmsys/vicuna-13b-delta-v0, lmsys/vicuna-13b-v1.3, lmsys/vicuna-13b-v1.5 | ✔️ |
| MistralForCausalLM | Mistral | mistralai/Mistral-7B-Instruct-v0.1 | ✔️ |
| MixtralForCausalLM | Codestral, Mixtral | mistralai/Codestral-22B-v0.1, mistralai/Mixtral-8x7B-v0.1 | ✔️ |
| MPTForCausalLM | MPT | mosaicml/mpt-7b | ✔️ |
| Phi3ForCausalLM | Phi-3, Phi-3.5 | microsoft/Phi-3-mini-4k-instruct | ✔️ |
| QwenForCausalLM | DeepSeek-R1-Distill-Qwen | DeepSeek-R1-Distill-Qwen-32B | ✔️ |
| | Qwen2, Qwen2.5 | Qwen/Qwen2-1.5B-Instruct | ✔️ |
| LlamaSwiftKVForCausalLM | swiftkv | Snowflake/Llama-3.1-SwiftKV-8B-Instruct | ✔️ |
| Grok1ModelForCausalLM | grok-1 | hpcai-tech/grok-1 | ✔️ |
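
As a rough illustration of how one of the text-generation models listed above might be loaded, the sketch below checks a model ID against a small, hand-copied excerpt of the list before handing it to the auto class. The helper names (`VALIDATED_CAUSAL_LMS`, `is_validated`, `load_causal_lm`) are hypothetical, and the `from QEfficient import QEFFAutoModelForCausalLM` import path and `from_pretrained` call are assumptions modeled on the Hugging Face auto-class convention, not taken from this page.

```python
# Hypothetical excerpt of the validated text-generation models on this page;
# extend it with whichever entries matter for your deployment.
VALIDATED_CAUSAL_LMS = frozenset({
    "openai-community/gpt2",
    "tiiuae/falcon-40b",
    "meta-llama/Llama-3.2-1B",
    "mistralai/Mistral-7B-Instruct-v0.1",
})


def is_validated(model_id: str) -> bool:
    """Return True if the model ID appears in the local excerpt."""
    return model_id in VALIDATED_CAUSAL_LMS


def load_causal_lm(model_id: str):
    """Load a validated model through QEFFAutoModelForCausalLM.

    The check runs before the import, so an unlisted model fails fast
    even on a machine where QEfficient is not installed.
    """
    if not is_validated(model_id):
        raise ValueError(f"{model_id!r} is not in the validated excerpt")
    # Assumption: import path and from_pretrained signature follow the
    # Hugging Face auto-class pattern.
    from QEfficient import QEFFAutoModelForCausalLM
    return QEFFAutoModelForCausalLM.from_pretrained(model_id)
```

Calling `load_causal_lm("openai-community/gpt2")` requires the QEfficient package to be installed; the validation step itself has no dependencies.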


Embedding Models

Text Embedding Task

QEff Auto Class: QEFFAutoModel

| Architecture | Model Family | Representative Models |
|---|---|---|
| BertModel | BERT-based | BAAI/bge-base-en-v1.5, BAAI/bge-large-en-v1.5, BAAI/bge-small-en-v1.5, e5-large-v2 |
| LlamaModel | Llama-based | intfloat/e5-mistral-7b-instruct |
| MPNetForMaskedLM | MPNet | sentence-transformers/multi-qa-mpnet-base-cos-v1 |
| MistralModel | Mistral | e5-mistral-7b-instruct |
| NomicBertModel | NomicBERT | nomic-embed-text-v1.5 |
| Qwen2ForCausalLM | Qwen2 | stella_en_1.5B_v5 |
| RobertaModel | RoBERTa | ibm-granite/granite-embedding-30m-english, ibm-granite/granite-embedding-125m-english |
| XLMRobertaForSequenceClassification | XLM-RoBERTa | bge-reranker-v2-m3 |
| XLMRobertaModel | XLM-RoBERTa | ibm-granite/granite-embedding-107m-multilingual, ibm-granite/granite-embedding-278m-multilingual |


Multimodal Language Models

Vision-Language Models (Text + Image Generation)

QEff Auto Class: QEFFAutoModelForImageTextToText

Architecture

Model Family

Representative Models

CB Support

Single Qpc Support

Dual Qpc Support

LlavaForConditionalGeneration

LLaVA-1.5

llava-hf/llava-1.5-7b-hf

✔️

✔️

MllamaForConditionalGeneration

Llama 3.2

meta-llama/Llama-3.2-11B-Vision-Instruct
meta-llama/Llama-3.2-90B-Vision

✔️

✔️

LlavaNextForConditionalGeneration

Granite Vision

ibm-granite/granite-vision-3.2-2b

✔️

Llama4ForConditionalGeneration

Llama-4-Scout

Llama-4-Scout-17B-16E-Instruct

✔️

✔️

Gemma3ForConditionalGeneration

Gemma3

google/gemma-3-4b-it

✔️

✔️

Dual QPC: In the dual QPC (Qualcomm Program Container) setup, the model is split across two QPCs:

  • The Vision Encoder runs in one QPC.

  • The Language Model (responsible for output generation) runs in a separate QPC.

  • The outputs from the Vision Encoder are transferred to the Language Model.

  • The dual QPC approach introduces the flexibility to run the vision and language components independently.

Single QPC: In the single QPC setup, the entire model, including both image encoding and text generation, runs within one QPC. The model is not split, and all components operate within the same execution environment.

Note

The choice between single and dual QPC is made at model instantiation through the kv_offload setting: kv_offload=True runs the model in dual QPC mode, and kv_offload=False runs it in single QPC mode.
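
The mapping from kv_offload to the QPC layout described above can be sketched as follows. The function names and the component labels ("vision_encoder", "language_model", "full_model") are hypothetical, chosen only for illustration. Passing `kv_offload` as a keyword to `from_pretrained` is an assumption based on the note's statement that the setting is applied at instantiation, as is the `QEfficient` import path.

```python
from typing import List


def qpc_layout(kv_offload: bool) -> List[str]:
    """Return hypothetical labels for the QPCs a given setting produces.

    kv_offload=True  -> dual QPC: vision encoder and language model apart.
    kv_offload=False -> single QPC: the whole model in one container.
    """
    if kv_offload:
        return ["vision_encoder", "language_model"]
    return ["full_model"]


def load_vlm(model_id: str, kv_offload: bool):
    """Instantiate a vision-language model in the requested QPC mode."""
    # Assumption: import path and keyword argument; requires QEfficient.
    from QEfficient import QEFFAutoModelForImageTextToText
    return QEFFAutoModelForImageTextToText.from_pretrained(
        model_id, kv_offload=kv_offload
    )
```

For example, `qpc_layout(True)` yields two entries, matching the dual QPC description, while `qpc_layout(False)` yields one.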


Audio Models

Automatic Speech Recognition (Transcription Task)

QEff Auto Class: QEFFAutoModelForSpeechSeq2Seq

| Architecture | Model Family | Representative Models |
|---|---|---|
| Whisper | Whisper | openai/whisper-tiny, openai/whisper-base, openai/whisper-small, openai/whisper-medium, openai/whisper-large, openai/whisper-large-v3-turbo |
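
Each task section on this page names the QEff Auto Class that serves it. A minimal sketch of that lookup is shown below; the task keys are hypothetical labels invented for this example, and resolving the class as an attribute of the top-level `QEfficient` module is an assumption.

```python
import importlib

# Class names are taken from the "QEff Auto Class" lines on this page;
# the task keys are hypothetical labels.
AUTO_CLASS_BY_TASK = {
    "text-generation": "QEFFAutoModelForCausalLM",
    "text-embedding": "QEFFAutoModel",
    "image-text-to-text": "QEFFAutoModelForImageTextToText",
    "speech-recognition": "QEFFAutoModelForSpeechSeq2Seq",
}


def auto_class_name(task: str) -> str:
    """Return the QEff Auto Class name for a task label."""
    try:
        return AUTO_CLASS_BY_TASK[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None


def resolve_auto_class(task: str):
    """Import QEfficient lazily and return the class object for a task."""
    # Assumption: the auto classes are exposed at the package top level.
    module = importlib.import_module("QEfficient")
    return getattr(module, auto_class_name(task))
```

Keeping the name lookup separate from the import lets the table be checked on a machine without QEfficient installed.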


Models Coming Soon

| Architecture | Model Family | Representative Models |
|---|---|---|
| Qwen3MoeForCausalLM | Qwen3 | Qwen/Qwen3-MoE-15B-A2B |
| Mistral3ForConditionalGeneration | Mistral 3.1 | mistralai/Mistral-Small-3.1-24B-Base-2503 |
| BaichuanForCausalLM | Baichuan2 | baichuan-inc/Baichuan2-7B-Base |
| CohereForCausalLM | Command-R | CohereForAI/c4ai-command-r-v01 |
| DbrxForCausalLM | DBRX | databricks/dbrx-base |