Validated Models
Text-only Language Models
Text Generation Task
QEff Auto Class: QEFFAutoModelForCausalLM
| Architecture | Model Family | Representative Models | CB Support |
|---|---|---|---|
| FalconForCausalLM | Falcon | | ✔️ |
| Qwen3MoeForCausalLM | Qwen3Moe | | ✔️ |
| GemmaForCausalLM | CodeGemma | | ✔️ |
| | Gemma | google/gemma-2b | ✔️ |
| GPTBigCodeForCausalLM | Starcoder1.5 | | ✔️ |
| | Starcoder2 | | ✔️ |
| GPTJForCausalLM | GPT-J | | ✔️ |
| GPT2LMHeadModel | GPT-2 | | ✔️ |
| GraniteForCausalLM | Granite 3.1 | ibm-granite/granite-3.1-8b-instruct | ✔️ |
| | Granite 20B | ibm-granite/granite-20b-code-base-8k | ✔️ |
| InternVLChatModel | Intern-VL | | |
| LlamaForCausalLM | CodeLlama | codellama/CodeLlama-7b-hf | ✔️ |
| | DeepSeek-R1-Distill-Llama | | ✔️ |
| | InceptionAI-Adapted | inceptionai/jais-adapted-7b | ✔️ |
| | Llama 3.3 | | ✔️ |
| | Llama 3.2 | | ✔️ |
| | Llama 3.1 | | ✔️ |
| | Llama 3 | | ✔️ |
| | Llama 2 | meta-llama/Llama-2-7b-chat-hf | ✔️ |
| | Vicuna | lmsys/vicuna-13b-delta-v0 | ✔️ |
| MistralForCausalLM | Mistral | | ✔️ |
| MixtralForCausalLM | Codestral | | ✔️ |
| MPTForCausalLM | MPT | | ✔️ |
| Phi3ForCausalLM | Phi-3, Phi-3.5 | | ✔️ |
| QwenForCausalLM | DeepSeek-R1-Distill-Qwen | | ✔️ |
| | Qwen2, Qwen2.5 | | ✔️ |
| LlamaSwiftKVForCausalLM | swiftkv | | ✔️ |
| Grok1ModelForCausalLM | grok-1 | | ✔️ |
Embedding Models
Text Embedding Task
QEff Auto Class: QEFFAutoModel
| Architecture | Model Family | Representative Models |
|---|---|---|
| BertModel | BERT-based | BAAI/bge-base-en-v1.5 |
| LlamaModel | Llama-based | |
| MPNetForMaskedLM | MPNet | |
| MistralModel | Mistral | |
| NomicBertModel | NomicBERT | |
| Qwen2ForCausalLM | Qwen2 | |
| RobertaModel | RoBERTa | ibm-granite/granite-embedding-30m-english |
| XLMRobertaForSequenceClassification | XLM-RoBERTa | |
| XLMRobertaModel | XLM-RoBERTa | ibm-granite/granite-embedding-107m-multilingual |
Multimodal Language Models
Vision-Language Models (Text + Image Generation)
QEff Auto Class: QEFFAutoModelForImageTextToText
| Architecture | Model Family | Representative Models | CB Support | Single QPC Support | Dual QPC Support |
|---|---|---|---|---|---|
| LlavaForConditionalGeneration | LLaVA-1.5 | | ✕ | ✔️ | ✔️ |
| MllamaForConditionalGeneration | Llama 3.2 | meta-llama/Llama-3.2-11B-Vision-Instruct | ✕ | ✔️ | ✔️ |
| LlavaNextForConditionalGeneration | Granite Vision | | ✕ | ✕ | ✔️ |
| Llama4ForConditionalGeneration | Llama-4-Scout | | ✕ | ✔️ | ✔️ |
| Gemma3ForConditionalGeneration | Gemma3 | | ✕ | ✔️ | ✔️ |
Dual QPC: In the Dual QPC (Qualcomm Program Container) setup, the model is split across two QPCs:
- The Vision Encoder runs in one QPC.
- The Language Model (responsible for output generation) runs in a separate QPC.
- The outputs from the Vision Encoder are transferred to the Language Model.
- Because the two components live in separate QPCs, the vision and language components can be run independently.

Single QPC: In the Single QPC setup, the entire model, including both image encoding and text generation, runs within a single QPC. The model is not split, and all components operate within the same execution environment.
Note
The choice between Single and Dual QPC is made during model instantiation through the `kv_offload` setting. If `kv_offload` is set to `True`, the model runs in Dual QPC mode; if it is set to `False`, the model runs in Single QPC mode.
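
As a rough illustration of the note above, the sketch below maps `kv_offload` to a QPC mode and checks it against the Single/Dual support columns of the vision-language table before instantiating a model. The helper names (`QPC_SUPPORT`, `qpc_mode`, `check_supported`, `load_vlm`) are hypothetical and not part of the library, and the `from QEfficient import ...` path and the `from_pretrained(..., kv_offload=...)` call are assumptions based on the auto class name and setting mentioned in this section.

```python
# Single/Dual QPC support per architecture, transcribed from the table in this section.
QPC_SUPPORT = {
    "LlavaForConditionalGeneration": {"single": True, "dual": True},
    "MllamaForConditionalGeneration": {"single": True, "dual": True},
    "LlavaNextForConditionalGeneration": {"single": False, "dual": True},
    "Llama4ForConditionalGeneration": {"single": True, "dual": True},
    "Gemma3ForConditionalGeneration": {"single": True, "dual": True},
}


def qpc_mode(kv_offload: bool) -> str:
    """Map the kv_offload setting to the QPC mode described in the note."""
    return "dual" if kv_offload else "single"


def check_supported(architecture: str, kv_offload: bool) -> str:
    """Return the QPC mode, or raise if the table marks it as unsupported."""
    mode = qpc_mode(kv_offload)
    if not QPC_SUPPORT[architecture][mode]:
        raise ValueError(f"{architecture} is not validated for {mode} QPC")
    return mode


def load_vlm(model_name: str, architecture: str, kv_offload: bool):
    """Instantiate a vision-language model in the requested mode.

    Requires the QEfficient package; the import path and keyword argument
    below are assumptions, not verified against a specific release.
    """
    check_supported(architecture, kv_offload)
    # Imported lazily so the helpers above can be used without the library.
    from QEfficient import QEFFAutoModelForImageTextToText

    return QEFFAutoModelForImageTextToText.from_pretrained(
        model_name, kv_offload=kv_offload
    )
```

With these helpers, `check_supported("LlavaNextForConditionalGeneration", kv_offload=False)` raises a `ValueError`, since the table lists that architecture as Dual QPC only, while the same call with `kv_offload=True` returns `"dual"`.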
Audio Models
(Automatic Speech Recognition) - Transcription Task
QEff Auto Class: QEFFAutoModelForSpeechSeq2Seq
| Architecture | Model Family | Representative Models |
|---|---|---|
| Whisper | Whisper | openai/whisper-tiny |
Models Coming Soon
| Architecture | Model Family | Representative Models |
|---|---|---|
| Qwen3MoeForCausalLM | Qwen3 | |
| Mistral3ForConditionalGeneration | Mistral 3.1 | |
| BaichuanForCausalLM | Baichuan2 | |
| CohereForCausalLM | Command-R | |
| DbrxForCausalLM | DBRX | |