vLLM Supports Tool Call Parsing¶
vLLM supports structured function calling through configurable tool call parsers. In the context of GPT-OSS models, this refers specifically to parsing
model-generated function calls into OpenAI-compatible Chat Completions format using the --tool-call-parser openai option.
For more details, see: vLLM Tool Calling Documentation
GPT-OSS Model Context¶
GPT-OSS models (gpt-oss-120B and gpt-oss-20B) are trained to emit structured, JSON-based outputs for safe and predictable invocation of external capabilities. When using function calling with vLLM, GPT-OSS models require OpenAI-style tool parsing and automatic tool selection to be explicitly enabled.
Required vLLM Flags for GPT-OSS Function Calling¶
When using GPT-OSS models with function calling through the Chat Completions API, the following vLLM flags are required:
--tool-call-parser openaiRequired to parse GPT-OSS model outputs into OpenAI-compatible function calls.--enable-auto-tool-choiceRequired to allow the model to automatically decide when to emit a function call during its reasoning process.
Frameworks such as LangChain default to automatic tool selection and therefore require both flags to be enabled.
Scope of Tool Parsing Support¶
Tool parsing support validation is limited to basic functional behavior when vLLM is integrated with LangChain using OpenAI-compatible APIs.
Stateful agent features such as response chaining, tool state persistence, file-based tools, or vector-store-backed workflows are outside the scope of this validation.
Limitations and Constraints¶
Tool parsing is supported only via OpenAI-compatible Chat/Completions APIs (for example,
/v1/chat/completions).Use of OpenAI Responses APIs (
/v1/responses) is not supported for tool parsing.Only Python 3.12 is supported for GPT-OSS models when using function calling.
In non-disaggregated mode, only
seq_len = 1with continuous batching disabled has been tested; non-disaggregated mode is not recommended.GPT-OSS-120B compilation requires very high system memory (1–2 TB RAM with weight offloading).
Tool parsing options (such as OpenAI-style tool call parsing) affect only API-level request and response handling and do not alter model execution, quantization, or hardware acceleration behavior.
Supported Tool Parsing Paths¶
vLLM Version |
Python |
Target Models |
Tool Parsing Path |
Notes |
|---|---|---|---|---|
vLLM 0.10.1 |
Python 3.12 |
GPT-OSS-120B / GPT-OSS-20B only |
|
Required for GPT-OSS models |