Chat API
Pico AI Homelab supports the following OpenAI and Ollama-compatible endpoints for chat:
| Endpoint | Description |
| --- | --- |
| `v1/chat/completions` | OpenAI-compatible chat API |
| `api/chat` | Ollama-compatible chat API |
| `api/generate` | Ollama-compatible completion API |
These endpoints conform to the OpenAI and Ollama APIs. Pico supports the message content types `text`, `image_url`, and `video_url`.
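For example, a request mixing `text` and `image_url` content parts might look like the following Python sketch. The base URL and model name are assumptions (Pico is typically reached on the Ollama default port 11434; substitute whichever host and VLM you actually run):

```python
import requests

BASE_URL = "http://localhost:11434"  # assumed: Pico on the default Ollama port

# OpenAI-style multimodal message: one text part plus one image_url part.
response = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": "qwen2.5-vl",  # hypothetical model name; use one you have downloaded
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": "What is in this image?"},
                    {
                        "type": "image_url",
                        "image_url": {"url": "https://example.com/cat.jpg"},
                    },
                ],
            }
        ],
        "stream": False,
    },
)
print(response.json()["choices"][0]["message"]["content"])
```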
Pico supports both LLM and VLM models. To discover which models have been downloaded and are available to clients, use the models endpoint.
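For instance, assuming Pico exposes the standard OpenAI `/v1/models` route (the Ollama-compatible counterpart would be `/api/tags`), a quick listing could look like:

```python
import requests

BASE_URL = "http://localhost:11434"  # assumed default

# List the models the server reports as available (OpenAI list format).
for model in requests.get(f"{BASE_URL}/v1/models").json()["data"]:
    print(model["id"])
```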
- `POST /v1/chat/completions`
- `POST /api/chat`
- `POST /api/generate`
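A minimal non-streaming call to the Ollama-compatible `api/chat` endpoint might look like this sketch (address and model name are placeholders):

```python
import requests

BASE_URL = "http://localhost:11434"  # assumed

# Ollama-style chat request; the response shape follows the Ollama chat API.
r = requests.post(
    f"{BASE_URL}/api/chat",
    json={
        "model": "qwen3",  # hypothetical model name
        "messages": [{"role": "user", "content": "Hello!"}],
        "stream": False,
    },
)
print(r.json()["message"]["content"])
```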
| Parameter | Type | Description |
| --- | --- | --- |
| `model` | String | Name of the chat model to use |
| `messages` | Array of messages | The messages of the conversation so far |
| `stream` | Optional boolean | If `true` or omitted, the response is streamed to the client token by token |
| `reasoning` | Optional enum | See Reasoning below |
| `chat_template_kwargs` | Optional dictionary | See Reasoning below |
| `max_tokens` | Optional integer | Deprecated; use `max_completion_tokens` instead |
| `max_completion_tokens` | Optional integer | Upper bound on the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens |
| `temperature` | Optional float | Sampling temperature to use |
| `frequency_penalty` | Optional float | Penalizes new tokens based on their frequency in the text so far |
| `top_p` | Optional float | Nucleus sampling threshold |
| `user` | Optional string | Ignored by Pico |
| `format` | Optional string | Ignored by Pico |
| `options` | Optional object | See Ollama options |
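As an illustration of several of these parameters together, the following sketch sends a streaming request and assumes Pico emits standard OpenAI-style server-sent events (`data:` lines terminated by `[DONE]`); the model name is a placeholder:

```python
import json
import requests

BASE_URL = "http://localhost:11434"  # assumed

# Streaming request exercising stream, temperature, top_p, and
# max_completion_tokens from the table above.
with requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": "qwen3",  # hypothetical model name
        "messages": [{"role": "user", "content": "Write a haiku about homelabs."}],
        "stream": True,
        "temperature": 0.7,
        "top_p": 0.9,
        "max_completion_tokens": 256,
    },
    stream=True,
) as r:
    for line in r.iter_lines():
        # SSE frames look like: data: {...json chunk...}
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        print(delta.get("content") or "", end="", flush=True)
```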
Reasoning
The reasoning API is supported in Pico 1.1.14 and later.
Reasoning for reasoning models such as Qwen 3 is enabled by default, and may be disabled by the chat client on a per-conversation basis. Pico supports both OpenAI and vLLM mechanisms to configure reasoning.
Unlike OpenAI, Pico does not implement reasoning levels (`low`, `medium`, `high`); only binary on/off states are supported. Requests specifying `low` will be interpreted by Pico as reasoning disabled. Note that Pico also supports the non-OpenAI values `on`, `off`, and `none`.
The `reasoning` object supports the following properties:

| Property | Type | Description |
| --- | --- | --- |
| `effort` | Enum | See below |
| `summary` | Optional string | This property is ignored |
| Value | Interpretation |
| --- | --- |
| `low` | Reasoning mode disabled |
| `medium` | Reasoning mode enabled |
| `high` | Reasoning mode enabled |
| `on` | Reasoning mode enabled |
| `off` | Reasoning mode disabled |
| `none` | Reasoning mode disabled |
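Putting this together, a request that disables reasoning for a single turn might look like the sketch below; the object shape follows the `effort` field above, and the model name is a placeholder:

```python
import requests

BASE_URL = "http://localhost:11434"  # assumed

# "off" (like "low" and "none") maps to reasoning mode disabled in Pico.
r = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": "qwen3",  # hypothetical reasoning model
        "messages": [{"role": "user", "content": "Quick answer: 17 * 23?"}],
        "reasoning": {"effort": "off"},
    },
)
print(r.json()["choices"][0]["message"]["content"])
```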
Alternatively, use the vLLM API by setting the key `enable_thinking` to `true` or `false` in `chat_template_kwargs`.
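A sketch of the same toggle via `chat_template_kwargs` (placeholders as before):

```python
import requests

BASE_URL = "http://localhost:11434"  # assumed

# vLLM-style toggle: pass enable_thinking through chat_template_kwargs.
r = requests.post(
    f"{BASE_URL}/v1/chat/completions",
    json={
        "model": "qwen3",  # hypothetical model name
        "messages": [{"role": "user", "content": "Hello!"}],
        "chat_template_kwargs": {"enable_thinking": False},
    },
)
print(r.json()["choices"][0]["message"]["content"])
```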