Chat API

Pico AI Homelab supports the following OpenAI and Ollama-compatible endpoints for chat:

  • /v1/chat/completions: OpenAI-compatible chat API

  • /api/chat: Ollama-compatible chat API

  • /api/generate: Ollama-compatible completion API

These endpoints conform to the OpenAI and Ollama APIs. Pico supports the message content types text, image_url, and video_url.

Pico supports both LLM and VLM models. To discover which models have been downloaded and are available to clients, use the models API.
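
For example, a minimal request against the OpenAI-compatible endpoint might look like the sketch below (Python with the requests package; the base URL http://localhost:11434 and the model name llama3.2 are assumptions, so substitute your Pico host and a model reported by the models API):

  import requests

  # Minimal non-streaming chat request against the OpenAI-compatible endpoint.
  response = requests.post(
      "http://localhost:11434/v1/chat/completions",  # assumed host and port
      json={
          "model": "llama3.2",  # placeholder; use a model listed by the models API
          "stream": False,      # set to true (or omit) to stream token by token
          "messages": [
              {"role": "user", "content": "Why is the sky blue?"},
          ],
      },
  )
  print(response.json()["choices"][0]["message"]["content"])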

Request

POST /v1/chat/completions

POST /api/chat

POST /api/generate

Name                   Type                 Description
model                  String               Name of the chat model to use
messages               Array of messages    The conversation messages to send to the model
stream                 Optional boolean     If true or omitted, the response is streamed to the client token by token
reasoning              Optional enum        See Reasoning below
chat_template_kwargs   Optional dictionary  See Reasoning below
max_tokens             Optional integer     Deprecated; use max_completion_tokens instead
max_completion_tokens  Optional integer     Upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens
temperature            Optional float       Sampling temperature to use
frequency_penalty      Optional float       Penalizes new tokens based on their frequency in the text so far
top_p                  Optional float       Nucleus sampling probability mass
user                   Optional string      Ignored by Pico
format                 Optional string      Ignored by Pico
options                Optional object      See Ollama options
think                  Optional boolean     Enables or disables reasoning (conforms to the Ollama API). Available from 1.1.18
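
As a sketch of how several of these parameters combine, the request below sends an image_url content part to a vision-capable model (the model name qwen2.5-vl and the base URL are placeholders, not names Pico necessarily ships with):

  import requests

  # Chat request mixing text and image_url content parts with sampling options.
  response = requests.post(
      "http://localhost:11434/v1/chat/completions",  # assumed host and port
      json={
          "model": "qwen2.5-vl",         # placeholder vision-capable model
          "stream": False,
          "temperature": 0.2,
          "max_completion_tokens": 512,  # cap on visible plus reasoning tokens
          "messages": [
              {
                  "role": "user",
                  "content": [
                      {"type": "text", "text": "Describe this image."},
                      {
                          "type": "image_url",
                          "image_url": {"url": "https://example.com/photo.png"},
                      },
                  ],
              },
          ],
      },
  )
  print(response.json()["choices"][0]["message"]["content"])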

Reasoning

Pico supports the OpenAI, vLLM, and Ollama mechanisms for enabling and disabling reasoning. The reasoning option (OpenAI API-compatible) and the enable_thinking option (vLLM API-compatible) are supported in Pico 1.1.14 and later; the think option (Ollama API-compatible) from 1.1.18.

Setting a reasoning option is optional. If none is supplied, the model uses its default settings, which often means reasoning is enabled.

OpenAI-compatible Reasoning

Unlike OpenAI, Pico does not implement reasoning levels (low, medium, high); only binary on and off states are supported. Requests specifying low are interpreted by Pico as reasoning disabled. Pico also supports the non-standard values on, off, and none.

Name     Type             Description
effort   Enum             See below
summary  Optional string  This property is ignored

Name    Description
low     Interpreted by Pico as reasoning disabled
medium  Interpreted by Pico as reasoning enabled
high    Interpreted by Pico as reasoning enabled
on      Reasoning enabled
off     Reasoning disabled
none    Reasoning disabled
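
As a sketch, disabling reasoning through this option could look as follows (assuming the object shape with an effort field described in the table above, plus the placeholder host and model names from the earlier examples):

  import requests

  # Disable reasoning via the OpenAI-compatible reasoning option.
  response = requests.post(
      "http://localhost:11434/v1/chat/completions",  # assumed host and port
      json={
          "model": "qwen3",                 # placeholder reasoning-capable model
          "stream": False,
          "reasoning": {"effort": "none"},  # low/off/none disable, medium/high/on enable
          "messages": [{"role": "user", "content": "What is 2 + 2?"}],
      },
  )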

vLLM-compatible Reasoning

Alternatively, use the vLLM API by setting the key enable_thinking to true or false in chat_template_kwargs.
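
A sketch of the same request using the vLLM-compatible mechanism (same placeholder host and model as above):

  import requests

  # Disable reasoning via the vLLM-compatible chat_template_kwargs option.
  response = requests.post(
      "http://localhost:11434/v1/chat/completions",  # assumed host and port
      json={
          "model": "qwen3",  # placeholder
          "stream": False,
          "chat_template_kwargs": {"enable_thinking": False},
          "messages": [{"role": "user", "content": "What is 2 + 2?"}],
      },
  )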

Ollama-compatible Reasoning

think is an optional boolean.

As of Pico AI Server 1.1.18, the thinking field in the response is not yet supported, meaning that thinking output is streamed as part of the regular response.
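
A sketch against the Ollama-compatible endpoint (placeholder host and model as before); note that with reasoning enabled, any thinking output arrives inline in the regular response rather than in a separate thinking field:

  import requests

  # Disable reasoning via the Ollama-compatible think flag (Pico 1.1.18 and later).
  response = requests.post(
      "http://localhost:11434/api/chat",  # assumed host and port
      json={
          "model": "qwen3",  # placeholder
          "think": False,    # disable reasoning
          "stream": False,
          "messages": [{"role": "user", "content": "Hello"}],
      },
  )
  print(response.json()["message"]["content"])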
