Chat API

Pico AI Homelab supports the following OpenAI and Ollama-compatible endpoints for chat:

  • /v1/chat/completions: OpenAI-compatible chat API

  • /api/chat: Ollama-compatible chat API

  • /api/generate: Ollama-compatible completion API

These endpoints conform to the OpenAI and Ollama APIs. Pico supports the message content types text, image_url, and video_url.

Pico supports both LLM and VLM models. To discover which models have been downloaded and are available to clients, use the models API.
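
For example, a minimal request against the OpenAI-compatible endpoint might look like the sketch below (Python with the requests package; the base URL http://localhost:11434 and the model name llama3.2 are assumptions, so substitute your Pico host and a model reported by the models API):

  import requests

  # Minimal non-streaming chat request against the OpenAI-compatible endpoint.
  response = requests.post(
      "http://localhost:11434/v1/chat/completions",  # assumed host and port
      json={
          "model": "llama3.2",  # placeholder; use a model listed by the models API
          "stream": False,      # set to true (or omit) to stream token by token
          "messages": [
              {"role": "user", "content": "Why is the sky blue?"},
          ],
      },
  )
  print(response.json()["choices"][0]["message"]["content"])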

Request

POST /v1/chat/completions

POST /api/chat

POST /api/generate

Name                   Type                 Description
model                  String               Name of the chat model to use
messages               Array of messages    The conversation messages to send to the model
stream                 Optional boolean     If true or omitted, the response is streamed to the client token by token
reasoning              Optional enum        See Reasoning below
chat_template_kwargs   Optional dictionary  See Reasoning below
max_tokens             Optional integer     Deprecated; use max_completion_tokens instead
max_completion_tokens  Optional integer     Upper bound for the number of tokens that can be generated for a completion, including visible output tokens and reasoning tokens
temperature            Optional float       Sampling temperature to use
frequency_penalty      Optional float       Penalizes new tokens based on their frequency in the text so far
top_p                  Optional float       Nucleus sampling probability mass
user                   Optional string      Ignored by Pico
format                 Optional string      Ignored by Pico
options                Optional object      See Ollama options
think                  Optional boolean     Enables or disables reasoning (conforms to the Ollama API). Available from 1.1.18
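
As a sketch of how several of these parameters combine, the request below sends an image_url content part to a vision-capable model (the model name qwen2.5-vl and the base URL are placeholders, not names Pico necessarily ships with):

  import requests

  # Chat request mixing text and image_url content parts with sampling options.
  response = requests.post(
      "http://localhost:11434/v1/chat/completions",  # assumed host and port
      json={
          "model": "qwen2.5-vl",         # placeholder vision-capable model
          "stream": False,
          "temperature": 0.2,
          "max_completion_tokens": 512,  # cap on visible plus reasoning tokens
          "messages": [
              {
                  "role": "user",
                  "content": [
                      {"type": "text", "text": "Describe this image."},
                      {
                          "type": "image_url",
                          "image_url": {"url": "https://example.com/photo.png"},
                      },
                  ],
              },
          ],
      },
  )
  print(response.json()["choices"][0]["message"]["content"])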

Reasoning

Pico supports the OpenAI, vLLM, and Ollama mechanisms for enabling and disabling reasoning. The reasoning option (OpenAI API-compatible) and the enable_thinking option (vLLM API-compatible) are supported in Pico 1.1.14 and later; the think option (Ollama API-compatible) from 1.1.18.

Setting a reasoning option is optional. If none is supplied, the model uses its default settings, which often means reasoning is enabled.

OpenAI-compatible Reasoning

Unlike OpenAI, Pico does not implement reasoning levels (low, medium, high); only binary on and off states are supported. Requests specifying low are interpreted by Pico as reasoning disabled. Pico also supports the non-standard values on, off, and none.

Name     Type             Description
effort   Enum             See below
summary  Optional string  This property is ignored

Name    Description
low     Interpreted by Pico as reasoning disabled
medium  Interpreted by Pico as reasoning enabled
high    Interpreted by Pico as reasoning enabled
on      Reasoning enabled
off     Reasoning disabled
none    Reasoning disabled
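
As a sketch, disabling reasoning through this option could look as follows (assuming the object shape with an effort field described in the table above, plus the placeholder host and model names from the earlier examples):

  import requests

  # Disable reasoning via the OpenAI-compatible reasoning option.
  response = requests.post(
      "http://localhost:11434/v1/chat/completions",  # assumed host and port
      json={
          "model": "qwen3",                 # placeholder reasoning-capable model
          "stream": False,
          "reasoning": {"effort": "none"},  # low/off/none disable, medium/high/on enable
          "messages": [{"role": "user", "content": "What is 2 + 2?"}],
      },
  )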

vLLM-compatible Reasoning

Alternatively, use the vLLM API by setting the key enable_thinking to true or false in chat_template_kwargs.
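
A sketch of the same request using the vLLM-compatible mechanism (same placeholder host and model as above):

  import requests

  # Disable reasoning via the vLLM-compatible chat_template_kwargs option.
  response = requests.post(
      "http://localhost:11434/v1/chat/completions",  # assumed host and port
      json={
          "model": "qwen3",  # placeholder
          "stream": False,
          "chat_template_kwargs": {"enable_thinking": False},
          "messages": [{"role": "user", "content": "What is 2 + 2?"}],
      },
  )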

Ollama-compatible Reasoning

think is an optional boolean.

As of Pico AI Server 1.1.18, the thinking field in the response is not yet supported, meaning that thinking output is streamed as part of the regular response.
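
A sketch against the Ollama-compatible endpoint (placeholder host and model as before); note that with reasoning enabled, any thinking output arrives inline in the regular response rather than in a separate thinking field:

  import requests

  # Disable reasoning via the Ollama-compatible think flag (Pico 1.1.18 and later).
  response = requests.post(
      "http://localhost:11434/api/chat",  # assumed host and port
      json={
          "model": "qwen3",  # placeholder
          "think": False,    # disable reasoning
          "stream": False,
          "messages": [{"role": "user", "content": "Hello"}],
      },
  )
  print(response.json()["message"]["content"])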
