POST /v1/chat/completions

The /v1/chat/completions endpoint accepts a conversation as an array of messages and returns a model-generated response. It is fully compatible with the OpenAI Chat Completions API, so any client or library that targets OpenAI can be pointed at TokModel by changing only the base URL and API key. You can stream responses token-by-token using server-sent events (SSE) by setting stream to true.
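As a minimal sketch of the base-URL swap described above, the following builds (but does not send) the HTTP request using only Python's standard library. The base URL is taken from the curl example on this page; the helper name `build_chat_request` and the placeholder key are illustrative assumptions, not part of any TokModel SDK.

```python
import json
import urllib.request

# Base URL taken from the curl example on this page.
TOKMODEL_BASE_URL = "https://tokmodel.com/v1"


def build_chat_request(base_url, api_key, payload):
    """Build a POST request for the chat completions endpoint (hypothetical helper)."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request(
    TOKMODEL_BASE_URL,
    "YOUR_API_KEY",  # placeholder, replace with a real key
    {
        "model": "openai/gpt-4o",
        "messages": [{"role": "user", "content": "What is the capital of France?"}],
    },
)
# To send it: urllib.request.urlopen(request), then json.load() the response body.
```

Only the base URL and key differ from a request aimed at another OpenAI-compatible service, which is the point of the compatibility claim.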

Request parameters

model

string

required

The ID of the model to use for this request. Use the list models endpoint to retrieve available model IDs.

messages

array

required

An array of message objects that make up the conversation. Each object must include a role ("system", "user", or "assistant") and a content string.
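The shape described above can be checked client-side before sending a request. This is a small sketch; `validate_messages` is a hypothetical helper, and the server may enforce additional rules that are not documented here.

```python
ALLOWED_ROLES = {"system", "user", "assistant"}


def validate_messages(messages):
    """Return a list of problems; an empty list means the array matches the documented shape."""
    if not isinstance(messages, list):
        return ["messages must be an array"]
    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"messages[{i}] is not an object")
            continue
        if message.get("role") not in ALLOWED_ROLES:
            problems.append(f"messages[{i}].role must be one of {sorted(ALLOWED_ROLES)}")
        if not isinstance(message.get("content"), str):
            problems.append(f"messages[{i}].content must be a string")
    return problems


ok = validate_messages([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is the capital of France?"},
])
bad = validate_messages([{"role": "tool", "content": 42}])
```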

stream

boolean

default: false

When true, the response is streamed as server-sent events. Each event contains a partial ChatCompletionChunk object. The stream terminates with data: [DONE].
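A streamed response can be consumed by reading `data:` lines until the `[DONE]` sentinel. The sketch below parses already-received lines rather than opening a connection. The chunk layout (`choices[0].delta.content`) is an assumption borrowed from the OpenAI streaming format that this endpoint is described as compatible with, and the sample lines are hand-written for illustration.

```python
import json


def iter_chunks(lines):
    """Yield parsed chunk objects from an iterable of SSE text lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return  # end-of-stream sentinel
        yield json.loads(data)


def collect_text(chunks):
    """Join incremental text, assuming OpenAI-style choices[0].delta.content."""
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if choices:
            delta = choices[0].get("delta") or {}
            parts.append(delta.get("content") or "")
    return "".join(parts)


# Hand-written sample events, not captured from a real response.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Par"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "is"}}]}',
    "",
    "data: [DONE]",
]
text = collect_text(iter_chunks(sample))
```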

temperature

number

default: 1

Sampling temperature between 0 and 2. Lower values produce more deterministic output; higher values produce more varied output. Avoid setting both temperature and top_p at the same time.
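To see why lower values are more deterministic, here is the textbook temperature-scaled softmax. It illustrates the general technique only; this page does not state how the underlying models apply temperature, and the logits are made up.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by the temperature."""
    if temperature <= 0:
        # A temperature of 0 is commonly treated as greedy argmax; this sketch does not cover it.
        raise ValueError("this sketch needs temperature > 0")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # made-up scores for three candidate tokens
cool = softmax_with_temperature(logits, 0.5)  # sharper: top token dominates
warm = softmax_with_temperature(logits, 2.0)  # flatter: mass spreads out
```

With these numbers the most likely token gets a larger share at 0.5 than at 2.0, while the least likely token gets a larger share at 2.0.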

max_tokens

integer

The maximum number of tokens to generate in the response. The model will stop once this limit is reached, even if the response is incomplete.

top_p

number

Nucleus sampling parameter. The model samples only from the smallest set of most likely tokens whose cumulative probability reaches top_p. For example, a value of 0.1 restricts sampling to the tokens that make up the top 10% of probability mass.
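The filtering step behind nucleus sampling can be sketched as follows. This is the generic algorithm under a made-up distribution, not a description of the server's implementation; `nucleus_filter` is a hypothetical name.

```python
def nucleus_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose mass reaches top_p."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    kept, mass = {}, 0.0
    for token, p in ranked:
        kept[token] = p
        mass += p
        if mass >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    return {token: p / mass for token, p in kept.items()}


probs = {"Paris": 0.70, "Lyon": 0.15, "Nice": 0.10, "Rome": 0.05}  # made-up distribution
narrow = nucleus_filter(probs, 0.1)  # only the single most likely token survives
wide = nucleus_filter(probs, 0.9)    # the three most likely tokens survive
```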

n

integer

default: 1

How many independent completion choices to generate for each request. Each choice is billed separately.

stop

string | string[]

One or more sequences at which the model will stop generating further tokens. The stop sequence itself is not included in the response.
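The documented effect of stop can be mimicked on the client to show what the returned text looks like: output ends at the earliest stop sequence, which is itself excluded. This is an illustration of the described behavior, not the server's code, and `truncate_at_stop` is a hypothetical helper.

```python
def truncate_at_stop(text, stop):
    """Cut text at the earliest occurrence of any stop sequence, excluding the sequence."""
    sequences = [stop] if isinstance(stop, str) else list(stop)
    cut = len(text)
    for seq in sequences:
        pos = text.find(seq)
        if seq and pos != -1:
            cut = min(cut, pos)
    return text[:cut]


single = truncate_at_stop("Paris. END of answer", "END")
multiple = truncate_at_stop("Paris.\nUser: next question", ["\nUser:", "END"])
```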

Response fields

id

string

A unique identifier for this completion, prefixed with chatcmpl-.

object

string

Always "chat.completion".

created

integer

Unix timestamp (seconds) of when the completion was created.

model

string

The model that was used to generate the response.

choices

array

An array of completion choices. Its length equals the n request parameter (1 by default).

index

integer

Zero-based index of this choice in the choices array.

message

object

The generated message object.

role

string

Always "assistant" for generated messages.

content

string

The text content of the generated message.

finish_reason

string

Why the model stopped generating. One of "stop" (natural end of the message, or a stop sequence was reached), "length" (the max_tokens limit was reached), or "content_filter".
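Client code usually branches on finish_reason to detect truncated output. A minimal sketch, with a hypothetical helper name and wording of its own:

```python
def describe_finish(choice):
    """Map a choice's finish_reason to a short status string."""
    reason = choice.get("finish_reason")
    if reason == "stop":
        return "complete"
    if reason == "length":
        return "truncated: raise max_tokens or request a continuation"
    if reason == "content_filter":
        return "filtered"
    # Defensive default for values not listed on this page.
    return f"unknown finish_reason: {reason!r}"


status = describe_finish({"index": 0, "finish_reason": "length"})
```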

usage

object

Token usage for the request.

prompt_tokens

integer

Number of tokens in the input messages.

completion_tokens

integer

Number of tokens in the generated response.

total_tokens

integer

Total tokens consumed (prompt_tokens + completion_tokens).

Example

Request

curl https://tokmodel.com/v1/chat/completions \
  --request POST \
  --header "Authorization: Bearer YOUR_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "openai/gpt-4o",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "What is the capital of France?" }
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1716470400,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 26,
    "completion_tokens": 9,
    "total_tokens": 35
  }
}
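Reading the fields back out of this response takes only a few lines. The sketch below parses the example body shown above with Python's standard library and checks the documented relationship between the usage counters; the variable names are illustrative.

```python
import json

# The example response from this page, as it would arrive in the HTTP body.
response_body = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1716470400,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of France is Paris."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 26, "completion_tokens": 9, "total_tokens": 35}
}
"""

completion = json.loads(response_body)
first_choice = completion["choices"][0]
answer = first_choice["message"]["content"]

usage = completion["usage"]
# total_tokens is documented as prompt_tokens + completion_tokens.
totals_match = usage["total_tokens"] == usage["prompt_tokens"] + usage["completion_tokens"]
```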
