The /v1/chat/completions endpoint accepts a conversation as an array of messages and returns a model-generated response. It is fully compatible with the OpenAI Chat Completions API, so any client or library that targets OpenAI can be pointed at TokModel by changing only the base URL and API key. You can stream responses token-by-token using server-sent events (SSE) by setting stream to true.
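As a rough illustration of consuming a streamed response, the sketch below parses SSE lines and stops at the data: [DONE] terminator. The chunk layout (choices[0].delta.content) is an assumption borrowed from the OpenAI streaming format and is not confirmed on this page; the helper names iter_stream_chunks and collect_text are hypothetical, and the sample lines are hand-written rather than captured from a real response.

```python
import json
from typing import Iterable, Iterator


def iter_stream_chunks(lines: Iterable[str]) -> Iterator[dict]:
    """Yield parsed JSON chunks from SSE lines until the [DONE] sentinel."""
    for raw in lines:
        line = raw.strip()
        # Blank separator lines and non-data fields are skipped.
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end of stream; anything after it is ignored
        yield json.loads(payload)


def collect_text(lines: Iterable[str]) -> str:
    """Join the text of choice 0 (ASSUMED shape: choices[i].delta.content)."""
    parts = []
    for chunk in iter_stream_chunks(lines):
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue
            parts.append(choice.get("delta", {}).get("content") or "")
    return "".join(parts)


# Hand-written sample lines, not output from a real TokModel response.
sample = [
    'data: {"choices": [{"index": 0, "delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "Par"}}]}',
    "",
    'data: {"choices": [{"index": 0, "delta": {"content": "is"}}]}',
    "",
    "data: [DONE]",
    'data: {"choices": [{"index": 0, "delta": {"content": "ignored"}}]}',
]
print(collect_text(sample))  # Paris
```

In a real client the lines would come from the HTTP response body, decoded line by line, rather than from a list.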

Request parameters

model
string
required
The ID of the model to use for this request. Use the list models endpoint to retrieve available model IDs.
messages
array
required
An array of message objects that make up the conversation. Each object must include a role ("system", "user", or "assistant") and a content string.
stream
boolean
default:"false"
When true, the response is streamed as server-sent events. Each event contains a partial ChatCompletionChunk object. The stream terminates with data: [DONE].
temperature
number
default:"1"
Sampling temperature between 0 and 2. Lower values produce more deterministic output; higher values produce more varied output. Avoid setting both temperature and top_p at the same time.
max_tokens
integer
The maximum number of tokens to generate in the response. The model will stop once this limit is reached, even if the response is incomplete.
top_p
number
Nucleus sampling parameter. The model samples only from the smallest set of most likely tokens whose cumulative probability reaches top_p. A value of 0.1 restricts sampling to the tokens that make up the top 10% of probability mass.
n
integer
default:"1"
How many independent completion choices to generate for each request. Each choice is billed separately.
stop
string | string[]
One or more sequences at which the model will stop generating further tokens. The stop sequence itself is not included in the response.
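The parameters above can be assembled and checked on the client before a request is sent. The following is a minimal sketch, not an official client: build_payload is a hypothetical helper, and the decision to warn (rather than fail) when both temperature and top_p are set is an assumption based on the advice above.

```python
import warnings
from typing import Optional, Sequence, Union

ALLOWED_ROLES = {"system", "user", "assistant"}


def build_payload(
    model: str,
    messages: Sequence[dict],
    *,
    stream: bool = False,
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens: Optional[int] = None,
    n: Optional[int] = None,
    stop: Union[str, Sequence[str], None] = None,
) -> dict:
    """Return a request body, omitting optional parameters left unset."""
    for message in messages:
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError("each message needs a content string")
    if temperature is not None and not 0 <= temperature <= 2:
        raise ValueError("temperature must be between 0 and 2")
    if temperature is not None and top_p is not None:
        # The docs advise against combining the two; warning is our choice.
        warnings.warn("temperature and top_p are both set")

    payload = {"model": model, "messages": list(messages)}
    if stream:
        payload["stream"] = True
    if stop is not None and not isinstance(stop, str):
        stop = list(stop)  # accept tuples and other sequences
    optional = {
        "temperature": temperature,
        "top_p": top_p,
        "max_tokens": max_tokens,
        "n": n,
        "stop": stop,
    }
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload


example = build_payload(
    "openai/gpt-4o",
    [{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.7,
    max_tokens=256,
)
```

Leaving unset parameters out of the body lets the server apply its documented defaults instead of having the client guess them.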

Response fields

id
string
A unique identifier for this completion, prefixed with chatcmpl-.
object
string
Always "chat.completion".
created
integer
Unix timestamp (seconds) of when the completion was created.
model
string
The model that was used to generate the response.
choices
array
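An array of completion choices: n items when n is set, otherwise one. As the example response shows, each choice carries an index, a message object (role and content), and a finish_reason.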
usage
object
Token usage for the request, reported as prompt_tokens, completion_tokens, and total_tokens.

Example

Request

curl https://tokmodel.com/v1/chat/completions \
  --request POST \
  --header "Authorization: Bearer YOUR_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "model": "openai/gpt-4o",
    "messages": [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "What is the capital of France?" }
    ],
    "temperature": 0.7,
    "max_tokens": 256
  }'
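For readers not using curl, the same request can be sketched with Python's standard library. The URL and headers are copied from the curl command above; make_request is a hypothetical helper name, and the network call itself is left commented out so the block runs without an API key.

```python
import json
import urllib.request

BASE_URL = "https://tokmodel.com/v1"  # host and path taken from the curl example


def make_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request for /chat/completions."""
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


payload = {
    "model": "openai/gpt-4o",
    "messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the capital of France?"},
    ],
    "temperature": 0.7,
    "max_tokens": 256,
}
req = make_request("YOUR_API_KEY", payload)

# Sending requires a valid key and network access:
# with urllib.request.urlopen(req, timeout=30) as resp:
#     completion = json.load(resp)
```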

Response

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1716470400,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 26,
    "completion_tokens": 9,
    "total_tokens": 35
  }
}
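A short sketch of reading this response in Python follows. It uses only the fields shown in the example; first_reply is a hypothetical helper. Treating a finish_reason of "length" as a sign that max_tokens cut the reply short is an assumption carried over from the OpenAI convention, not something this page states.

```python
import json
from typing import Tuple

# The example response from above, embedded so the sketch runs offline.
response_text = """
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1716470400,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The capital of France is Paris."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 26, "completion_tokens": 9, "total_tokens": 35}
}
"""


def first_reply(completion: dict) -> Tuple[str, str]:
    """Return (content, finish_reason) of the first choice."""
    choice = completion["choices"][0]
    return choice["message"]["content"], choice["finish_reason"]


completion = json.loads(response_text)
text, reason = first_reply(completion)

# ASSUMPTION: "length" signals truncation by max_tokens (OpenAI convention).
truncated = reason == "length"

usage = completion["usage"]
# In this sample, the total is the sum of prompt and completion tokens.
tokens_consistent = (
    usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]
)
print(text)
```

When n is greater than 1, the same access pattern applies to each item in choices rather than only the first.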