Send chat completions using TokModel's unified LLM API

The /v1/chat/completions endpoint is the primary way to generate text and hold multi-turn conversations with any model available through TokModel. Because it follows the OpenAI Chat Completions API shape, you can use TokModel from an existing OpenAI client by pointing its base_url at https://tokmodel.com/v1 and supplying your TokModel API key and a TokModel model name. No other code changes are required.

Authentication

Every request must include your API key in the Authorization header:

Authorization: Bearer YOUR_API_KEY

You can create and manage API keys in the TokModel console.
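As a minimal sketch, the header can be built in Python from a key kept out of source code. The helper name build_auth_headers and the environment variable TOKMODEL_API_KEY are assumptions made for this example, not names defined by TokModel.

```python
import os
from typing import Dict, Optional


def build_auth_headers(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the HTTP headers for an authenticated JSON request."""
    # TOKMODEL_API_KEY is a hypothetical variable name chosen for this sketch.
    key = api_key or os.environ.get("TOKMODEL_API_KEY")
    if not key:
        raise ValueError("No API key passed and TOKMODEL_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of version control; passing it explicitly is convenient in tests.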

Send a basic request

The minimum required fields are model and messages. The messages array holds an ordered conversation history, where each entry has a role (system, user, or assistant) and a content string.

curl https://tokmodel.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [
      {"role": "user", "content": "Explain what a transformer model is in two sentences."}
    ]
  }'
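The same request can be sketched in Python with only the standard library. The function names build_chat_request and send_chat_request are illustrative; the URL, headers, and body fields are taken from the curl example above.

```python
import json
import urllib.request

# Same endpoint URL as the curl example.
API_URL = "https://tokmodel.com/v1/chat/completions"


def build_chat_request(api_key, model, messages):
    """Build (but do not send) a POST request for the chat completions endpoint."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_chat_request(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

Calling send_chat_request(build_chat_request("YOUR_API_KEY", "openai/gpt-4o", [{"role": "user", "content": "Hello"}])) performs the network call. Building and sending are kept separate so the request can be inspected or tested offline.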

Example response

A successful request returns a JSON object. The generated text is in choices[0].message.content.

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1716489600,
  "model": "openai/gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "A transformer model is a neural network architecture that uses self-attention mechanisms to process sequences of data in parallel. It was introduced in the 2017 paper \"Attention Is All You Need\" and has become the foundation for modern large language models like GPT and BERT."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 18,
    "completion_tokens": 52,
    "total_tokens": 70
  }
}
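A short sketch of reading the fields shown above from a decoded response. The helper name extract_reply is an assumption for illustration, and the sample dictionary is a trimmed copy of the example response.

```python
def extract_reply(response: dict) -> tuple:
    """Return (text, finish_reason, total_tokens) from a chat completion response."""
    choice = response["choices"][0]
    text = choice["message"]["content"]
    # In the OpenAI API, "stop" marks a natural end of the reply, while
    # "length" means the output hit a token limit. This sketch assumes
    # TokModel mirrors those values.
    finish_reason = choice.get("finish_reason")
    total_tokens = response.get("usage", {}).get("total_tokens")
    return text, finish_reason, total_tokens


# Trimmed version of the example response above.
sample = {
    "id": "chatcmpl-abc123",
    "object": "chat.completion",
    "model": "openai/gpt-4o",
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "A transformer model is ..."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 18, "completion_tokens": 52, "total_tokens": 70},
}

text, finish_reason, total_tokens = extract_reply(sample)
```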

Use a system message

A system message sets the behavior and context for the assistant. Place it as the first entry in the messages array.

python

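from openai import OpenAI

# Point the OpenAI SDK at TokModel; the base URL matches the curl examples.
client = OpenAI(
    base_url="https://tokmodel.com/v1",
    api_key="YOUR_API_KEY",
)

response = client.chat.completions.create(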
    model="openai/gpt-4o",
    messages=[
        {
            "role": "system",
            "content": "You are a concise technical assistant. Reply in plain text only, no markdown."
        },
        {
            "role": "user",
            "content": "What is the difference between supervised and unsupervised learning?"
        }
    ],
)

print(response.choices[0].message.content)

Stream the response

Set stream: true to receive tokens as server-sent events (SSE) instead of waiting for the full response. This reduces perceived latency for long outputs.

curl https://tokmodel.com/v1/chat/completions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Write a short poem about APIs."}
    ]
  }'
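One way to consume the stream is to parse the SSE lines yourself. This sketch assumes TokModel mirrors OpenAI's streaming format, where each event is a "data: {json}" line whose text lives in choices[].delta.content and the stream ends with "data: [DONE]". The function name iter_stream_text is illustrative.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield text fragments from the SSE lines of a streamed chat completion."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":  # end-of-stream sentinel (assumed, as in OpenAI's API)
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            fragment = choice.get("delta", {}).get("content")
            if fragment:
                yield fragment
```

Any iterable of decoded lines works as input, for example the lines of an HTTP response body read incrementally. Joining the yielded fragments reconstructs the full reply.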

Switch models

Change the model parameter to route your request to a different provider. TokModel uses the format provider/model-name. No other code changes are needed.

python

# Use Anthropic Claude
response = client.chat.completions.create(
    model="anthropic/claude-opus-4-5",
    messages=[{"role": "user", "content": "Summarize the water cycle."}],
)

# Use Google Gemini
response = client.chat.completions.create(
    model="google/gemini-2.0-flash",
    messages=[{"role": "user", "content": "Summarize the water cycle."}],
)

# Use Meta Llama
response = client.chat.completions.create(
    model="meta-llama/llama-3.3-70b-instruct",
    messages=[{"role": "user", "content": "Summarize the water cycle."}],
)

Browse all available models and their provider slugs in the Models reference.
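Because only the model string changes, the same prompt can be sent to several models in a loop. The sketch below uses two hypothetical helpers: split_model_slug, which separates the provider/model-name format, and ask_each_model, which takes a send(model, messages) callable that you supply (for example a thin wrapper around your client call).

```python
def split_model_slug(slug: str) -> tuple:
    """Split 'provider/model-name' into (provider, model_name)."""
    provider, sep, name = slug.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"Expected 'provider/model-name', got {slug!r}")
    return provider, name


def ask_each_model(send, models, prompt):
    """Send one prompt to each model and return {model: reply_text}.

    `send(model, messages)` is a caller-supplied function that performs the
    request and returns the reply text.
    """
    messages = [{"role": "user", "content": prompt}]
    return {model: send(model, messages) for model in models}
```

Injecting send keeps the loop independent of any particular HTTP client and makes it easy to test without network access.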

Multi-turn conversation

Build a conversation history by appending each assistant reply to the messages array before sending the next user message.

python

messages = [
    {"role": "system", "content": "You are a helpful coding assistant."}
]

# Turn 1
messages.append({"role": "user", "content": "What does the `zip` function do in Python?"})
response = client.chat.completions.create(model="openai/gpt-4o", messages=messages)
reply = response.choices[0].message.content
messages.append({"role": "assistant", "content": reply})
print(reply)

# Turn 2
messages.append({"role": "user", "content": "Show me an example with two lists."})
response = client.chat.completions.create(model="openai/gpt-4o", messages=messages)
print(response.choices[0].message.content)
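The append-then-send pattern above can be wrapped in a small helper so the history stays consistent. The Conversation class and its send parameter are assumptions for this sketch: send is any function you supply that takes the message list and returns the assistant's reply text.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]


class Conversation:
    """Keep the message history in sync across turns."""

    def __init__(self, send: Callable[[List[Message]], str], system_prompt: str = ""):
        self._send = send
        self.messages: List[Message] = []
        if system_prompt:
            self.messages.append({"role": "system", "content": system_prompt})

    def ask(self, user_text: str) -> str:
        """Append the user turn, send the full history, and record the reply."""
        self.messages.append({"role": "user", "content": user_text})
        try:
            reply = self._send(list(self.messages))
        except Exception:
            self.messages.pop()  # drop the unanswered user turn on failure
            raise
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

With the client from the earlier examples, send could be a function that calls client.chat.completions.create with a fixed model and returns response.choices[0].message.content.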
