Generate text embeddings using TokModel's gateway API

The /v1/embeddings endpoint converts text into numerical vectors that capture semantic meaning. You can embed a single string or a batch of strings in one request. The resulting vectors are suitable for semantic search, retrieval-augmented generation (RAG), clustering, and classification tasks.

Authentication

Include your API key as a Bearer token in the Authorization header of every request:

Authorization: Bearer YOUR_API_KEY
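
One way to build this header in Python is sketched below. It assumes the key is kept in an environment variable. The variable name TOKMODEL_API_KEY and the helper auth_headers are illustrative choices for this example, not names defined by TokModel.

```python
import os

def auth_headers(api_key=None):
    """Return the HTTP headers for an authenticated JSON request."""
    # TOKMODEL_API_KEY is a hypothetical variable name chosen for this example.
    key = api_key or os.environ.get("TOKMODEL_API_KEY")
    if not key:
        raise ValueError("No API key provided")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Reading the key from the environment keeps it out of source control; any secret store would serve the same purpose.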

Send an embeddings request

Provide an input (a string or array of strings) and a model. TokModel routes the request to the specified embedding model and returns one vector per input item.

curl https://tokmodel.com/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/text-embedding-3-small",
    "input": "The quick brown fox jumps over the lazy dog."
  }'
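
The same request can be sketched in Python with only the standard library. The URL, headers, and body mirror the curl command above; the helper name build_embeddings_request is hypothetical, and the send step is left commented out because it needs a valid key and network access.

```python
import json
import urllib.request

EMBEDDINGS_URL = "https://tokmodel.com/v1/embeddings"  # same URL as the curl example

def build_embeddings_request(api_key, model, text_input):
    """Build (but do not send) a POST request for the embeddings endpoint."""
    # text_input may be a single string or a list of strings.
    body = json.dumps({"model": model, "input": text_input}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

request = build_embeddings_request(
    "YOUR_API_KEY",
    "openai/text-embedding-3-small",
    "The quick brown fox jumps over the lazy dog.",
)

# Sending requires a valid key and network access:
# with urllib.request.urlopen(request) as resp:
#     result = json.load(resp)
```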

Embed multiple inputs in one request

Pass an array of strings to input to embed a batch in a single API call. The response contains one entry per input, in the same order.

curl https://tokmodel.com/v1/embeddings \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/text-embedding-3-small",
    "input": [
      "How do I reset my password?",
      "Where can I find my invoices?",
      "How do I cancel my subscription?"
    ]
  }'
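
When there are more strings than you want to send at once, you can split them into several batches and issue one request per batch. The sketch below shows only the splitting step. This page does not state a maximum batch size, so MAX_BATCH is a placeholder value, and chunked is a hypothetical helper.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

questions = [
    "How do I reset my password?",
    "Where can I find my invoices?",
    "How do I cancel my subscription?",
]

MAX_BATCH = 2  # placeholder, not a documented TokModel limit
batches = list(chunked(questions, MAX_BATCH))
```

Because each batch preserves the original order, concatenating the per-batch results gives vectors in the same order as the full input list.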

Example response

Each object in the data array corresponds to one input string. The embedding field contains the raw float vector.

{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        0.0023064255,
        -0.009327292,
        0.015797347,
        "..."
      ]
    }
  ],
  "model": "openai/text-embedding-3-small",
  "usage": {
    "prompt_tokens": 10,
    "total_tokens": 10
  }
}

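The embedding array above is truncated for readability; the "..." entry is a placeholder and does not appear in a real response. Real vectors typically contain 512 to 3072 dimensions, depending on the model (text-embedding-3-small, for example, returns 1536 dimensions by default).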
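
A small sketch of pulling the vectors out of a parsed response follows. It assumes the index field gives the position of the matching input, as the example response suggests, and sorts on it defensively. The function name and the two-dimensional sample vectors are made up for illustration.

```python
def vectors_in_input_order(response):
    """Return the embedding vectors sorted by their index field."""
    entries = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in entries]

# Shortened, made-up response with two-dimensional vectors for illustration.
sample = {
    "object": "list",
    "data": [
        {"object": "embedding", "index": 1, "embedding": [0.3, 0.4]},
        {"object": "embedding", "index": 0, "embedding": [0.1, 0.2]},
    ],
}

vectors = vectors_in_input_order(sample)
```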

Compute cosine similarity

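After embedding two pieces of text, compare them with cosine similarity. Scores range from -1.0 to 1.0, and a score close to 1.0 means the two vectors point in nearly the same direction, which indicates that the texts are semantically similar.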

python

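import numpy as np
from openai import OpenAI

# Assumption: the gateway accepts requests from the OpenAI Python SDK, so the
# client is pointed at the same base URL used in the curl examples.
client = OpenAI(
    base_url="https://tokmodel.com/v1",
    api_key="YOUR_API_KEY",
)

def cosine_similarity(a, b):
    # Dot product divided by the product of the two vector lengths (L2 norms).
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Embed both sentences in a single batched request.
response = client.embeddings.create(
    model="openai/text-embedding-3-small",
    input=[
        "How do I reset my password?",
        "I forgot my password, what should I do?",
    ],
)

# Results come back in the same order as the inputs.
vec_a = response.data[0].embedding
vec_b = response.data[1].embedding

score = cosine_similarity(vec_a, vec_b)
print(f"Similarity: {score:.4f}")  # e.g. 0.9312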
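
To see how the score behaves without calling the API, the following sketch applies the same formula to toy two-dimensional vectors using only the standard library. The vectors are invented for illustration and are not real embeddings.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Same direction, different length: score is approximately 1.
same_direction = cosine_similarity([1.0, 2.0], [2.0, 4.0])

# Perpendicular vectors: score is 0.
orthogonal = cosine_similarity([1.0, 0.0], [0.0, 1.0])

# Opposite directions: score is approximately -1.
opposite = cosine_similarity([1.0, 2.0], [-1.0, -2.0])
```

The score depends only on direction, not on vector length, which is why scaling a vector does not change its similarity to others.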

Common use cases

Semantic search: embed a user query and compare it against pre-embedded documents to find the most relevant results, even when the exact words differ.

Retrieval-augmented generation (RAG): embed your knowledge base and retrieve the top-k matching chunks before passing them to a chat model as context.

Clustering: group similar documents together by clustering their vectors with algorithms like k-means, without any labeled training data.

Classification: train a lightweight classifier on top of embeddings to categorize text into predefined labels.
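
The ranking step shared by semantic search and RAG retrieval can be sketched as follows. The three-dimensional vectors and document IDs are toy stand-ins for real embeddings, and top_k is a hypothetical helper; a production system would more likely use a vector index than a linear scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=2):
    """Return (doc_id, score) pairs for the k most similar documents."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Toy 3-dimensional vectors standing in for real embeddings.
docs = {
    "reset-password": [0.9, 0.1, 0.0],
    "find-invoices": [0.1, 0.9, 0.1],
    "cancel-subscription": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.1]

results = top_k(query, docs, k=2)
```

Here the query vector lies closest to the "reset-password" vector, so that document ranks first.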
