The /v1/rerank endpoint takes a query and a list of documents, then returns those documents sorted by relevance score. Reranking is typically used as a second-stage filter after an initial vector search retrieves a broad set of candidate documents. The reranker produces more precise relevance signals than embedding similarity alone, which improves the quality of context passed to a language model.
Authentication
Include your API key in every request:
```
Authorization: Bearer YOUR_API_KEY
```
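A minimal sketch for keeping the key out of source code, assuming you store it in an environment variable. The variable name `TOKMODEL_API_KEY` and the helper `auth_headers` are hypothetical, not part of the API.

```python
import os


def auth_headers(env_var: str = "TOKMODEL_API_KEY") -> dict:
    """Build request headers from an API key stored in an environment variable.

    TOKMODEL_API_KEY is a hypothetical variable name; use whatever your deployment sets.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Failing early when the variable is missing gives a clearer error than a 401 response from the server.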
Send a rerank request
Provide a query string, a documents array, and a model. TokModel returns the documents re-ordered from most to least relevant, each annotated with a relevance_score.
```bash
curl https://tokmodel.com/v1/rerank \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cohere/rerank-english-v3.0",
    "query": "How do I reset my password?",
    "documents": [
      "You can reset your password from the account settings page.",
      "Our API supports OAuth 2.0 and API key authentication.",
      "To change your password, go to Settings > Security and click Reset Password.",
      "Billing questions can be directed to support@example.com."
    ]
  }'
```
Example response
The results array is sorted by relevance_score in descending order. The index field refers to the position of the document in the original input array.
```json
{
  "id": "rerank-xyz789",
  "model": "cohere/rerank-english-v3.0",
  "results": [
    {
      "index": 2,
      "relevance_score": 0.9873,
      "document": {
        "text": "To change your password, go to Settings > Security and click Reset Password."
      }
    },
    {
      "index": 0,
      "relevance_score": 0.9541,
      "document": {
        "text": "You can reset your password from the account settings page."
      }
    },
    {
      "index": 1,
      "relevance_score": 0.1203,
      "document": {
        "text": "Our API supports OAuth 2.0 and API key authentication."
      }
    },
    {
      "index": 3,
      "relevance_score": 0.0412,
      "document": {
        "text": "Billing questions can be directed to support@example.com."
      }
    }
  ],
  "usage": {
    "total_tokens": 98
  }
}
```
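Because index points back into the request's documents array, you can recover the original inputs (or any metadata you keep alongside them) without relying on the returned text. The sketch below assumes the response has already been parsed into a Python dict shaped like the example response above; `ranked_documents` is a hypothetical helper name.

```python
def ranked_documents(documents: list, response: dict) -> list:
    """Return the original input documents in the order the reranker ranked them.

    Uses each result's `index` field, so it also works when the response
    omits document text (for example, with return_documents set to false).
    """
    return [documents[result["index"]] for result in response["results"]]


# The request documents and the scores from the example response.
documents = [
    "You can reset your password from the account settings page.",
    "Our API supports OAuth 2.0 and API key authentication.",
    "To change your password, go to Settings > Security and click Reset Password.",
    "Billing questions can be directed to support@example.com.",
]
response = {
    "results": [
        {"index": 2, "relevance_score": 0.9873},
        {"index": 0, "relevance_score": 0.9541},
        {"index": 1, "relevance_score": 0.1203},
        {"index": 3, "relevance_score": 0.0412},
    ]
}

for doc in ranked_documents(documents, response):
    print(doc)
```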
Use the relevance_score to decide how many documents to forward to the language model. A common pattern is to keep only results above a threshold (e.g. 0.5) or to take the top-k regardless of score.
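Both strategies take only a few lines of client-side code. This is a sketch, assuming the response has been parsed into a dict; `select_results` is a hypothetical helper, and 0.5 is only the illustrative threshold from the paragraph above, not a recommended setting. Scores from different reranking models may not be on the same scale, so tune any threshold on your own data.

```python
from typing import Optional


def select_results(
    results: list,
    threshold: Optional[float] = None,
    top_k: Optional[int] = None,
) -> list:
    """Filter rerank results by score threshold and/or cap them at top_k.

    Assumes `results` is already sorted by relevance_score in descending
    order, as the /v1/rerank response is.
    """
    selected = results
    if threshold is not None:
        # Keep only results whose score meets the threshold.
        selected = [r for r in selected if r["relevance_score"] >= threshold]
    if top_k is not None:
        # Results are sorted, so the first top_k entries are the best ones.
        selected = selected[:top_k]
    return selected


results = [
    {"index": 2, "relevance_score": 0.9873},
    {"index": 0, "relevance_score": 0.9541},
    {"index": 1, "relevance_score": 0.1203},
    {"index": 3, "relevance_score": 0.0412},
]

print([r["index"] for r in select_results(results, threshold=0.5)])  # [2, 0]
print([r["index"] for r in select_results(results, top_k=3)])        # [2, 0, 1]
```

Combining both (a threshold plus a top_k cap) avoids sending weak matches while still bounding the context size.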
Key parameters
| Parameter | Type | Description |
|---|---|---|
| model | string | The reranking model to use, e.g. cohere/rerank-english-v3.0. |
| query | string | The search query to rank documents against. |
| documents | array | List of strings (or objects with a text key) to rank. |
| top_n | integer | Return only the top N results. Defaults to all documents. |
| return_documents | boolean | Include document text in the response. Default true. |
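The sketch below assembles a request body from the parameters in the table and leaves top_n out when you want every document back. `build_rerank_payload` is a hypothetical helper, and its checks (documents must be strings or objects with a text key, top_n must be positive) are client-side assumptions about sensible input, not documented server-side validation.

```python
import json
from typing import List, Optional, Union

Document = Union[str, dict]  # a plain string or an object with a "text" key


def build_rerank_payload(
    model: str,
    query: str,
    documents: List[Document],
    top_n: Optional[int] = None,
    return_documents: bool = True,
) -> dict:
    """Assemble a JSON body for POST /v1/rerank from the documented parameters."""
    for doc in documents:
        # Each document must be a string or an object carrying a "text" key.
        if not isinstance(doc, str) and not (isinstance(doc, dict) and "text" in doc):
            raise ValueError(f"Unsupported document: {doc!r}")
    payload = {"model": model, "query": query, "documents": documents}
    if top_n is not None:
        # Assumed client-side check; omit top_n entirely to get all documents.
        if top_n < 1:
            raise ValueError("top_n must be a positive integer")
        payload["top_n"] = top_n
    if not return_documents:
        # Only send the flag when overriding the default (true).
        payload["return_documents"] = False
    return payload


payload = build_rerank_payload(
    model="cohere/rerank-english-v3.0",
    query="How do I reset my password?",
    documents=["Reset it under Settings > Security.", {"text": "Contact support."}],
    top_n=1,
    return_documents=False,
)
print(json.dumps(payload, indent=2))
```

With return_documents set to false, use each result's index to look up the text in your own documents list.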
Use reranking in a RAG pipeline
A typical RAG pipeline retrieves more documents than it can fit in the context window, then reranks them to keep only the most relevant ones.
```python
import requests
from openai import OpenAI

API_KEY = "YOUR_API_KEY"
BASE_URL = "https://tokmodel.com/v1"

# The OpenAI SDK is pointed at TokModel's base URL for chat completions.
client = OpenAI(api_key=API_KEY, base_url=BASE_URL)

# Step 1: Retrieve candidate documents from your vector store
# (replace with your actual retrieval logic)
candidate_docs = [
    "Password resets are handled via the Security settings panel.",
    "Our SLA guarantees 99.9% uptime for all paid plans.",
    "To reset your password, visit Settings > Security > Reset Password.",
    "You can export your data from the Account > Data Export page.",
    "Contact support if you have not received your password reset email.",
]
user_query = "How do I reset my password?"

# Step 2: Rerank the candidates and keep the top 3
rerank_response = requests.post(
    f"{BASE_URL}/rerank",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "cohere/rerank-english-v3.0",
        "query": user_query,
        "documents": candidate_docs,
        "top_n": 3,
    },
    timeout=30,
)
rerank_response.raise_for_status()  # surface HTTP errors instead of a KeyError below

# Map each result back to the original candidate via its index
top_docs = [candidate_docs[r["index"]] for r in rerank_response.json()["results"]]
context = "\n\n".join(top_docs)

# Step 3: Generate an answer using only the top-ranked context
chat_response = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[
        {
            "role": "system",
            "content": f"Answer the user's question using only the context below.\n\nContext:\n{context}",
        },
        {"role": "user", "content": user_query},
    ],
)
print(chat_response.choices[0].message.content)
```
A common starting point is to retrieve 20–50 candidates from your vector store, pass them all to the reranker, and use only the top 3–5 results as context. This usually gives the model more relevant context than passing raw vector search results directly.
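The sizing advice above can be expressed as a small wrapper. This is a sketch under stated assumptions: `retrieve` stands in for your vector store query and `rerank` for a call to /v1/rerank that returns results shaped like the example response. Both are hypothetical callables you supply, and the defaults of 30 candidates and 4 results are simply values inside the suggested ranges.

```python
from typing import Callable, List


def retrieve_then_rerank(
    query: str,
    retrieve: Callable[[str, int], List[str]],
    rerank: Callable[[str, List[str], int], dict],
    n_candidates: int = 30,
    k: int = 4,
) -> List[str]:
    """Fetch a broad candidate set, rerank it, and keep the top k documents."""
    # Stage 1: cast a wide net with cheap vector search.
    candidates = retrieve(query, n_candidates)
    if not candidates:
        return []
    # Stage 2: let the reranker pick the best few; top_n caps the response size.
    response = rerank(query, candidates, min(k, len(candidates)))
    # Map results back through `index` so we return the original candidates.
    return [candidates[r["index"]] for r in response["results"]]


# Toy stand-ins so the sketch runs without a vector store or network access.
corpus = [f"doc {i}" for i in range(100)]


def fake_retrieve(query: str, n: int) -> List[str]:
    return corpus[:n]


def fake_rerank(query: str, docs: List[str], top_n: int) -> dict:
    # Pretend later documents are more relevant, just to exercise the reordering.
    order = sorted(range(len(docs)), key=lambda i: -i)[:top_n]
    return {
        "results": [
            {"index": i, "relevance_score": 1.0 / (rank + 1)}
            for rank, i in enumerate(order)
        ]
    }


print(retrieve_then_rerank("reset password", fake_retrieve, fake_rerank))
# ['doc 29', 'doc 28', 'doc 27', 'doc 26']
```

Passing the callables in keeps the retrieval and reranking backends swappable, which makes it easy to experiment with different candidate counts and k values.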