TokModel provides three audio endpoints that follow the OpenAI Audio API shape. They synthesize speech from text, transcribe an audio file into text in the language spoken, and translate spoken audio into English text. You can switch between audio model providers by changing the model parameter.

Authentication

Include your API key in every request:
Authorization: Bearer YOUR_API_KEY
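
As a minimal sketch, the header above could be built in Python from an environment variable so the key never appears in source code. The variable name TOKMODEL_API_KEY and the helper auth_headers are assumptions made for illustration; they are not part of the TokModel API.

```python
import os


def auth_headers(env_var: str = "TOKMODEL_API_KEY") -> dict[str, str]:
    """Return the Authorization header, reading the key from the environment.

    The environment variable name is a hypothetical choice for this sketch.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        raise RuntimeError(f"Set {env_var} to your TokModel API key")
    return {"Authorization": f"Bearer {api_key}"}
```

Merge the returned dictionary into the headers of each request you send.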

Convert text to speech

POST /v1/audio/speech generates an audio file from a text string. The response body is the raw binary audio, so write it directly to a file.
curl
curl https://tokmodel.com/v1/audio/speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/tts-1",
    "input": "Welcome to TokModel. Your unified gateway to over thirty AI model providers.",
    "voice": "nova"
  }' \
  --output speech.mp3
The --output flag tells curl to save the binary response to speech.mp3 instead of printing it to the terminal.
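
The same request can be sketched in Python using only the standard library. The URL, headers, and JSON fields mirror the curl example above; the helper names build_speech_request and save_speech are hypothetical, and error handling is left to urllib, which raises HTTPError on non-2xx responses.

```python
import json
import urllib.request

# Same endpoint as the curl example above.
SPEECH_URL = "https://tokmodel.com/v1/audio/speech"


def build_speech_request(
    api_key: str,
    text: str,
    model: str = "openai/tts-1",
    voice: str = "nova",
) -> urllib.request.Request:
    """Build (but do not send) the POST request for text-to-speech."""
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def save_speech(request: urllib.request.Request, path: str = "speech.mp3") -> int:
    """Send the request and write the binary body to disk.

    Returns the number of bytes written.
    """
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        return out.write(response.read())
```

Calling save_speech(build_speech_request("YOUR_API_KEY", "Welcome to TokModel.")) should produce the same speech.mp3 file as the curl command. The file is opened in binary mode ("wb") for the same reason curl needs --output: the body is audio data, not text.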

Key parameters

| Parameter | Type | Description |
| --- | --- | --- |
| model | string | The TTS model to use, e.g. openai/tts-1 or openai/tts-1-hd. |
| input | string | The text to synthesize. Maximum 4096 characters. |
| voice | string | Voice style. Options include alloy, echo, fable, onyx, nova, shimmer. |
| response_format | string | Audio format: mp3 (default), opus, aac, or flac. |
| speed | number | Playback speed from 0.25 to 4.0. Default is 1.0. |
Use openai/tts-1-hd for higher audio fidelity. It costs more per character but produces noticeably cleaner output, especially for longer texts.
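
The limits listed under Key parameters can be checked on the client before a request is sent. The following is a minimal sketch, not part of the TokModel API: validate_speech_params is a hypothetical helper, and it assumes the 4096-character limit counts Unicode code points the way Python's len does. The voice is not validated because the list above is given as examples rather than an exhaustive set.

```python
MAX_INPUT_CHARS = 4096                       # from the "input" row
FORMATS = ("mp3", "opus", "aac", "flac")     # from the "response_format" row
MIN_SPEED, MAX_SPEED = 0.25, 4.0             # from the "speed" row


def validate_speech_params(
    text: str,
    response_format: str = "mp3",
    speed: float = 1.0,
) -> dict:
    """Check the documented limits and return the fields ready to merge into a payload."""
    # Assumption: the server counts characters the same way len() does.
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(
            f"input is {len(text)} characters; the maximum is {MAX_INPUT_CHARS}"
        )
    if response_format not in FORMATS:
        raise ValueError(f"response_format must be one of {', '.join(FORMATS)}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {"input": text, "response_format": response_format, "speed": speed}
```

Failing early with a ValueError avoids spending a network round trip on a request the documented limits would reject.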