Audio speech, transcription, and translation

POST /v1/audio/transcriptions

Transcribe an audio file into text. The audio file is uploaded as a multipart form field. The returned text is in the language spoken in the audio file unless you override it with the language parameter.

Request parameters

file

required

The audio file to transcribe, uploaded via multipart/form-data. Supported formats: flac, mp3, mp4, mpeg, mpga, m4a, ogg, wav, webm.

model

string

required

The transcription model to use. Use the list models endpoint for available options (for example, openai/whisper-1).

language

string

The language of the audio as an ISO 639-1 code (for example, "en", "fr", "de"). Providing it can improve accuracy and reduce latency.

response_format

string

default:"json"

The format of the transcription output. One of "json", "text", "srt", "verbose_json", or "vtt".

temperature

number

default:"0"

Sampling temperature between 0 and 1. Lower values produce more consistent, deterministic transcriptions.
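
Checking the file and response_format values locally can catch a mistake before the upload starts. The following is a minimal sketch rather than part of the API: the function names is_supported_audio and transcript_extension are hypothetical, the first check looks only at the file name (not the contents), and the mapping from response_format to a file extension is a convention assumed here for saving output.

```shell
# Hypothetical helper: succeed (exit status 0) if the file name ends in one of
# the extensions listed under "file". It inspects only the name.
is_supported_audio() {
  # Lower-case the name so that "TALK.MP3" is treated like "talk.mp3".
  name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *.flac|*.mp3|*.mp4|*.mpeg|*.mpga|*.m4a|*.ogg|*.wav|*.webm) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical helper: print a file extension for a response_format value,
# or fail with a message on stderr if the value is not one of the five listed.
# The extensions are an assumed naming convention, not something the API sets.
transcript_extension() {
  case "$1" in
    json|verbose_json) echo "json" ;;
    text) echo "txt" ;;
    srt) echo "srt" ;;
    vtt) echo "vtt" ;;
    *) echo "unsupported response_format: $1" >&2; return 1 ;;
  esac
}
```

For example, `is_supported_audio recording.mp3 && ext=$(transcript_extension srt)` leaves `ext` set to `srt`, which can then be used to name the output file.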

Example

curl https://tokmodel.com/v1/audio/transcriptions \
  --request POST \
  --header "Authorization: Bearer YOUR_API_KEY" \
  --form "file=@/path/to/recording.mp3" \
  --form "model=openai/whisper-1" \
  --form "language=en" \
  --form "response_format=json"
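
Because language and response_format are optional, a script may want to send them only when they are set. The wrapper below is one possible sketch of the same request: the function name transcribe and the variables API_KEY and CURL_BIN are assumptions made for illustration, not names defined by the API.

```shell
# Sketch of a wrapper around the request above. Optional form fields are added
# only when a value is supplied. API_KEY and CURL_BIN are assumed variable
# names chosen for this example.
transcribe() {
  file=$1
  model=$2
  language=${3:-}
  format=${4:-}
  # Rebuild the positional parameters as the curl argument list.
  set -- "https://tokmodel.com/v1/audio/transcriptions" \
    --request POST \
    --header "Authorization: Bearer ${API_KEY:-YOUR_API_KEY}" \
    --form "file=@${file}" \
    --form "model=${model}"
  if [ -n "$language" ]; then set -- "$@" --form "language=${language}"; fi
  if [ -n "$format" ]; then set -- "$@" --form "response_format=${format}"; fi
  # Set CURL_BIN=echo to print the arguments instead of sending the request.
  "${CURL_BIN:-curl}" "$@"
}
```

A call such as `transcribe /path/to/recording.mp3 openai/whisper-1 en json` sends the same fields as the example above; passing an empty string for the third argument omits the language field.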

Response

{
  "text": "The quick brown fox jumps over the lazy dog."
}
