Whisper V2 API
Process audio files for transcription or translation with advanced options
Authorizations
JWT token for authentication
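A minimal sketch of how the JWT might be attached to a request. This page does not state the header format, so the `Authorization: Bearer <token>` scheme and the JSON content type are assumptions, and `build_auth_headers` is a hypothetical helper name.

```python
def build_auth_headers(jwt_token: str) -> dict:
    """Return HTTP headers carrying the JWT.

    Assumption: the API expects the common "Bearer" scheme and a JSON body;
    neither is confirmed by this reference.
    """
    token = jwt_token.strip()
    if not token:
        raise ValueError("JWT token must not be empty")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```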
Body
TEXT FIELD: This is a string field, not a file upload. Provide the audio as a base64-encoded string: convert your audio file (.mp3, .wav, or .flac) to base64, then pass the resulting string as the value of this field.
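The conversion described above could look like the following sketch, which uses only the Python standard library. The helper name `encode_audio_to_base64` is illustrative and not part of the API.

```python
import base64
from pathlib import Path


def encode_audio_to_base64(path: str) -> str:
    """Read an audio file (.mp3, .wav, .flac) and return it as a base64 string."""
    raw_bytes = Path(path).read_bytes()
    # b64encode returns bytes; decode to a plain ASCII string for the text field.
    return base64.b64encode(raw_bytes).decode("ascii")
```

The returned string is what goes into the audio field of the request body.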
Language code (e.g. 'en' for English)
"en"
Task type - transcribe in source language or translate to English
Available options: transcribe, translate
"transcribe"
Optional starting text prompt for context
Number of parallel sequences evaluated
1 <= x <= 5
Number of best sequences considered
1 <= x <= 5
Include word-level timestamps (0=false, 1=true)
Available options: 0, 1
Enable speaker diarization (0=false, 1=true)
Available options: 0, 1
Enable voice activity detection filter (0=false, 1=true)
Available options: 0, 1
Exclude timestamps from output (0=false, 1=true)
Available options: 0, 1
Enable streaming output (0=false, 1=true)
Available options: 0, 1
Minimum number of speakers to detect
Maximum number of speakers to detect
Number of audio samples processed in one batch
0 <= x <= 24
Penalty for longer sequences
Beam search patience factor
0 <= x <= 1
Minimum duration of silence for a break
Minimum duration for speech detection
Voice activity detection onset threshold
0 <= x <= 1
Voice activity detection offset threshold
0 <= x <= 1
Additional padding at segment end
Additional padding at segment start
Maximum duration to process
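The constraints listed above (the two task values, the 0/1 flags, and the numeric ranges) can be checked client-side before sending a request. The sketch below assembles a request body and validates it against those constraints. The parameter names are not shown in this reference, so every field name used here (`audio`, `language`, `task`, `beam_size`, `best_of`, `batch_size`, `patience`, `vad_onset`, `vad_offset`, and the flag names) is a hypothetical placeholder, as is the mapping of the two 1 to 5 ranges onto `beam_size` and `best_of`. Substitute the real names from the live API schema.

```python
import json

# Hypothetical field names; only the ranges and allowed values come from this page.
RANGES = {
    "beam_size": (1, 5),    # number of parallel sequences evaluated
    "best_of": (1, 5),      # number of best sequences considered
    "batch_size": (0, 24),  # audio samples processed in one batch
    "patience": (0, 1),     # beam search patience factor
    "vad_onset": (0, 1),    # voice activity detection onset threshold
    "vad_offset": (0, 1),   # voice activity detection offset threshold
}
FLAGS = {"word_timestamps", "diarize", "vad_filter", "no_timestamps", "stream"}
TASKS = {"transcribe", "translate"}


def build_request_body(audio_b64, language="en", task="transcribe", **options):
    """Validate options against the documented constraints and return a dict."""
    if task not in TASKS:
        raise ValueError(f"task must be one of {sorted(TASKS)}, got {task!r}")
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must satisfy {low} <= x <= {high}")
        elif name in FLAGS and value not in (0, 1):
            raise ValueError(f"{name} must be 0 (false) or 1 (true)")
    body = {"audio": audio_b64, "language": language, "task": task}
    body.update(options)  # options without a documented constraint pass through
    return body


if __name__ == "__main__":
    example = build_request_body("QUJD", task="translate", beam_size=3, diarize=1)
    print(json.dumps(example, indent=2))
```

Rejecting out-of-range values locally avoids a round trip for requests the server would presumably refuse. Options without a documented range, such as the speaker counts and durations, are passed through unchecked because this page gives no bounds or units for them.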