Overview
This API provides speech-to-text transcription and translation services using OpenAI’s Whisper V3 model, with advanced features such as voice activity detection (VAD), speaker diarization, and hallucination reduction.
Endpoint
Authentication
Include a Bearer token in the request headers.
Request Parameters
Required Parameters
- audio_file (string): Audio input, either Base64-encoded audio file data or a publicly accessible audio URL.
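As a minimal sketch of preparing the audio_file value, the helper below passes a URL through unchanged and Base64-encodes a local file. The function name build_audio_file_value is hypothetical, and it assumes the API expects plain Base64 text without a data-URI prefix, which this page does not confirm.

```python
import base64
from pathlib import Path


def build_audio_file_value(source: str) -> str:
    """Return a string suitable for `audio_file` (hypothetical helper).

    Assumption: URLs are sent as-is and local files are sent as plain
    Base64 text with no data-URI prefix.
    """
    if source.startswith(("http://", "https://")):
        # Publicly accessible or pre-signed URL: pass through unchanged.
        return source
    # Otherwise treat `source` as a local path and encode its raw bytes.
    raw = Path(source).read_bytes()
    return base64.b64encode(raw).decode("ascii")
```

Keeping both input styles behind one helper means the rest of the client code does not need to know which option was used.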
Core Processing Parameters
- language (string): Source language code (e.g., “hi” for Hindi, “en” for English, None for auto-detection).
- task (string): Processing task, either transcribe (speech-to-text in the same language) or translate (translate to English).
Here is the list of supported languages in the Whisper model.
Voice Activity Detection (VAD) Parameters
- vad_model (string): VAD model to use: “silero” (recommended for accuracy and speed).
- vad_onset (float): Threshold for speech start detection (0.0-1.0, default: 0.5).
- vad_offset (float): Threshold for speech end detection (0.0-1.0, default: 0.3).
Need help with VAD parameter tuning or Whisper troubleshooting? Check our detailed guides on VAD tuning and Whisper troubleshooting.
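The core and VAD parameters can be checked on the client side before a request is sent. The sketch below is one way to do that; build_processing_params is a hypothetical helper, and the default task of "transcribe" is an assumption (the page lists the two allowed values but not a default). The threshold range and the 0.5 / 0.3 defaults come from the parameter descriptions above.

```python
from typing import Optional

# Task values listed in the parameter description.
VALID_TASKS = {"transcribe", "translate"}


def build_processing_params(
    language: Optional[str] = None,  # None requests auto-detection
    task: str = "transcribe",        # assumed default, not stated on this page
    vad_model: str = "silero",
    vad_onset: float = 0.5,
    vad_offset: float = 0.3,
) -> dict:
    """Validate the core and VAD parameters and return them as a dict."""
    if task not in VALID_TASKS:
        raise ValueError(f"task must be one of {sorted(VALID_TASKS)}, got {task!r}")
    for name, value in (("vad_onset", vad_onset), ("vad_offset", vad_offset)):
        # Both thresholds are documented as lying in the range 0.0-1.0.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0.0 and 1.0, got {value}")
    return {
        "language": language,
        "task": task,
        "vad_model": vad_model,
        "vad_onset": vad_onset,
        "vad_offset": vad_offset,
    }
```

If the payload is later serialized with json.dumps, a language of None becomes JSON null; whether the service expects null or an omitted field for auto-detection is not specified here.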
Timestamp Parameters
- word_timestamps (boolean): Enable word-level timestamps in the output (true/false).
- without_timestamps (boolean): Exclude timestamps from the transcription text, so that only text tokens are sampled from the Whisper model (true/false).
Speaker Identification
- diarization (boolean): Enable speaker diarization to identify different speakers (true/false).
Hallucination Reduction
- strict_hallucination_reduction (boolean): Apply post-processing filters to remove repeated phrases from the transcription (true/false).
Audio Format Support
- MP3, WAV, M4A, AAC
- URLs must be publicly accessible or pre-signed.
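A quick client-side check against the formats listed above might look like the following sketch. It only inspects the file extension, which is a heuristic: the service presumably decides based on the actual audio content, and has_supported_extension is a hypothetical helper name.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Extensions for the formats listed above (MP3, WAV, M4A, AAC).
SUPPORTED_EXTENSIONS = {".mp3", ".wav", ".m4a", ".aac"}


def has_supported_extension(path_or_url: str) -> bool:
    """Return True if a local path or URL ends in a listed audio extension."""
    # urlparse separates the query string, so the signature parameters of a
    # pre-signed URL do not interfere with the extension check.
    path = urlparse(path_or_url).path
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

For example, has_supported_extension("https://bucket.example.com/clip.m4a?X-Sig=abc") passes the check, while "notes.txt" does not.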
Example Request
Option 1: Base64-encoded audio file
Option 2: Audio URL
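As a rough sketch of both options, the snippet below builds (but does not send) a request using only the Python standard library. Several things here are assumptions rather than facts from this page: API_URL is a placeholder because the endpoint is not reproduced here, the body is assumed to be JSON sent via POST, and build_request is a hypothetical helper. The Authorization header uses the standard Bearer scheme mentioned under Authentication.

```python
import base64
import json
import urllib.request

# Placeholder only: substitute the real endpoint from your account.
API_URL = "https://api.example.invalid/transcribe"


def build_request(audio_file: str, token: str, **params) -> urllib.request.Request:
    """Build a POST request carrying `audio_file` plus optional parameters.

    Assumption: the API accepts a JSON body; this page does not show it.
    """
    body = {"audio_file": audio_file, **params}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Option 1: Base64-encoded audio (dummy bytes stand in for a real file).
encoded_audio = base64.b64encode(b"raw audio bytes").decode("ascii")
request_base64 = build_request(
    encoded_audio, "YOUR_TOKEN", task="transcribe", language="en"
)

# Option 2: publicly accessible or pre-signed audio URL.
request_url = build_request(
    "https://example.com/audio.mp3",
    "YOUR_TOKEN",
    task="translate",
    diarization=True,
)

# To send a request: urllib.request.urlopen(request_url)
```

Separating request construction from sending makes it easy to inspect the payload before any network call is made.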
Response Format
Successful Response (200)
Error Response (4xx)
Error Response (5xx)
Error Codes
- 400: Invalid request format or parameters
- 401: Authentication failed
- 413: File too large
- 422: Unsupported audio format
- 500: Internal server error
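The status codes above can be turned into readable messages with a small lookup, as in this sketch. The function name describe_status and the fallback wording for unlisted codes are assumptions; only the five listed codes and their meanings come from this page.

```python
# Meanings of the error codes listed above.
ERROR_MESSAGES = {
    400: "Invalid request format or parameters",
    401: "Authentication failed",
    413: "File too large",
    422: "Unsupported audio format",
    500: "Internal server error",
}


def describe_status(status: int) -> str:
    """Map an HTTP status code to a short description."""
    if status == 200:
        return "Success"
    if status in ERROR_MESSAGES:
        return ERROR_MESSAGES[status]
    # Fallbacks for codes this page does not list (assumed wording).
    if 400 <= status < 500:
        return f"Client error {status}"
    if 500 <= status < 600:
        return f"Server error {status}"
    return f"Unexpected status {status}"
```

A caller might log describe_status(response_status) and, for 413 or 422, prompt the user to shorten or convert the audio before retrying.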