Translate Audio to English in One API Call
Pass audio in any of 99+ languages, get back English text. No intermediate translation step, no extra API, no extra cost. Just set translate=true.
Building a product that handles audio from users around the world? The usual path is: transcribe in the source language, then call a translation API, then process the English text. That's two API calls, two billing accounts, and latency on top of latency.
SpeakEasy collapses it to one. Set translate=true on any transcription request and you get English output, regardless of what language was spoken.
Basic usage
curl -X POST https://api.tryspeakeasy.io/v1/audio/transcriptions \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@spanish-meeting.mp3" \
-F "translate=true"
Response:
{
  "text": "Good morning everyone. Today we're going to review the quarterly results.",
  "duration": 45.2,
  "language": "es"
}
The language field reports the detected source language as a language code (es for Spanish here). The text field is always English.
Supported languages
Translation works from any of Whisper's 99+ supported languages, including:
- Spanish, French, German, Italian, Portuguese
- Japanese, Chinese (Simplified + Traditional), Korean
- Arabic, Hebrew, Hindi, Bengali
- Russian, Ukrainian, Polish, Czech
- Turkish, Indonesian, Vietnamese, Thai
- And dozens more
Language detection is automatic, so you don't need to specify the source language. On shorter clips, where detection has less audio to work with, passing the language parameter can improve accuracy.
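As a rough sketch of how the optional language hint could be wired in, the helper below builds the keyword arguments for the OpenAI Python SDK call used in the Python example. The helper name build_translate_kwargs is made up for illustration, and the hint is assumed to be a two-letter code in the same form as the language field in the response (for example "ja").

```python
from typing import Any, Optional


def build_translate_kwargs(language: Optional[str] = None) -> dict[str, Any]:
    """Build keyword arguments for client.audio.transcriptions.create().

    Hypothetical helper: it only adds the language hint when one is
    supplied, so automatic detection stays the default.
    """
    kwargs: dict[str, Any] = {
        "model": "whisper-large-v3",
        # translate is not a named SDK argument, so it travels in extra_body
        "extra_body": {"translate": True},
    }
    if language:
        # Assumed to be a two-letter code such as "ja" for Japanese
        kwargs["language"] = language
    return kwargs


auto_detect = build_translate_kwargs()
with_hint = build_translate_kwargs("ja")
print(auto_detect)
print(with_hint)
```

With a client configured as in the Python example, the call would then look like client.audio.transcriptions.create(file=f, **build_translate_kwargs("ja")).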
Python example
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_SPEAKEASY_KEY",
    base_url="https://api.tryspeakeasy.io/v1"
)

def transcribe_and_translate(audio_path: str) -> dict:
    with open(audio_path, "rb") as f:
        # Use verbose_json to get language detection + segments
        result = client.audio.transcriptions.create(
            model="whisper-large-v3",
            file=f,
            response_format="verbose_json",
            extra_body={"translate": True},
        )
    return {
        "english_text": result.text,
        "source_language": result.language,
        "duration_seconds": result.duration,
    }

# Works with any language
data = transcribe_and_translate("interview-in-mandarin.mp3")
print(f"Detected language: {data['source_language']}")
print(f"English text: {data['english_text']}")
Getting English subtitles from foreign-language video
Combine translate=true with response_format=srt to generate English subtitles from any video:
curl -X POST https://api.tryspeakeasy.io/v1/audio/transcriptions \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@french-documentary.mp4" \
-F "translate=true" \
-F "response_format=srt" \
> english-subtitles.srt
One call. English SRT file. No external translation API.
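Before shipping a subtitle file, it can help to sanity-check what came back. The sketch below is one way to do that: a small parser for the standard SRT layout (cue number, timing line, one or more text lines, blank line). The function name parse_srt and the sample text are illustrative only and are not real API output.

```python
import re


def parse_srt(srt_text: str) -> list[dict]:
    """Split SRT text into cues with index, start, end, and text."""
    cues = []
    # Cues are separated by one or more blank lines
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        if len(lines) < 3:
            continue  # skip malformed or empty blocks
        start, _, end = lines[1].partition(" --> ")
        cues.append({
            "index": int(lines[0]),
            "start": start.strip(),
            "end": end.strip(),
            # Multi-line cues are joined into a single string
            "text": " ".join(lines[2:]),
        })
    return cues


# Illustrative sample, not real API output
sample = """1
00:00:00,000 --> 00:00:03,500
Good morning everyone.

2
00:00:03,500 --> 00:00:07,200
Today we're going to review
the quarterly results.
"""

cues = parse_srt(sample)
print(len(cues), "cues; last cue:", cues[-1]["text"])
```

Running parse_srt over english-subtitles.srt and checking that the cue count is non-zero and the timings increase is a cheap guard against saving an error response as a subtitle file.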
Real-world use cases
Customer support transcription: Your support team speaks English but customers call in multiple languages. Translate on ingest and your agents see English transcripts without extra tooling.
import httpx

def transcribe_support_call(audio_url: str, ticket_id: str) -> dict:
    """Submit a call recording for translation.

    Returns the API's immediate JSON response; the English transcript
    is delivered to callback_url when processing completes.
    """
    response = httpx.post(
        "https://api.tryspeakeasy.io/v1/audio/transcriptions",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        data={
            "url": audio_url,
            "translate": "true",
            "callback_url": f"https://yourapp.com/webhooks/transcripts/{ticket_id}",
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()
Multilingual podcast transcription: Transcribe and translate episodes from international podcasters for English-speaking audiences.
Video localization pipeline: Get English text first, then hand it to your localization team to adapt and re-record — skipping an intermediate translation round trip.
Meeting notes for global teams: Participants join in their native language, the transcript arrives in English.
How it differs from separate transcribe + translate
The translate=true approach uses Whisper's built-in translation capability, which was trained end-to-end on speech-to-English pairs. This means:
- No error accumulation — the model never produces an intermediate transcript that a second model then translates
- Better handling of idiomatic speech that would confuse a text-based translation model
- Single API call, single latency budget, single billing line
For most use cases, Whisper's translation is more than sufficient and significantly cheaper than a two-step pipeline. If accuracy on a specific language pair matters in production, compare its output against a dedicated translation API on your own data.
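One lightweight way to run that comparison is sketched below, assuming you already have both English outputs for each clip. It uses difflib from the Python standard library to score word-level overlap and lists the clips where the two outputs diverge most. The function names, the 0.6 threshold, and the sample sentences are all made up for illustration. A low score only says the two translations differ, not which one is better, so flagged clips still need a human review.

```python
from difflib import SequenceMatcher


def word_similarity(a: str, b: str) -> float:
    """Return a 0..1 ratio of word-level overlap between two translations."""
    words_a = a.lower().split()
    words_b = b.lower().split()
    return SequenceMatcher(None, words_a, words_b).ratio()


def flag_divergent(
    pairs: dict[str, tuple[str, str]], threshold: float = 0.6
) -> list[str]:
    """Return the clip names whose two translations score below threshold."""
    return [
        name
        for name, (one_step, two_step) in pairs.items()
        if word_similarity(one_step, two_step) < threshold
    ]


# Made-up sample outputs: (translate=true result, two-step pipeline result)
pairs = {
    "clip-01": ("Good morning everyone.", "Good morning everyone."),
    "clip-02": ("It is raining cats and dogs.", "Cats and dogs fall from the sky."),
}

print(flag_divergent(pairs))
```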
Async translation for long files
For long recordings, use callback_url to avoid holding a connection open:
curl -X POST https://api.tryspeakeasy.io/v1/audio/transcriptions \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "url=https://your-storage.com/lecture-in-german.mp3" \
-F "translate=true" \
-F "callback_url=https://yourapp.com/webhooks/transcript"
The webhook payload will contain the English transcript when processing completes.
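A receiving endpoint might look something like the sketch below, built on Python's standard library. The payload shape is an assumption: it is taken to be JSON with the same text and language keys as the synchronous response shown under Basic usage, which should be checked against a real delivery before relying on it. The names extract_transcript and TranscriptWebhook are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler


def extract_transcript(body: bytes) -> dict:
    """Pull the transcript fields out of a webhook body.

    ASSUMPTION: the payload is a JSON object with "text" and "language"
    keys, mirroring the synchronous response. Verify against a real payload.
    """
    payload = json.loads(body)
    return {
        "english_text": payload.get("text", ""),
        "source_language": payload.get("language"),
    }


class TranscriptWebhook(BaseHTTPRequestHandler):
    """Minimal POST handler that logs the transcript and acknowledges it."""

    def do_POST(self) -> None:
        length = int(self.headers.get("Content-Length", 0))
        try:
            result = extract_transcript(self.rfile.read(length))
        except (ValueError, AttributeError):
            # Body was not valid JSON, or not a JSON object
            self.send_response(400)
            self.end_headers()
            return
        print(f"[{result['source_language']}] {result['english_text']}")
        self.send_response(204)
        self.end_headers()


# Illustrative body, not a real webhook delivery
sample = b'{"text": "Welcome to the lecture.", "language": "de"}'
print(extract_transcript(sample))
```

To try the handler locally, http.server.HTTPServer(("", 8000), TranscriptWebhook).serve_forever() starts a listener on port 8000. A production endpoint would also want to authenticate incoming requests.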
Start translating today — $1 for your first month, 50 hours included. 99+ languages supported.