SpeakEasy

Translate Audio to English in One API Call

Pass audio in any of 99+ languages, get back English text. No intermediate translation step, no extra API, no extra cost. Just set translate=true.

speech-to-text · translation · multilingual · tutorial

Building a product that handles audio from users around the world? The usual path is: transcribe in the source language, then call a translation API, then process the English text. That's two API calls, two billing accounts, and latency on top of latency.

SpeakEasy collapses it to one. Set translate=true on any transcription request and you get English output, regardless of what language was spoken.

Basic usage

curl -X POST https://api.tryspeakeasy.io/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@spanish-meeting.mp3" \
  -F "translate=true"

Response:

{
  "text": "Good morning everyone. Today we're going to review the quarterly results.",
  "duration": 45.2,
  "language": "es"
}

The language field tells you what was detected. The text is always English.
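If you're calling the endpoint without an SDK, that response is plain JSON and easy to unpack. A minimal sketch, using the field names from the example above (`parse_transcription` is a hypothetical helper, not part of the SpeakEasy API):

```python
import json


def parse_transcription(raw: str) -> tuple[str, str, float]:
    """Unpack a translation response into (english_text, detected_language, duration)."""
    body = json.loads(raw)
    return body["text"], body["language"], body["duration"]


raw = '{"text": "Good morning everyone.", "duration": 45.2, "language": "es"}'
text, lang, duration = parse_transcription(raw)
```

Because the text is always English, you can route it straight into downstream processing without a language check.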

Supported languages

Translation works from any of Whisper's 99+ supported languages, including:

  • Spanish, French, German, Italian, Portuguese
  • Japanese, Chinese (Simplified + Traditional), Korean
  • Arabic, Hebrew, Hindi, Bengali
  • Russian, Ukrainian, Polish, Czech
  • Turkish, Indonesian, Vietnamese, Thai
  • And 90+ more

Language detection is automatic — you don't need to specify the source language, though you can with the language parameter to improve accuracy on shorter clips.

Python example

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_SPEAKEASY_KEY",
    base_url="https://api.tryspeakeasy.io/v1"
)

def transcribe_and_translate(audio_path: str) -> dict:
    with open(audio_path, "rb") as f:
        # Use verbose_json to get language detection + segments
        result = client.audio.transcriptions.create(
            model="whisper-large-v3",
            file=f,
            response_format="verbose_json",
            extra_body={"translate": True},
        )
    return {
        "english_text": result.text,
        "source_language": result.language,
        "duration_seconds": result.duration,
    }

# Works with any language
data = transcribe_and_translate("interview-in-mandarin.mp3")
print(f"Detected language: {data['source_language']}")
print(f"English text: {data['english_text']}")

Getting English subtitles from foreign-language video

Combine translate=true with response_format=srt to generate English subtitles from any video:

curl -X POST https://api.tryspeakeasy.io/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@french-documentary.mp4" \
  -F "translate=true" \
  -F "response_format=srt" \
  > english-subtitles.srt

One call. English SRT file. No external translation API.
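The SRT you get back is plain text, so post-processing is straightforward. For example, if you trimmed a few seconds off the start of the video after transcribing, you can shift every cue to match (a standalone sketch, not part of the SpeakEasy API):

```python
import re


def shift_srt(srt: str, offset_seconds: float) -> str:
    """Shift every HH:MM:SS,mmm timestamp in an SRT document by offset_seconds."""

    def to_ms(h: str, m: str, s: str, ms: str) -> int:
        return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

    def fmt(total_ms: int) -> str:
        total_ms = max(0, total_ms)  # clamp so cues never go negative
        h, rem = divmod(total_ms, 3_600_000)
        m, rem = divmod(rem, 60_000)
        s, ms = divmod(rem, 1000)
        return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

    def repl(match: re.Match) -> str:
        return fmt(to_ms(*match.groups()) + int(offset_seconds * 1000))

    return re.sub(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})", repl, srt)


cue = "1\n00:00:01,500 --> 00:00:04,000\nGood morning everyone.\n"
shifted = shift_srt(cue, 2.0)  # cue now starts at 00:00:03,500
```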

Real-world use cases

Customer support transcription: Your support team speaks English but customers call in multiple languages. Translate on ingest and your agents see English transcripts without extra tooling.

import httpx

def transcribe_support_call(audio_url: str, ticket_id: str) -> dict:
    """Submit a support call for async translation; the English
    transcript is delivered to the ticket's webhook when ready."""
    response = httpx.post(
        "https://api.tryspeakeasy.io/v1/audio/transcriptions",
        headers={"Authorization": "Bearer YOUR_API_KEY"},
        data={
            "url": audio_url,
            "translate": "true",
            "callback_url": f"https://yourapp.com/webhooks/transcripts/{ticket_id}",
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()  # job acknowledgement; the transcript arrives via webhook

Multilingual podcast transcription: Transcribe and translate episodes from international podcasters for English-speaking audiences.

Video localization pipeline: Get English text first, then hand it to your localization team to adapt and re-record — skipping an intermediate translation round trip.

Meeting notes for global teams: Participants join in their native language, the transcript arrives in English.

How it differs from separate transcribe + translate

The translate=true approach uses Whisper's built-in translation capability, which was trained end-to-end on speech-to-English pairs. This means:

  • No error accumulation — the model never produces an intermediate transcript that a second model then translates
  • Better handling of idiomatic speech that would confuse a text-based translation model
  • Single API call, single latency budget, single billing line

For production accuracy on specific language pairs, compare results against a dedicated translation API on your data — for most use cases, Whisper's translation is more than sufficient and significantly cheaper.

Async translation for long files

For long recordings, use callback_url to avoid holding a connection open:

curl -X POST https://api.tryspeakeasy.io/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "url=https://your-storage.com/lecture-in-german.mp3" \
  -F "translate=true" \
  -F "callback_url=https://yourapp.com/webhooks/transcript"

The webhook payload will contain the English transcript when processing completes.
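On the receiving side, a minimal handler might look like this. The payload field names here are assumptions, mirroring the synchronous response shape shown earlier, so verify them against your actual webhook deliveries:

```python
def handle_transcript_webhook(payload: dict) -> str:
    """Extract the English text from an async webhook delivery.

    Assumes the payload mirrors the synchronous response
    ("text", "language", "duration"); check against real deliveries.
    """
    if "error" in payload:
        raise RuntimeError(f"Transcription failed: {payload['error']}")
    return payload["text"]


english = handle_transcript_webhook(
    {"text": "Welcome to today's lecture.", "duration": 3600.0, "language": "de"}
)
```

Remember to return a 2xx status quickly from your webhook endpoint and do any heavy processing out of band, since most callback systems retry on slow or failed responses.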


Start translating today — $1 for your first month, 50 hours included. 99+ languages supported.
