Speech-to-Text API
Accurate. Fast. Affordable.
Transcribe audio files in 99+ languages with state-of-the-art accuracy. Speaker diarization, word-level timestamps, and translation built in.
By SpeakEasy Team · Last updated
Supported audio formats
Up to 25MB via upload · Up to 1GB via URL
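As a rough illustration of the two size limits above, the sketch below picks a submission route from a file's size. The function name and the decimal reading of MB and GB are assumptions for illustration, and how a URL is actually submitted is not shown on this page.

```python
# Limits taken from the line above. Treating MB/GB as decimal units is an assumption.
MAX_UPLOAD_BYTES = 25 * 1000 * 1000   # 25MB direct upload
MAX_URL_BYTES = 1000 * 1000 * 1000    # 1GB when fetched from a URL


def choose_submission_method(size_bytes: int) -> str:
    """Return "upload" or "url" for a file of the given size (hypothetical helper)."""
    if size_bytes < 0:
        raise ValueError("size_bytes must be non-negative")
    if size_bytes <= MAX_UPLOAD_BYTES:
        return "upload"
    if size_bytes <= MAX_URL_BYTES:
        return "url"
    raise ValueError("file exceeds the 1GB limit; split or compress it first")
```

In practice you might pass `os.path.getsize(path)` as `size_bytes` before deciding how to send the file.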
Code examples
OpenAI-compatible. Use your existing SDK — just change the base URL.
```shell
curl -X POST https://api.tryspeakeasy.io/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@interview.mp3" \
  -F "response_format=verbose_json" \
  -F "speaker_labels=true"
```

```python
from openai import OpenAI

# Point the standard OpenAI client at the SpeakEasy endpoint.
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://api.tryspeakeasy.io/v1",
)

# Open the audio in binary mode; the context manager closes the file afterwards.
with open("meeting.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-large-v3",
        file=audio,
    )

print(transcript.text)
```

```javascript
// audioFile is a File or Blob; apiKey holds your SpeakEasy API key.
const formData = new FormData();
formData.append('file', audioFile);
formData.append('language', 'en');

const res = await fetch('https://api.tryspeakeasy.io/v1/audio/transcriptions', {
  method: 'POST',
  headers: { Authorization: `Bearer ${apiKey}` },
  body: formData,
});
const { text } = await res.json();
```

Built for production
99+ Languages
Let the API detect the language automatically, or specify it for faster processing. Supports English, Chinese, Spanish, French, German, Japanese, and 95+ more.
Speaker Diarization
Identify who said what. Our API labels up to 4 speakers automatically, perfect for meetings, interviews, and podcasts.
Word Timestamps
Get precise start and end times for each word. Essential for subtitle generation, search indexing, and audio editing.
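To make the subtitle use case concrete, here is a minimal sketch that groups word timestamps into SRT cues. It assumes each word arrives as a dict with `word`, `start`, and `end` keys (times in seconds), which is how OpenAI-style verbose JSON reports words; the exact SpeakEasy response shape is not documented on this page, and the helper names are illustrative.

```python
def format_srt_time(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group consecutive words into numbered cues of at most max_words each.

    Assumed input shape: [{"word": "Hello", "start": 0.0, "end": 0.4}, ...]
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start = format_srt_time(chunk[0]["start"])   # cue starts with its first word
        end = format_srt_time(chunk[-1]["end"])      # and ends with its last word
        text = " ".join(w["word"].strip() for w in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

Splitting on a fixed word count keeps the sketch short; a real subtitle pipeline would more likely break cues on pauses or punctuation.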
Translation
Transcribe audio in any language and translate the output to English in a single API call.
Multiple Formats
Get results as plain text, JSON, verbose JSON with timestamps, SRT subtitles, or WebVTT.
Async Processing
For long files, provide a callback URL and we'll POST the results when processing is complete.
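A callback receiver could look roughly like the sketch below, built on Python's standard `http.server`. The payload shape (a JSON object with a `text` field), the port, and the function names are assumptions for illustration; the actual callback format is not specified on this page.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def extract_text(body: bytes) -> str:
    """Pull the transcript out of a callback body (assumed shape: {"text": "..."})."""
    payload = json.loads(body)
    if not isinstance(payload, dict):
        return ""
    return payload.get("text", "")


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the sender declared.
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            text = extract_text(body)
        except ValueError:  # malformed JSON or invalid encoding
            self.send_response(400)
            self.end_headers()
            return
        print(f"Transcript received ({len(text)} characters)")
        self.send_response(204)  # acknowledge with no response body
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Block and listen for callback POSTs on the given port."""
    HTTPServer(("0.0.0.0", port), CallbackHandler).serve_forever()
```

Calling `serve()` starts the listener. A production receiver would also need HTTPS and some way to verify that the request really came from the API.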
How SpeakEasy Compares
Same Whisper large-v3 accuracy. A fraction of the cost.
| Provider | Price / hour | Diarization | OpenAI SDK |
|---|---|---|---|
| SpeakEasy | $0.20 | ✓ | ✓ |
| OpenAI Whisper | $0.36 | ✗ | ✓ |
| AssemblyAI | $0.37 | ✓ | ✗ |
| Deepgram | $0.25 | ✓ | ✗ |
| Google Cloud STT | $0.96+ | ✓ | ✗ |
Comparison assumes accuracy equivalent to Whisper large-v3 across providers. Pricing as of April 2026.
Speech-to-Text Pricing
Just $0.20 per hour of audio at the plan rate. No hidden fees.
$10/month includes 50 hours. Additional usage at $0.25 per hour. $1 first month.
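The arithmetic behind these numbers can be sketched as a small function. It uses only the figures stated above ($10 base, 50 included hours, $0.25 per additional hour) and deliberately ignores the $1 first-month offer; the function name is illustrative.

```python
BASE_FEE = 10.00        # monthly plan price in USD
INCLUDED_HOURS = 50     # hours of audio covered by the base fee
OVERAGE_RATE = 0.25     # USD per hour beyond the included hours


def monthly_cost(hours_used: float) -> float:
    """Estimate one month's bill in USD, excluding the first-month promotion."""
    if hours_used < 0:
        raise ValueError("hours_used must be non-negative")
    extra_hours = max(0.0, hours_used - INCLUDED_HOURS)
    return round(BASE_FEE + extra_hours * OVERAGE_RATE, 2)
```

At exactly 50 hours the bill is $10, which is where the $0.20 per hour plan rate comes from; 80 hours would come to $10 plus 30 × $0.25, or $17.50.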
View full pricing →
Frequently asked questions
What audio formats are supported?
How accurate is the transcription?
Do you support speaker diarization?
Can I translate audio to English?
Is there a file size limit?
How fast is the transcription?
$1. 50 hours. Both STT and TTS.
Your current speech API provider is charging you too much. Switch in one line of code.