Text-to-Speech API
Natural. Fast. Affordable.
Turn text into lifelike speech with 54 AI voices in 9 languages. Streaming support, word-level timestamps, and up to 90% savings vs ElevenLabs.
Hear it for yourself
Three preset voices, three languages. Tap play to hear raw TTS output from the same API you'll call in production.
Sarah
English (American)
“Speech-to-Text and Text-to-Speech, same SDK as OpenAI, for a flat ten dollars a month.”
Lewis
English (British)
“Switch from ElevenLabs in two lines of code. Keep your existing integration, cut the bill by ninety percent.”
Dora
Spanish
“Cincuenta y cuatro voces en nueve idiomas. Mismo SDK de OpenAI. Diez dólares al mes.”
Samples generated with the SpeakEasy TTS API at 24 kHz.
Code examples
OpenAI-compatible. Use your existing SDK — just change the base URL.
cURL
curl -X POST https://www.tryspeakeasy.io/api/v1/audio/speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, welcome to SpeakEasy!",
    "voice": "heart"
  }' \
  --output speech.mp3

Python
from openai import OpenAI

# Point the OpenAI SDK at SpeakEasy by overriding the base URL.
client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://www.tryspeakeasy.io/api/v1"
)
response = client.audio.speech.create(
    model="tts-1",
    voice="heart",
    input="Hello, welcome to SpeakEasy!"
)
# Write the returned audio bytes to disk.
response.stream_to_file("speech.mp3")

JavaScript
const apiKey = 'YOUR_API_KEY';
const res = await fetch('https://www.tryspeakeasy.io/api/v1/audio/speech', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    input: 'Hello, welcome to SpeakEasy!',
    voice: 'heart',
  }),
});
// The response body is the raw audio.
const audioBlob = await res.blob();

What is the SpeakEasy Text-to-Speech API?
SpeakEasy is a text-to-speech API that turns strings of text into natural-sounding MP3, Opus, AAC, FLAC, PCM, OGG, or WAV audio using 54 AI voices in 9 languages. Pricing works out to roughly $0.015 per 1,000 characters on the entry plan, about 90% cheaper than ElevenLabs ($0.18/1K on their Starter tier). It also undercuts AWS Polly Neural ($16/1M = $0.016/1K), more so once you factor in AWS's per-request billing overhead and the cost of stitching together a production pipeline.
The API is drop-in compatible with OpenAI's voice names such as alloy, nova, echo, onyx, and fable, and adds dozens of further voices across 9 languages. Existing code written against client.audio.speech.create() switches over by swapping base_url. No new SDK, no rewrite.
How the API works
Send a JSON POST to /api/v1/audio/speech with input (your text, up to 4,096 characters), voice (any of 54 names), and optionally response_format (mp3 default, or opus / aac / flac / pcm / ogg / wav) and speed (0.5x–4.0x). We return the raw audio bytes.
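The request fields above can be checked on the client before a call is made. The following is a minimal sketch rather than an official client: the helper name build_speech_payload, the default speed of 1.0, and the empty-input check are assumptions for illustration, while the limits and format names come from the paragraph above.

```python
import json

MAX_INPUT_CHARS = 4096
FORMATS = {"mp3", "opus", "aac", "flac", "pcm", "ogg", "wav"}
MIN_SPEED, MAX_SPEED = 0.5, 4.0


def build_speech_payload(text, voice, response_format="mp3", speed=1.0):
    """Validate the documented limits locally and return the JSON body as a dict.

    Hypothetical helper: the default speed of 1.0 and the empty-input check
    are assumptions, not documented API behavior.
    """
    if not text:
        raise ValueError("input must not be empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input is {len(text)} characters; the limit is {MAX_INPUT_CHARS}")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    return {
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }


payload = build_speech_payload(
    "Hello, welcome to SpeakEasy!", "heart", response_format="wav", speed=1.25
)
body = json.dumps(payload)  # string to send as the POST body
```

Validating locally avoids spending a request on input the API would reject anyway.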
For real-time apps — voice assistants, chatbots, IVR — set stream=true to receive audio chunks as they're generated. Time-to-first-byte is typically under 400 ms, so a user hears the first word before the full sentence finishes synthesizing.
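A streaming consumer might look like the sketch below, which uses only the Python standard library. It assumes that stream is sent as a boolean field in the JSON body and that the response is a plain chunked byte stream; the text above does not specify either detail, so check the API reference before relying on it. The names stream_speech and iter_chunks are illustrative.

```python
import io
import json
import urllib.request

SPEECH_URL = "https://www.tryspeakeasy.io/api/v1/audio/speech"


def iter_chunks(stream, chunk_size=4096):
    """Yield successive byte chunks from a file-like object until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_speech(text, voice, api_key, chunk_size=4096):
    """POST with stream enabled and yield audio chunks as they arrive.

    Assumption: `stream` is a JSON body field. This function makes a network call.
    """
    body = json.dumps({"input": text, "voice": voice, "stream": True}).encode("utf-8")
    request = urllib.request.Request(
        SPEECH_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        yield from iter_chunks(response, chunk_size)


# Offline demonstration: an in-memory stream stands in for the HTTP response.
demo_chunks = list(iter_chunks(io.BytesIO(b"a" * 10000), chunk_size=4096))
```

In a real app each chunk would be handed to the audio player as soon as it is yielded, instead of being collected into a list.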
Request word-level timestamps to sync playback with on-screen text — useful for karaoke UIs, animated captions, and accessibility tools that highlight the current word as it's spoken. Timestamps come back in the same millisecond-precision format the STT endpoint returns.
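To illustrate word highlighting, the sketch below looks up which word is being spoken at a given playback position. The timestamp shape used here (a list of dicts with word, start_ms, and end_ms keys) is hypothetical, since the response schema is not shown on this page; adapt the field names to the actual payload.

```python
from bisect import bisect_right

# Hypothetical timestamp shape: one dict per word, times in milliseconds.
words = [
    {"word": "Hello,", "start_ms": 0, "end_ms": 420},
    {"word": "welcome", "start_ms": 480, "end_ms": 900},
    {"word": "to", "start_ms": 930, "end_ms": 1040},
    {"word": "SpeakEasy!", "start_ms": 1080, "end_ms": 1900},
]


def current_word_index(words, position_ms):
    """Return the index of the word spoken at position_ms, or None during a gap."""
    starts = [w["start_ms"] for w in words]
    # Find the last word whose start time is at or before the playback position.
    i = bisect_right(starts, position_ms) - 1
    if i >= 0 and position_ms < words[i]["end_ms"]:
        return i
    return None


highlighted = current_word_index(words, 500)  # position falls inside "welcome"
```

Binary search keeps the lookup cheap enough to run on every playback tick, even for long passages.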
Limits: 4,096 characters per request, 10 concurrent requests per account on default plans, and the same API-key auth as every other SpeakEasy endpoint. For long-form content (audiobooks, narration), chunk text on sentence boundaries and stream the results into your player.
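One way to chunk long text on sentence boundaries is sketched below. The sentence splitter is a rough regular-expression heuristic (it will mis-split abbreviations such as "Dr."), and the function name chunk_text is illustrative, not part of any SDK.

```python
import re

MAX_CHARS = 4096


def chunk_text(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters, breaking after sentences."""
    # Split after ., !, or ? followed by whitespace; a heuristic, not a full tokenizer.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Fallback: a single sentence longer than the limit is hard-split.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks


example = chunk_text("One. Two! Three? Four.", limit=10)
```

Each returned chunk can then be sent as the input of its own request.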
Available voices
54 voices across 9 languages
| Language | Female voices | Male voices |
|---|---|---|
| English (American) | Heart, Bella, Alloy, Aoede, Kore, Jessica, Nicole, Nova, River, Sarah, Sky | Michael, Echo, Eric, Fenrir, Liam, Onyx, Puck, Adam, Santa |
| English (British) | Alice, Emma, Isabella, Lily | Daniel, Fable, George, Lewis |
| Japanese (beta) | Sakura, Gongitsune, Nezumi, Tebukuro | Kumo |
| Mandarin Chinese (beta) | Xiaobei, Xiaoni, Xiaoxiao, Xiaoyi | Yunjian, Yunxi, Yunxia, Yunyang |
| Spanish (beta) | Dora | Alex, Noel |
| French (beta) | Siwis | |
| Hindi (beta) | Alpha, Beta | Omega, Psi |
| Italian (beta) | Sara | Nicola |
| Portuguese (Brazil) (beta) | Clara | Tiago, Papai |
Built for developers
Drop-in Compatible
Works with OpenAI and ElevenLabs SDKs. Switch providers by changing one URL.
Real-time Streaming
Stream audio chunks as they're generated for instant playback in voice apps and chatbots.
Word Timestamps
Get precise timing for each word. Perfect for syncing speech with text animations.
Multiple Voices
54 voices across 9 languages with different personalities. Choose the perfect tone for your app.
Multiple Formats
Output in MP3, Opus, AAC, FLAC, PCM, OGG, or WAV. Optimize for quality or file size.
90% Cheaper
Save up to 90% compared to ElevenLabs with comparable voice quality.
What developers build with it
Pay-as-you-go TTS that fits into real production pipelines.
Voice AI & chatbots
Streaming TTS with sub-400 ms time-to-first-byte pairs cleanly with LLM streaming. Users hear speech start almost as soon as the model starts generating.
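Pairing LLM streaming with TTS usually means buffering model output into complete sentences before sending each one for synthesis. The sketch below shows one possible buffering approach; the function name, the sentence-end heuristic, and the sample token stream are all illustrative assumptions.

```python
import re

# A sentence terminator followed by whitespace marks a safe place to cut.
_SENTENCE_END = re.compile(r"[.!?]\s")


def sentences_from_tokens(tokens):
    """Group a stream of text fragments into sentences, yielding each as it completes."""
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            match = _SENTENCE_END.search(buffer)
            if match is None:
                break
            # Emit through the terminator and keep the remainder buffered.
            yield buffer[: match.start() + 1].strip()
            buffer = buffer[match.end():]
    # Flush whatever is left once the model stops generating.
    if buffer.strip():
        yield buffer.strip()


llm_tokens = ["Your or", "der has ship", "ped. It ", "arrives Thurs", "day. Thanks", "!"]
spoken = list(sentences_from_tokens(llm_tokens))
```

Each yielded sentence can be sent to the speech endpoint immediately, so synthesis of the first sentence overlaps with generation of the next.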
Audiobook & narration pipelines
Chunk chapters into 4K-character requests, process in parallel (10 concurrent on default plans), stitch the output. A 10-hour audiobook costs roughly $45 at plan rates — vs several hundred on ElevenLabs.
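The parallel step can be sketched with a thread pool capped at the default-plan concurrency limit. Here synthesize is a caller-supplied function (hypothetical; in practice it would wrap one API request per chunk), and the byte-level join is an assumption that suits raw PCM output. Container formats such as WAV carry headers and need proper merging instead.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 10  # default-plan concurrency limit stated above


def synthesize_all(chunks, synthesize, max_workers=MAX_CONCURRENT):
    """Run `synthesize` over chunks in parallel and return the audio joined in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map returns results in input order, even if calls finish out of order.
        parts = list(pool.map(synthesize, chunks))
    return b"".join(parts)


def fake_synthesize(text):
    """Stand-in for a real API call; returns the text as bytes."""
    return text.encode("utf-8")


audio = synthesize_all(
    ["Chapter one. ", "Chapter two. ", "Chapter three."], fake_synthesize
)
```

Capping the pool at the account limit keeps the pipeline from tripping concurrency errors while still saturating the allowance.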
Dubbing & localization
Nine languages with native voices cover the bulk of global audiences. Pair with the STT translate=true flag and you have a single-vendor dubbing pipeline.
Accessibility & screen readers
Natural voices and word-level timestamps make it trivial to add spoken-content support to web apps. Highlight the current word as it plays, pause/resume at word boundaries.
E-learning & explainer videos
Generate voiceovers directly from lesson scripts. 54 voices mean you can cast different characters or switch narrators without re-recording. Output MP3 drops straight into Premiere, Final Cut, or ffmpeg.
IVR & notification systems
Dynamic prompts ("Your order #12345 will arrive Thursday") that don't require a voice actor. Per-request latency is low enough for inline phone-tree rendering; per-character cost is low enough to actually deploy.
How SpeakEasy TTS Compares
Pricing per 1,000 characters. Quality comparable across all providers.
| Provider | Price / 1K chars | Voices | Streaming |
|---|---|---|---|
| SpeakEasy | ~$0.015 | 54 | ✓ |
| ElevenLabs | $0.18 (Starter) | 1,000+ | ✓ |
| OpenAI TTS | $0.015 | 6 | ✓ |
| AWS Polly Neural | $0.016 | ~40 | ✓ |
| Azure Neural TTS | $0.016 | 400+ | ✓ |
Pricing as of April 2026. ElevenLabs Starter: $5/mo for 30K chars = $0.167/1K. Volume discounts vary by provider.
Text-to-Speech Pricing
~3.3 million characters included with your $10/month plan.
Additional usage at $0.25 per additional hour (~67K characters per hour).
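As a rough worked example, the figures above can be turned into a monthly estimate. This sketch assumes overage is billed pro rata by character count rather than rounded up to whole hours, which is not stated here; the function name is illustrative.

```python
PLAN_PRICE_USD = 10.00
INCLUDED_CHARS = 3_300_000    # "~3.3 million characters included"
CHARS_PER_HOUR = 67_000       # "~67K characters per hour"
OVERAGE_USD_PER_HOUR = 0.25


def estimate_monthly_cost(chars_used):
    """Estimate a monthly bill in USD from total characters synthesized.

    Assumption: overage is charged pro rata, not rounded up to whole hours.
    """
    extra_chars = max(0, chars_used - INCLUDED_CHARS)
    extra_hours = extra_chars / CHARS_PER_HOUR
    return round(PLAN_PRICE_USD + extra_hours * OVERAGE_USD_PER_HOUR, 2)


within_plan = estimate_monthly_cost(1_000_000)           # stays inside the allowance
ten_hours_over = estimate_monthly_cost(3_300_000 + 670_000)  # about 10 extra hours
```

Under these assumptions, usage inside the allowance costs the flat plan price, and each additional block of roughly 67K characters adds $0.25.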
View full pricing →
Frequently asked questions
What voices are available?
What output formats are supported?
Is streaming supported?
How many characters can I convert per request?
Do you provide word-level timestamps?
How does this compare to OpenAI and ElevenLabs?
$1. 50 hours. Both STT and TTS.
Your current speech API provider is charging you too much. Switch in one line of code.