OpenAI & ElevenLabs Compatible

Text-to-Speech API
Natural. Fast. Affordable.

Turn text into lifelike speech with 54 AI voices in 9 languages. Streaming support, word-level timestamps, and up to 90% savings vs ElevenLabs.

Hear it for yourself

Three preset voices, three languages. Tap play to hear raw TTS output from the same API you'll call in production.

Sarah

English (American)

“Speech-to-Text and Text-to-Speech, same SDK as OpenAI, for a flat ten dollars a month.”

Lewis

English (British)

“Switch from ElevenLabs in two lines of code. Keep your existing integration, cut the bill by ninety percent.”

Dora

Spanish

“Cincuenta y cuatro voces en nueve idiomas. Mismo SDK de OpenAI. Diez dólares al mes.”

Samples generated with the SpeakEasy TTS API at 24 kHz.

Code examples

OpenAI-compatible. Use your existing SDK — just change the base URL.

cURL

curl -X POST https://www.tryspeakeasy.io/api/v1/audio/speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, welcome to SpeakEasy!",
    "voice": "heart"
  }' \
  --output speech.mp3

Python (OpenAI SDK)

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://www.tryspeakeasy.io/api/v1"
)

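# Stream the response body straight to disk instead of buffering it in memory.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="heart",
    input="Hello, welcome to SpeakEasy!",
) as response:
    response.stream_to_file("speech.mp3")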

JavaScript

const res = await fetch('https://www.tryspeakeasy.io/api/v1/audio/speech', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    input: 'Hello, welcome to SpeakEasy!',
    voice: 'heart',
  }),
});
const audioBlob = await res.blob();

What is the SpeakEasy Text-to-Speech API?

SpeakEasy is a text-to-speech API that turns strings of text into natural-sounding MP3, Opus, AAC, FLAC, PCM, OGG, or WAV audio using 54 AI voices in 9 languages. Pricing works out to roughly $0.015 per 1,000 characters on the entry plan, about 90% cheaper than ElevenLabs ($0.18/1K on their Starter tier). It still undercuts AWS Polly Neural ($16/1M = $0.016/1K) once you factor in AWS's per-request billing overhead and the cost of stitching together a production pipeline.

Drop-in compatible with OpenAI's voice names like alloy, nova, echo, onyx, and fable, plus dozens of additional voices across 9 languages. Existing code written against client.audio.speech.create() switches by swapping base_url. No new SDK, no rewrite.

How the API works

Send a JSON POST to /api/v1/audio/speech with input (your text, up to 4,096 characters), voice (any of 54 names), and optionally response_format (mp3 default, or opus / aac / flac / pcm / ogg / wav) and speed (0.5x–4.0x). We return the raw audio bytes.
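
As a rough sketch, the request body described above can be assembled and checked client-side before it is sent. The field names and limits come from the paragraph above; the helper name build_speech_payload and the choice to raise ValueError are assumptions made for illustration, not part of the SpeakEasy API.

```python
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "pcm", "ogg", "wav"}
MAX_INPUT_CHARS = 4096  # per-request limit stated above


def build_speech_payload(text, voice, response_format="mp3", speed=1.0):
    """Return a JSON-serializable request body, or raise ValueError.

    Hypothetical helper: it only mirrors the documented parameters and
    does not call the API.
    """
    if not text:
        raise ValueError("input must not be empty")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(
            f"input is {len(text)} characters; limit is {MAX_INPUT_CHARS}"
        )
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.5 <= speed <= 4.0:
        raise ValueError("speed must be between 0.5 and 4.0")
    return {
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
```

Validating locally is optional, but it catches an over-long input before it costs a round trip.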

For real-time apps — voice assistants, chatbots, IVR — set stream=true to receive audio chunks as they're generated. Time-to-first-byte is typically under 400 ms, so a user hears the first word before the full sentence finishes synthesizing.
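
One way to consume a streamed response with only the Python standard library is sketched below. It assumes that stream is sent as a field in the JSON body (the page does not say whether it is a body field or a query parameter), and the function names open_speech_stream and consume_stream are hypothetical.

```python
import json
import time
import urllib.request

API_URL = "https://www.tryspeakeasy.io/api/v1/audio/speech"


def open_speech_stream(text, voice, api_key, chunk_size=4096):
    """Yield audio chunks as they arrive.

    Assumption: `stream` is accepted as a JSON body field.
    """
    body = json.dumps({"input": text, "voice": voice, "stream": True})
    request = urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        while True:
            # read1 returns as soon as some data is available.
            chunk = response.read1(chunk_size)
            if not chunk:
                break
            yield chunk


def consume_stream(chunks, sink):
    """Write chunks to `sink`; return (total_bytes, seconds_to_first_chunk)."""
    started = time.monotonic()
    first_chunk_delay = None
    total = 0
    for chunk in chunks:
        if first_chunk_delay is None:
            first_chunk_delay = time.monotonic() - started
        sink.write(chunk)
        total += len(chunk)
    return total, first_chunk_delay
```

Because open_speech_stream is a generator, the request is only sent when consume_stream starts iterating, so the measured delay approximates time-to-first-byte as seen by the client. In a real player, sink would be an audio output buffer rather than a file.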

Request word-level timestamps to sync playback with on-screen text — useful for karaoke UIs, animated captions, and accessibility tools that highlight the current word as it's spoken. Timestamps come back in the same millisecond-precision format the STT endpoint returns.
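
The highlighting logic can be sketched as a lookup from playback position to word index. The timestamp shape used here (a list of entries with word, start_ms, and end_ms keys, sorted by start time) is an assumption for illustration; check the API documentation for the actual field names.

```python
from bisect import bisect_right


def current_word_index(timestamps, position_ms):
    """Return the index of the word being spoken at position_ms, or None.

    `timestamps` is assumed to be sorted by start_ms, with hypothetical
    keys "word", "start_ms", and "end_ms".
    """
    starts = [entry["start_ms"] for entry in timestamps]
    index = bisect_right(starts, position_ms) - 1
    if index < 0:
        return None  # playback has not reached the first word
    if position_ms >= timestamps[index]["end_ms"]:
        return None  # in a pause between words, or past the last word
    return index
```

A UI would call this on every playback tick and move the highlight whenever the returned index changes.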

Limits: 4,096 characters per request, 10 concurrent requests per account on default plans, and the same API-key auth as every other SpeakEasy endpoint. For long-form content (audiobooks, narration), chunk text on sentence boundaries and stream the results into your player.
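
A minimal sketch of sentence-boundary chunking under the 4,096-character limit follows. The splitting rule (break after ., !, or ? followed by whitespace) is a simplifying assumption that will mis-split some abbreviations, and the function name chunk_on_sentences is hypothetical.

```python
import re

MAX_CHARS = 4096  # per-request limit stated above


def chunk_on_sentences(text, limit=MAX_CHARS):
    """Split text into chunks of at most `limit` characters.

    Chunks end on sentence boundaries where possible.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # Fallback: hard-split a single sentence longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Packing as many sentences as fit into each chunk keeps the request count low while avoiding cuts in the middle of a sentence, where a splice would be audible.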

Available voices

54 voices across 9 languages

English (American)

Heart

Female

Bella

Female

Michael

Male

Alloy

Female

Aoede

Female

Kore

Female

Jessica

Female

Nicole

Female

Nova

Female

River

Female

Sarah

Female

Sky

Female

Echo

Male

Eric

Male

Fenrir

Male

Liam

Male

Onyx

Male

Puck

Male

Adam

Male

Santa

Male

English (British)

Alice

Female

Emma

Female

Isabella

Female

Lily

Female

Daniel

Male

Fable

Male

George

Male

Lewis

Male

Japanese (beta)

Sakura

Female

Gongitsune

Female

Nezumi

Female

Tebukuro

Female

Kumo

Male

Mandarin Chinese (beta)

Xiaobei

Female

Xiaoni

Female

Xiaoxiao

Female

Xiaoyi

Female

Yunjian

Male

Yunxi

Male

Yunxia

Male

Yunyang

Male

Spanish (beta)

Dora

Female

Alex

Male

Noel

Male

French (beta)

Siwis

Female

Hindi (beta)

Alpha

Female

Beta

Female

Omega

Male

Psi

Male

Italian (beta)

Sara

Female

Nicola

Male

Portuguese (Brazil) (beta)

Clara

Female

Tiago

Male

Papai

Male

Built for developers

Drop-in Compatible

Works with OpenAI and ElevenLabs SDKs. Switch providers by changing one URL.

Real-time Streaming

Stream audio chunks as they're generated for instant playback in voice apps and chatbots.

Word Timestamps

Get precise timing for each word. Perfect for syncing speech with text animations.

Multiple Voices

54 voices across 9 languages with different personalities. Choose the perfect tone for your app.

Multiple Formats

Output in MP3, Opus, AAC, FLAC, PCM, OGG, or WAV. Optimize for quality or file size.

90% Cheaper

Save up to 90% compared to ElevenLabs with comparable voice quality.

What developers build with it

Pay-as-you-go TTS that fits into real production pipelines.

Voice AI & chatbots

Streaming TTS with sub-400 ms time-to-first-byte pairs cleanly with LLM streaming. Users hear speech start almost as soon as the model starts generating.
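
Pairing with LLM streaming usually means buffering model output until a full sentence is available, then sending that sentence to TTS. The sketch below shows one way to do the buffering. The function name sentences_from_tokens is hypothetical, and the boundary rule (punctuation followed by whitespace) is a simplifying assumption.

```python
import re

# A sentence ends at ., !, or ? followed by whitespace. Requiring the
# whitespace avoids splitting inside numbers such as "3.14".
SENTENCE_BOUNDARY = re.compile(r"[.!?]\s+")


def sentences_from_tokens(tokens):
    """Group streamed text fragments into complete sentences."""
    buffer = ""
    for token in tokens:
        buffer += token
        while True:
            match = SENTENCE_BOUNDARY.search(buffer)
            if match is None:
                break
            yield buffer[: match.end()].strip()
            buffer = buffer[match.end():]
    # Flush whatever is left once the model stops generating.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence can be sent as its own speech request, so synthesis of the first sentence starts while the model is still writing the second.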

Audiobook & narration pipelines

Chunk chapters into 4K-character requests, process in parallel (10 concurrent on default plans), stitch the output. A 10-hour audiobook costs roughly $45 at plan rates — vs several hundred on ElevenLabs.
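
The parallel step can be sketched with a thread pool capped at the default-plan concurrency. Here synthesize is a placeholder for whatever function sends one chunk to the API and returns audio bytes; it, synthesize_in_parallel, and stitch are hypothetical names. Plain byte concatenation is assumed to be adequate, which is safest for headerless output such as pcm.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 10  # default-plan concurrency limit stated above


def synthesize_in_parallel(chunks, synthesize, max_workers=MAX_CONCURRENT):
    """Call synthesize(chunk) concurrently; return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order even if calls finish out of order.
        return list(pool.map(synthesize, chunks))


def stitch(segments):
    """Concatenate raw audio segments into one byte string."""
    return b"".join(segments)
```

Keeping the pool size at or below the account's concurrency limit avoids rejected requests, and order-preserving results mean no re-sorting before stitching.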

Dubbing & localization

Native voices in 9 languages cover the bulk of global audiences. Pair with the STT translate=true flag and you have a single-vendor dubbing pipeline.

Accessibility & screen readers

Natural voices and word-level timestamps make it trivial to add spoken-content support to web apps. Highlight the current word as it plays, pause/resume at word boundaries.

E-learning & explainer videos

Generate voiceovers directly from lesson scripts. 54 voices mean you can cast different characters or switch narrators without re-recording. Output MP3 drops straight into Premiere, Final Cut, or ffmpeg.
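
Casting narrators can be as simple as mapping each speaker in a script to a voice and emitting one request body per line. This is a sketch: the function name cast_script is hypothetical, and it assumes voice identifiers are the lowercase form of the names in the voice list, as with heart in the code examples.

```python
def cast_script(script, cast, default_voice="heart"):
    """Turn (speaker, line) pairs into one request body per line.

    `cast` maps speaker names to voice identifiers. Speakers missing
    from the cast fall back to `default_voice`.
    """
    payloads = []
    for speaker, line in script:
        payloads.append({
            "input": line,
            "voice": cast.get(speaker, default_voice),
            "response_format": "mp3",
        })
    return payloads
```

Swapping a narrator then means changing one entry in the cast mapping and regenerating that speaker's lines.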

IVR & notification systems

Dynamic prompts ("Your order #12345 will arrive Thursday") that don't require a voice actor. Per-request latency is low enough for inline phone-tree rendering; per-character cost is low enough to actually deploy.

How SpeakEasy TTS Compares

Pricing per 1,000 characters. Quality comparable across all providers.

Provider	Price / 1K chars	Voices	Streaming
SpeakEasy	~$0.015	54	✓
ElevenLabs	$0.18 (Starter)	1,000+	✓
OpenAI TTS	$0.015	6	✓
AWS Polly Neural	$0.016	~40	✓
Azure Neural TTS	$0.016	400+	✓

Pricing as of April 2026. ElevenLabs Starter: $5/mo for 30K chars = $0.167/1K. Volume discounts vary by provider.

Text-to-Speech Pricing

~3.3 million characters included with your $10/month plan.

Additional usage is billed at $0.25 per hour of audio (~67K characters per hour).
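
For budgeting, the figures above can be turned into a rough monthly estimate. The constants come from this section; prorating the overage by fractional hour is an assumption, since the page does not say how partial hours are billed, and estimate_monthly_cost is a hypothetical helper.

```python
PLAN_PRICE_USD = 10.00
INCLUDED_CHARS = 3_300_000   # approximate allowance on the $10/month plan
CHARS_PER_HOUR = 67_000      # approximate characters per hour of audio
OVERAGE_PER_HOUR_USD = 0.25


def estimate_monthly_cost(characters):
    """Estimate the monthly bill in USD for a given character volume.

    Assumption: overage is prorated by fractional hour.
    """
    extra_chars = max(0, characters - INCLUDED_CHARS)
    extra_hours = extra_chars / CHARS_PER_HOUR
    return round(PLAN_PRICE_USD + extra_hours * OVERAGE_PER_HOUR_USD, 2)
```

For example, a month with 670,000 characters beyond the allowance works out to 10 extra hours, or $2.50 on top of the plan price.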

Learn more

SpeakEasy vs OpenAI
SpeakEasy vs ElevenLabs
SpeakEasy vs AWS Polly
SpeakEasy vs Azure Speech
TTS API Guide
Open Source TTS in 2026
Build an AI Voice Agent
API Documentation

Frequently asked questions

What voices are available?

We offer 54 voices across 9 languages: English (American and British), Japanese, Mandarin Chinese, Spanish, French, Hindi, Italian, and Brazilian Portuguese. OpenAI's voice names, such as alloy, nova, echo, onyx, and fable, work as-is.

What output formats are supported?

We support MP3, Opus, AAC, FLAC, PCM, OGG, and WAV output formats. MP3 is the default.

Is streaming supported?

Yes! Enable streaming to receive audio chunks in real-time as they're generated. Perfect for real-time voice applications.

How many characters can I convert per request?

Each request supports up to 4,096 characters. For longer content, split it into multiple requests.

Do you provide word-level timestamps?

Yes. Request word-level timestamps to sync generated speech with text for subtitle generation or karaoke-style highlighting.

How does this compare to OpenAI and ElevenLabs?

Our API is compatible with both OpenAI and ElevenLabs formats, so switching is a one-line change. We offer up to 90% savings compared to ElevenLabs with comparable quality.

$1. 50 hours. Both STT and TTS.

Your current speech API provider is charging you too much. Switch in one line of code.

