OpenAI & ElevenLabs Compatible

Text-to-Speech API
Natural. Fast. Affordable.

Turn text into lifelike speech with 54 AI voices in 9 languages. Streaming support, word-level timestamps, and up to 90% savings vs ElevenLabs.

Hear it for yourself

Three preset voices, three languages. Tap play to hear raw TTS output from the same API you'll call in production.

Sarah

English (American)

Speech-to-Text and Text-to-Speech, same SDK as OpenAI, for a flat ten dollars a month.

Lewis

English (British)

Switch from ElevenLabs in two lines of code. Keep your existing integration, cut the bill by ninety percent.

Dora

Spanish

Cincuenta y cuatro voces en nueve idiomas. Mismo SDK de OpenAI. Diez dólares al mes. (English: "Fifty-four voices in nine languages. Same OpenAI SDK. Ten dollars a month.")

Samples generated with the SpeakEasy TTS API at 24 kHz.

Code examples

OpenAI-compatible. Use your existing SDK — just change the base URL.

cURL
curl -X POST https://www.tryspeakeasy.io/api/v1/audio/speech \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, welcome to SpeakEasy!",
    "voice": "heart"
  }' \
  --output speech.mp3
Python (OpenAI SDK)
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="https://www.tryspeakeasy.io/api/v1"
)

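# Stream the response body straight to disk instead of buffering it in memory.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="heart",
    input="Hello, welcome to SpeakEasy!",
) as response:
    response.stream_to_file("speech.mp3")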
JavaScript
const res = await fetch('https://www.tryspeakeasy.io/api/v1/audio/speech', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${apiKey}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    input: 'Hello, welcome to SpeakEasy!',
    voice: 'heart',
  }),
});
const audioBlob = await res.blob();

What is the SpeakEasy Text-to-Speech API?

SpeakEasy is a text-to-speech API that turns text into natural-sounding MP3, Opus, AAC, FLAC, PCM, OGG, or WAV audio using 54 AI voices in 9 languages. Pricing works out to roughly $0.015 per 1,000 characters on the entry plan. That is about 90% cheaper than ElevenLabs ($0.18/1K on their Starter tier), and it still undercuts AWS Polly Neural ($16/1M = $0.016/1K) once you factor in AWS's per-request billing overhead and the cost of stitching together a production pipeline.

Drop-in compatible with OpenAI's voice names like alloy, nova, echo, onyx, and fable, plus dozens of additional voices across 9 languages. Existing code written against client.audio.speech.create() switches by swapping base_url. No new SDK, no rewrite.

How the API works

Send a JSON POST to /api/v1/audio/speech with input (your text, up to 4,096 characters), voice (any of 54 names), and optionally response_format (mp3 default, or opus / aac / flac / pcm / ogg / wav) and speed (0.5x–4.0x). We return the raw audio bytes.
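As a minimal sketch of the request described above, the helper below checks the documented limits locally and then posts the JSON body using only Python's standard library. The field names and limits come from the paragraph above; the helper names `build_speech_payload` and `synthesize` are hypothetical, and sending `speed` as a plain number is an assumption.

```python
import json
import urllib.request

API_URL = "https://www.tryspeakeasy.io/api/v1/audio/speech"
MAX_INPUT_CHARS = 4096
FORMATS = {"mp3", "opus", "aac", "flac", "pcm", "ogg", "wav"}


def build_speech_payload(text, voice, response_format="mp3", speed=1.0):
    """Validate the documented limits and return the JSON body as a dict."""
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.5 <= speed <= 4.0:
        raise ValueError("speed must be between 0.5 and 4.0")
    return {
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,  # assumption: sent as a plain JSON number
    }


def synthesize(api_key, payload, timeout=30):
    """POST the payload and return the raw audio bytes."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

Validating before sending keeps oversized or malformed requests from counting against your concurrency limit. A call would look like `synthesize("YOUR_API_KEY", build_speech_payload("Hello", "heart", "wav"))`.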

For real-time apps — voice assistants, chatbots, IVR — set stream=true to receive audio chunks as they're generated. Time-to-first-byte is typically under 400 ms, so a user hears the first word before the full sentence finishes synthesizing.
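One way to consume a streamed response is to read the HTTP body in small pieces and hand each piece to the player as it arrives. The sketch below assumes `stream` is sent as a boolean field in the JSON body (the text above only says to set stream=true); `iter_chunks` and `stream_speech` are hypothetical helper names.

```python
import json
import urllib.request

API_URL = "https://www.tryspeakeasy.io/api/v1/audio/speech"


def iter_chunks(stream, chunk_size=4096):
    """Yield successive byte chunks from a file-like object until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def stream_speech(api_key, text, voice, chunk_size=4096):
    """Yield audio chunks as they arrive instead of waiting for the full body."""
    # Assumption: streaming is enabled through a "stream" field in the body.
    body = {"input": text, "voice": voice, "stream": True}
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        yield from iter_chunks(response, chunk_size)
```

A smaller `chunk_size` gets the first audio to the player sooner at the cost of more read calls.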

Request word-level timestamps to sync playback with on-screen text — useful for karaoke UIs, animated captions, and accessibility tools that highlight the current word as it's spoken. Timestamps come back in the same millisecond-precision format the STT endpoint returns.
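To illustrate the highlighting use case, the sketch below looks up which word is being spoken at a given playback position. The timestamp shape (a list of objects with `word`, `start_ms`, and `end_ms`) is hypothetical; check the actual response format before relying on these field names.

```python
from bisect import bisect_right


def current_word_index(timestamps, position_ms):
    """Return the index of the word being spoken at position_ms, or None."""
    starts = [entry["start_ms"] for entry in timestamps]
    index = bisect_right(starts, position_ms) - 1
    if index < 0:
        return None  # playback has not reached the first word yet
    if position_ms >= timestamps[index]["end_ms"]:
        return None  # in a pause between words, or past the last word
    return index


# Hypothetical timestamp payload, for illustration only.
example = [
    {"word": "Hello", "start_ms": 0, "end_ms": 420},
    {"word": "world", "start_ms": 480, "end_ms": 900},
]
```

Calling this on every playback tick and highlighting the returned index is enough for a basic karaoke-style display; the binary search keeps the lookup cheap for long passages.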

Limits: 4,096 characters per request, 10 concurrent requests per account on default plans, and the same API-key auth as every other SpeakEasy endpoint. For long-form content (audiobooks, narration), chunk text on sentence boundaries and stream the results into your player.
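The chunking advice above could be sketched as follows. The sentence detection is a simple heuristic (a period, exclamation mark, or question mark followed by whitespace), not a full sentence tokenizer, and `chunk_text` is a hypothetical helper name.

```python
import re

MAX_INPUT_CHARS = 4096


def chunk_text(text, limit=MAX_INPUT_CHARS):
    """Split text into chunks of at most `limit` characters, breaking after sentences."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Fallback: hard-split a single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Packing as many whole sentences as fit into each chunk keeps the request count low while avoiding audible breaks in the middle of a sentence.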

Available voices

54 voices across 9 languages

English (American)
Female: Heart, Bella, Alloy, Aoede, Kore, Jessica, Nicole, Nova, River, Sarah, Sky
Male: Michael, Echo, Eric, Fenrir, Liam, Onyx, Puck, Adam, Santa

English (British)
Female: Alice, Emma, Isabella, Lily
Male: Daniel, Fable, George, Lewis

Japanese (beta)
Female: Sakura, Gongitsune, Nezumi, Tebukuro
Male: Kumo

Mandarin Chinese (beta)
Female: Xiaobei, Xiaoni, Xiaoxiao, Xiaoyi
Male: Yunjian, Yunxi, Yunxia, Yunyang

Spanish (beta)
Female: Dora
Male: Alex, Noel

French (beta)
Female: Siwis

Hindi (beta)
Female: Alpha, Beta
Male: Omega, Psi

Italian (beta)
Female: Sara
Male: Nicola

Portuguese (Brazil, beta)
Female: Clara
Male: Tiago, Papai

Built for developers

Drop-in Compatible

Works with OpenAI and ElevenLabs SDKs. Switch providers by changing one URL.

Real-time Streaming

Stream audio chunks as they're generated for instant playback in voice apps and chatbots.

Word Timestamps

Get precise timing for each word. Perfect for syncing speech with text animations.

Multiple Voices

54 voices across 9 languages with different personalities. Choose the perfect tone for your app.

Multiple Formats

Output in MP3, Opus, AAC, FLAC, PCM, OGG, or WAV. Optimize for quality or file size.

90% Cheaper

Save up to 90% compared to ElevenLabs with comparable voice quality.

What developers build with it

Pay-as-you-go TTS that fits into real production pipelines.

Voice AI & chatbots

Streaming TTS with sub-400 ms time-to-first-byte pairs cleanly with LLM streaming. Users hear speech start almost as soon as the model starts generating.
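A common way to pair the two streams is to buffer LLM output until a sentence is complete and then send that sentence to the TTS endpoint. The sketch below shows only the buffering step; `sentences_from_tokens` is a hypothetical helper, and the sentence boundary rule is a simple heuristic.

```python
import re

# A sentence ends at ., !, or ? followed by whitespace (simple heuristic).
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens):
    """Group a stream of text fragments into complete sentences."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _BOUNDARY.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        buffer = parts[-1]
    # Flush whatever is left once the model stops generating.
    if buffer.strip():
        yield buffer.strip()
```

Sending whole sentences rather than individual tokens gives the voice enough context for natural intonation while still starting playback long before the model finishes.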

Audiobook & narration pipelines

Chunk chapters into 4K-character requests, process in parallel (10 concurrent on default plans), stitch the output. A 10-hour audiobook costs roughly $45 at plan rates — vs several hundred on ElevenLabs.
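The parallel step of such a pipeline might look like the sketch below. It assumes `chunks` is a list of strings that each fit within the 4,096-character limit and that `synthesize` is your own function that sends one chunk to the API and returns audio bytes; both helper names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 10  # default-plan concurrency limit stated above


def synthesize_in_order(chunks, synthesize, max_workers=MAX_CONCURRENT):
    """Run `synthesize` on every chunk in parallel and return audio in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves input order even when calls finish out of order.
        return list(pool.map(synthesize, chunks))


def stitch(parts):
    """Concatenate audio segments byte-for-byte."""
    return b"".join(parts)
```

Byte-for-byte concatenation is the simplest form of stitching and is safe for headerless PCM. Formats that carry their own headers, such as WAV, may need a remux step with a tool like ffmpeg instead.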

Dubbing & localization

9 languages with native voices covers the bulk of global audiences. Pair with the STT translate=true flag and you have a single-vendor dubbing pipeline.

Accessibility & screen readers

Natural voices and word-level timestamps make it trivial to add spoken-content support to web apps. Highlight the current word as it plays, pause/resume at word boundaries.

E-learning & explainer videos

Generate voiceovers directly from lesson scripts. 54 voices mean you can cast different characters or switch narrators without re-recording. Output MP3 drops straight into Premiere, Final Cut, or ffmpeg.

IVR & notification systems

Dynamic prompts ("Your order #12345 will arrive Thursday") that don't require a voice actor. Per-request latency is low enough for inline phone-tree rendering; per-character cost is low enough to actually deploy.

How SpeakEasy TTS Compares

Pricing per 1,000 characters. Quality comparable across all providers.

Provider            Price / 1K chars    Voices    Streaming
SpeakEasy           ~$0.015             54
ElevenLabs          $0.18 (Starter)     1,000+
OpenAI TTS          $0.015              6
AWS Polly Neural    $0.016              ~40
Azure Neural TTS    $0.016              400+

Pricing as of April 2026. ElevenLabs Starter: $5/mo for 30K chars = $0.167/1K. Volume discounts vary by provider.

Text-to-Speech Pricing

~3.3 million characters included with your $10/month plan.

Additional usage is billed at $0.25 per hour of audio (~67K characters per hour).
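As a rough worked example of the figures above, the sketch below estimates a monthly bill from a character count. It assumes overage is prorated by the ~67K characters-per-hour approximation rather than rounded up to whole hours, which may not match actual billing; `estimate_monthly_cost` is a hypothetical helper.

```python
INCLUDED_CHARS = 3_300_000  # "~3.3 million characters" included in the plan
PLAN_PRICE = 10.00          # dollars per month
CHARS_PER_HOUR = 67_000     # "~67K characters per hour"
OVERAGE_PER_HOUR = 0.25     # dollars per additional hour


def estimate_monthly_cost(characters):
    """Return an approximate monthly bill in dollars for a character count."""
    extra_chars = max(0, characters - INCLUDED_CHARS)
    extra_hours = extra_chars / CHARS_PER_HOUR  # assumption: prorated overage
    return round(PLAN_PRICE + extra_hours * OVERAGE_PER_HOUR, 2)
```

For instance, a month with 670,000 characters beyond the included allowance is about 10 extra hours, so the estimate is the $10 plan price plus $2.50.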

Frequently asked questions

What voices are available?

We offer 54 voices across 9 languages, including English (American and British), Japanese, Mandarin Chinese, Spanish, French, Hindi, Italian, and Brazilian Portuguese. OpenAI voice names such as alloy, nova, echo, onyx, and fable work unchanged.

What output formats are supported?

We support MP3, Opus, AAC, FLAC, PCM, OGG, and WAV output formats. MP3 is the default.

Is streaming supported?

Yes. Enable streaming to receive audio chunks in real time as they're generated, which suits voice assistants, chatbots, and other live voice applications.

How many characters can I convert per request?

Each request supports up to 4,096 characters. For longer content, split it into multiple requests.

Do you provide word-level timestamps?

Yes. Request word-level timestamps to sync generated speech with text for subtitle generation or karaoke-style highlighting.

How does this compare to OpenAI and ElevenLabs?

Our API is compatible with both OpenAI and ElevenLabs formats, so switching is a one-line change. We offer up to 90% savings compared to ElevenLabs with comparable quality.

$1. 50 hours. Both STT and TTS.

Your current speech API provider is charging you too much. Switch in one line of code.