Text-to-Speech API

The Text-to-Speech (TTS) API converts written text into lifelike spoken audio. Choose from 50+ voices across 9 languages, control speed and output format, and optionally stream the audio in real time.

Endpoint

POST https://api.tryspeakeasy.io/v1/audio/speech

Authentication

All requests must include a valid API key in the Authorization header using the Bearer scheme:

Authorization: Bearer YOUR_API_KEY

See the Authentication guide for details on creating and managing API keys.
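
As a minimal sketch, the headers can be built once and reused across requests. It assumes the key is stored in the SPEAKEASY_API_KEY environment variable (the name used in the curl examples); the helper name auth_headers is hypothetical and not part of any SDK.

```python
import os


def auth_headers(api_key=None):
    """Return headers for a JSON request authenticated with a Bearer key."""
    # Fall back to the environment so the key is never hard-coded.
    # SPEAKEASY_API_KEY is an assumed variable name, taken from the curl examples.
    key = api_key or os.environ.get("SPEAKEASY_API_KEY", "")
    if not key:
        raise ValueError("no API key given and SPEAKEASY_API_KEY is not set")
    return {
        "Authorization": f"Bearer {key}",
        "Content-Type": "application/json",
    }
```

Failing early on a missing key gives a clearer error than waiting for a 401 response.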

Request Parameters

Send a JSON body with Content-Type: application/json. The following parameters are supported:

| Parameter | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| model | string | Yes | | The TTS model to use. Options: tts-1 (optimised for speed) or tts-1-hd (optimised for quality). |
| input | string | Yes | | The text to generate audio for. Maximum length is 4096 characters. |
| voice | string | Yes | | The voice to use. 50+ voices available across 9 languages. Default: alloy. See Available Voices for the full list. |
| response_format | string | No | "mp3" | The audio format of the output. Supported formats: mp3, opus, aac, flac, pcm, ogg, wav. |
| language | string | No | | Hint the language of the input text. The voice must support the specified language. Valid codes: en-us, en-gb, ja, zh, es, fr, hi, it, pt-br. See Available Voices for supported languages per voice. |
| speed | number | No | 1.0 | The speed of the generated audio. Accepted range: 0.5 to 4.0. |
| stream | boolean | No | false | When true, audio is returned using chunked transfer encoding as it is generated. See Streaming Response below. |
| word_timestamps | boolean | No | false | When true, the response includes a JSON header with per-word timing information before the audio bytes. |
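
The limits in the table can be checked on the client before a request is sent. The following is a sketch only: the function name build_speech_request is hypothetical, the speed bounds are assumed to be inclusive, rejecting empty input is an assumption, and the voice is not validated because the list is long.

```python
ALLOWED_MODELS = {"tts-1", "tts-1-hd"}
ALLOWED_FORMATS = {"mp3", "opus", "aac", "flac", "pcm", "ogg", "wav"}
ALLOWED_LANGUAGES = {"en-us", "en-gb", "ja", "zh", "es", "fr", "hi", "it", "pt-br"}
MAX_INPUT_CHARS = 4096


def build_speech_request(model, text, voice, response_format="mp3",
                         speed=1.0, language=None, stream=False):
    """Check the documented limits and return a JSON-serialisable body."""
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    # Rejecting empty input is an assumption; the 4096 limit is documented.
    if not text or len(text) > MAX_INPUT_CHARS:
        raise ValueError("input must be 1 to 4096 characters")
    if response_format not in ALLOWED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    # Bounds are assumed to be inclusive.
    if not 0.5 <= speed <= 4.0:
        raise ValueError("speed must be between 0.5 and 4.0")
    if language is not None and language not in ALLOWED_LANGUAGES:
        raise ValueError(f"unsupported language code: {language!r}")

    body = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
        "stream": stream,
    }
    # Only send the optional hint when the caller supplied one.
    if language is not None:
        body["language"] = language
    return body
```

For example, build_speech_request("tts-1", "Hello", "alloy", speed=5.0) raises ValueError instead of producing a request that would be rejected.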

Code Examples

curl

curl -X POST https://api.tryspeakeasy.io/v1/audio/speech \
  -H "Authorization: Bearer $SPEAKEASY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello, welcome to SpeakEasy!",
    "voice": "alloy",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3

Python (OpenAI SDK)

SpeakEasy is fully compatible with the OpenAI Python SDK. Just set the base_url to https://api.tryspeakeasy.io/v1:

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.tryspeakeasy.io/v1",
)

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello, welcome to SpeakEasy!",
    response_format="mp3",
    speed=1.0,
)

# Write the complete audio to disk (stream_to_file is deprecated on non-streaming responses).
response.write_to_file("speech.mp3")

JavaScript

const response = await fetch("https://api.tryspeakeasy.io/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "tts-1",
    input: "Hello, welcome to SpeakEasy!",
    voice: "alloy",
    response_format: "mp3",
    speed: 1.0,
  }),
});

const audioBlob = await response.blob();

// Save to file (Node.js)
const fs = await import("node:fs");
const buffer = Buffer.from(await audioBlob.arrayBuffer());
fs.writeFileSync("speech.mp3", buffer);

Response

On success the API returns the raw binary audio data with a Content-Type header matching the requested format:

| Format | Content-Type |
| --- | --- |
| mp3 | audio/mpeg |
| opus | audio/opus |
| aac | audio/aac |
| flac | audio/flac |
| pcm | audio/pcm |
| ogg | audio/ogg |
| wav | audio/wav |

The response also includes a Content-Length header (except when streaming) so you can display a progress bar or pre-allocate a buffer.
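
The Content-Type mapping above can be used to confirm that a response really contains audio and to pick a file name. This is a sketch with hypothetical helper names; using the format name as the file extension is an assumption.

```python
CONTENT_TYPES = {
    "mp3": "audio/mpeg",
    "opus": "audio/opus",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "pcm": "audio/pcm",
    "ogg": "audio/ogg",
    "wav": "audio/wav",
}


def format_from_content_type(header_value):
    """Map a Content-Type header to its response_format name, or None."""
    # Drop optional parameters after ";" and normalise case before comparing.
    media_type = header_value.split(";", 1)[0].strip().lower()
    for fmt, content_type in CONTENT_TYPES.items():
        if content_type == media_type:
            return fmt
    return None


def output_filename(stem, header_value):
    """Build a file name such as 'speech.mp3' from the response header."""
    fmt = format_from_content_type(header_value)
    if fmt is None:
        # A JSON error body, for instance, would land here.
        raise ValueError(f"not a known audio type: {header_value!r}")
    return f"{stem}.{fmt}"
```

Checking the header before writing to disk avoids saving a JSON error body under an audio file name.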

Streaming Response

When stream is set to true, the API returns audio using HTTP chunked transfer encoding. Audio chunks are sent as they are generated, allowing your application to begin playback before the entire response is ready. This is especially useful for long inputs or real-time applications.

The response uses Transfer-Encoding: chunked and the same Content-Type header as a non-streaming response. No Content-Length header is included.
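
Because no Content-Length is sent, a client reads until the stream ends. The sketch below uses only the Python standard library, whose HTTP client decodes chunked transfer encoding transparently. The function names are hypothetical, and the chunk size is an arbitrary choice.

```python
import json
import urllib.request


def copy_stream(source, sink, chunk_size=4096):
    """Copy a file-like source to sink piece by piece; return bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)  # returns b"" at end of stream
        if not chunk:
            break
        sink.write(chunk)  # audio is usable as soon as each piece arrives
        total += len(chunk)
    return total


def stream_speech_to_file(payload, api_key, path):
    """POST the payload (which should include "stream": true) and save the audio."""
    request = urllib.request.Request(
        "https://api.tryspeakeasy.io/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response, open(path, "wb") as sink:
        return copy_stream(response, sink)
```

Keeping the copy loop separate from the network call makes it possible to swap the file for an audio player or a socket without touching the request code.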

Streaming with curl

curl -X POST https://api.tryspeakeasy.io/v1/audio/speech \
  -H "Authorization: Bearer $SPEAKEASY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "This audio is streamed as it is generated.",
    "voice": "nova",
    "stream": true
  }' \
  --output speech.mp3

Streaming with Python

from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.tryspeakeasy.io/v1",
)

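# with_streaming_response writes chunks to disk as they arrive.
# The SDK's create() has no stream argument, so the flag is passed via extra_body.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="nova",
    input="This audio is streamed as it is generated.",
    extra_body={"stream": True},
) as response:
    response.stream_to_file("speech.mp3")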

Streaming with JavaScript

const response = await fetch("https://api.tryspeakeasy.io/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "tts-1",
    input: "This audio is streamed as it is generated.",
    voice: "nova",
    stream: true,
  }),
});

// Process chunks as they arrive
const reader = response.body.getReader();
const chunks = [];

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}

const audioBlob = new Blob(chunks, { type: "audio/mpeg" });

Available Voices

SpeakEasy offers 50+ voices across 9 languages. Each voice is available with both the tts-1 and tts-1-hd models.

English (American)

| Voice | Description |
| --- | --- |
| heart | Warm, expressive; the default voice. |
| bella | Friendly, approachable female voice. |
| michael | Clear, professional male voice. |
| alloy | Neutral, balanced; versatile for most content. |
| aoede | Melodic, engaging female voice. |
| kore | Youthful, energetic female voice. |
| jessica | Natural, conversational female voice. |
| nicole | Polished, articulate female voice. |
| nova | Bright, friendly; perfect for upbeat content. |
| river | Smooth, flowing narration voice. |
| sarah | Warm, trustworthy female voice. |
| sky | Light, airy female voice. |
| echo | Warm, conversational; ideal for dialogue. |
| eric | Strong, confident male voice. |
| fenrir | Deep, powerful male voice. |
| liam | Youthful, energetic male voice. |
| onyx | Deep, authoritative; professional and formal. |
| puck | Playful, dynamic male voice. |
| adam | Steady, reliable male voice. |
| santa | Jolly, festive character voice. |

English (British)

Set language to en-gb when using British English voices.

| Voice | Description |
| --- | --- |
| alice | Refined, classic British female voice. |
| emma | Modern, friendly British female voice. |
| isabella | Elegant, articulate British female voice. |
| lily | Bright, cheerful British female voice. |
| daniel | Composed, professional British male voice. |
| fable | Expressive, dynamic; great for storytelling. |
| george | Warm, distinguished British male voice. |
| lewis | Clear, authoritative British male voice. |

Japanese

| Voice | Description |
| --- | --- |
| haruto | Natural Japanese male voice. |
| yuki | Natural Japanese female voice. |

Chinese (Mandarin)

| Voice | Description |
| --- | --- |
| xiaobei | Clear Mandarin female voice. |
| yunjian | Natural Mandarin male voice. |

Spanish

| Voice | Description |
| --- | --- |
| carlos | Natural Spanish male voice. |
| maria | Natural Spanish female voice. |

French

| Voice | Description |
| --- | --- |
| pierre | Natural French male voice. |
| amelie | Natural French female voice. |

Hindi

| Voice | Description |
| --- | --- |
| arjun | Natural Hindi male voice. |
| priya | Natural Hindi female voice. |

Italian

| Voice | Description |
| --- | --- |
| luca | Natural Italian male voice. |
| giulia | Natural Italian female voice. |

Portuguese (Brazilian)

| Voice | Description |
| --- | --- |
| pedro | Natural Brazilian Portuguese male voice. |
| ana | Natural Brazilian Portuguese female voice. |

OpenAI Compatible

| Voice | Description |
| --- | --- |
| shimmer | Soft, calming; suited to meditation and gentle narration. |
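
Since the language parameter must match a language the voice supports, a lookup built from the voice lists above can supply the hint automatically. This sketch assumes each voice supports exactly the language of the section it is listed under, and that the section names map to the language codes given in Request Parameters; neither is stated outright in this reference. The name language_hint is hypothetical.

```python
VOICES_BY_LANGUAGE = {
    "en-us": ["heart", "bella", "michael", "alloy", "aoede", "kore", "jessica",
              "nicole", "nova", "river", "sarah", "sky", "echo", "eric",
              "fenrir", "liam", "onyx", "puck", "adam", "santa"],
    "en-gb": ["alice", "emma", "isabella", "lily", "daniel", "fable",
              "george", "lewis"],
    "ja": ["haruto", "yuki"],
    "zh": ["xiaobei", "yunjian"],
    "es": ["carlos", "maria"],
    "fr": ["pierre", "amelie"],
    "hi": ["arjun", "priya"],
    "it": ["luca", "giulia"],
    "pt-br": ["pedro", "ana"],
}

# Invert the mapping once so lookups by voice name are a single dict access.
LANGUAGE_BY_VOICE = {
    voice: code
    for code, voices in VOICES_BY_LANGUAGE.items()
    for voice in voices
}


def language_hint(voice):
    """Return the assumed language code for a voice, or None if unlisted."""
    # "shimmer" is listed without a language, so it yields None.
    return LANGUAGE_BY_VOICE.get(voice)
```

A caller could then add "language": language_hint(voice) to the request body whenever the result is not None.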

Error Responses

The API returns standard HTTP error codes with a JSON body describing the issue. Common errors for this endpoint include:

  • 400 Bad Request: Missing or invalid parameters (e.g., input exceeds 4096 characters, unsupported voice, or speed outside the allowed range).
  • 401 Unauthorized: Missing or invalid API key.
  • 429 Too Many Requests: Rate limit exceeded. See Rate Limits.
  • 500 Internal Server Error: An unexpected error occurred on the server.

For the full list of error codes and troubleshooting guidance, see the Error Codes reference.
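
One common way to handle the transient errors above is to retry with exponential backoff. This reference does not specify a retry policy, so the sketch below rests on assumptions: 429 and 500 are treated as retryable, 400 and 401 are not, and the delay values are arbitrary. Both function names are hypothetical.

```python
import random

# Assumption: only rate-limit and server errors are worth retrying.
RETRYABLE_STATUSES = {429, 500}


def should_retry(status_code):
    """Return True for status codes that may succeed on a later attempt."""
    return status_code in RETRYABLE_STATUSES


def backoff_delays(max_attempts=4, base=0.5, cap=8.0, jitter=random.random):
    """Yield one sleep duration (in seconds) per retry attempt."""
    for attempt in range(max_attempts):
        # Double the ceiling each attempt, but never exceed the cap.
        ceiling = min(cap, base * 2 ** attempt)
        # Scale by a random factor in [0, 1) so clients do not retry in lockstep.
        yield ceiling * jitter()
```

A caller would loop over backoff_delays(), sleep for each value, and resend the request while should_retry(status) holds. Errors such as 400 and 401 are better fixed in the request than retried.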
