Text-to-Speech API

The Text-to-Speech (TTS) API converts written text into lifelike spoken audio. Choose from 54 voices across 9 languages, control speed and output format, and optionally stream the audio in real time.

Endpoint

POST https://www.tryspeakeasy.io/api/v1/audio/speech

Authentication

All requests must include a valid API key in the Authorization header using the Bearer scheme:

Authorization: Bearer YOUR_API_KEY

See the Authentication guide for details on creating and managing API keys.
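
As a minimal sketch, the header can be built from an environment variable rather than hard-coding the key. The helper name `auth_header` is hypothetical; `SPEAKEASY_API_KEY` is the variable name used in the curl examples on this page.

```python
import os
from typing import Dict, Optional


def auth_header(api_key: Optional[str] = None) -> Dict[str, str]:
    """Return the Authorization header for the Bearer scheme."""
    # Fall back to the environment variable used by the curl examples.
    key = api_key or os.environ.get("SPEAKEASY_API_KEY")
    if not key:
        raise ValueError("No API key given and SPEAKEASY_API_KEY is not set")
    return {"Authorization": f"Bearer {key}"}
```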

Request Parameters

Send a JSON body with Content-Type: application/json. The following parameters are supported:

Parameter	Type	Required	Default	Description
input	string	Yes	—	The text to generate audio for. Maximum length is 4096 characters.
voice	string	No	"heart"	The voice to use. 54 voices available across 9 languages. See Available Voices for the full list.
response_format	string	No	"mp3"	The audio format of the output. Supported formats: `mp3`, `opus`, `aac`, `flac`, `pcm`, `ogg`, `wav`.
language	string	No	auto	Language of the input text. Auto-derived from the chosen voice if omitted. Must match the voice's language. Valid codes: `en-us`, `en-gb`, `ja`, `zh`, `es`, `fr`, `hi`, `it`, `pt-br`. See Available Voices for supported languages per voice.
speed	number	No	1.0	The speed of the generated audio. Accepted range: `0.5` to `4.0`.
stream	boolean	No	false	When `true`, audio is returned using chunked transfer encoding as it is generated. See Streaming Response below.
word_timestamps	boolean	No	false	When `true`, the response includes a JSON header with per-word timing information before the audio bytes.
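
The constraints in the table can be checked on the client before a request is sent, which avoids a round trip that would end in a 400. The sketch below is one way to do that; the function name `build_speech_request` and the constant names are hypothetical, and the character count uses Python's `len`, which may not match how the server counts characters.

```python
from typing import Any, Dict, Optional

MAX_INPUT_CHARS = 4096
FORMATS = {"mp3", "opus", "aac", "flac", "pcm", "ogg", "wav"}
LANGUAGES = {"en-us", "en-gb", "ja", "zh", "es", "fr", "hi", "it", "pt-br"}


def build_speech_request(
    text: str,
    voice: str = "heart",
    response_format: str = "mp3",
    speed: float = 1.0,
    language: Optional[str] = None,
    stream: bool = False,
    word_timestamps: bool = False,
) -> Dict[str, Any]:
    """Check the documented limits locally and return the JSON body as a dict."""
    if not text:
        raise ValueError("input is required")
    # len() counts Unicode code points; the server's counting rule is an assumption.
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not 0.5 <= speed <= 4.0:
        raise ValueError("speed must be between 0.5 and 4.0")
    if language is not None and language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")

    body: Dict[str, Any] = {
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
        "stream": stream,
        "word_timestamps": word_timestamps,
    }
    # Omit language so the API derives it from the voice.
    if language is not None:
        body["language"] = language
    return body
```

The returned dict can be serialized with `json.dumps` and sent as the request body. Voice names are not validated here; the server remains the authority on which voices exist.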

Code Examples

curl

curl -X POST https://www.tryspeakeasy.io/api/v1/audio/speech \
  -H "Authorization: Bearer $SPEAKEASY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, welcome to SpeakEasy!",
    "voice": "heart",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3

Python (OpenAI SDK)

SpeakEasy is fully compatible with the OpenAI Python SDK. Just set the base_url to https://www.tryspeakeasy.io/api/v1:

from openai import OpenAI

client = OpenAI(
    api_key="sk-se-your-api-key",
    base_url="https://www.tryspeakeasy.io/api/v1",
)

response = client.audio.speech.create(
    model="tts-1",
    voice="heart",
    input="Hello, welcome to SpeakEasy!",
    response_format="mp3",
    speed=1.0,
)

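# write_to_file saves the fully buffered response; stream_to_file is
# deprecated on non-streaming responses in current versions of the SDK.
response.write_to_file("speech.mp3")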

JavaScript

const response = await fetch("https://www.tryspeakeasy.io/api/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    input: "Hello, welcome to SpeakEasy!",
    voice: "heart",
    response_format: "mp3",
    speed: 1.0,
  }),
});

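// On failure the body is a JSON error, not audio, so check before saving.
if (!response.ok) {
  throw new Error(`TTS request failed with status ${response.status}`);
}

const audioBlob = await response.blob();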

// Save to file (Node.js)
const fs = await import("node:fs");
const buffer = Buffer.from(await audioBlob.arrayBuffer());
fs.writeFileSync("speech.mp3", buffer);

Response

On success the API returns the raw binary audio data with a Content-Type header matching the requested format:

Format	Content-Type
mp3	audio/mpeg
opus	audio/opus
aac	audio/aac
flac	audio/flac
pcm	audio/pcm
ogg	audio/ogg
wav	audio/wav

The response also includes a Content-Length header (except when streaming) so you can display a progress bar or pre-allocate a buffer.
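
Because errors come back as JSON rather than audio, it can be worth confirming the Content-Type before writing bytes to disk. The following sketch encodes the table above; the helper name `is_expected_audio` is hypothetical.

```python
CONTENT_TYPES = {
    "mp3": "audio/mpeg",
    "opus": "audio/opus",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "pcm": "audio/pcm",
    "ogg": "audio/ogg",
    "wav": "audio/wav",
}


def is_expected_audio(response_format: str, content_type_header: str) -> bool:
    """Return True if the header matches the documented type for the format."""
    expected = CONTENT_TYPES[response_format]
    # Drop any parameters after ';' and compare case-insensitively.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    return media_type == expected
```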

Streaming Response

When stream is set to true, the API returns audio using HTTP chunked transfer encoding. Audio chunks are sent as they are generated, allowing your application to begin playback before the entire response is ready. This is especially useful for long inputs or real-time applications.

The response uses Transfer-Encoding: chunked and the same Content-Type header as a non-streaming response. No Content-Length header is included.
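
Streaming reduces the wait before playback, but the 4096-character limit on input still applies to each request. For longer text, one option is to split it on the client and send one request per chunk. The sketch below prefers sentence boundaries and falls back to a hard cut; `split_text` is a hypothetical helper, not part of the API.

```python
import re
from typing import List

MAX_INPUT_CHARS = 4096


def split_text(text: str, limit: int = MAX_INPUT_CHARS) -> List[str]:
    """Split text into chunks of at most `limit` characters, preferring sentence ends."""
    # Split after '.', '!' or '?' followed by whitespace; the whitespace is dropped.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is longer than the limit.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            # The sentence does not fit; close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own `input`, with the resulting audio played or concatenated in order.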

Streaming with curl

curl -X POST https://www.tryspeakeasy.io/api/v1/audio/speech \
  -H "Authorization: Bearer $SPEAKEASY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "This audio is streamed as it is generated.",
    "voice": "nova",
    "stream": true
  }' \
  --output speech.mp3

Streaming with Python

from openai import OpenAI

client = OpenAI(
    api_key="sk-se-your-api-key",
    base_url="https://www.tryspeakeasy.io/api/v1",
)

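# with_streaming_response writes chunks to disk as they arrive instead of
# buffering the whole response in memory first.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="nova",
    input="This audio is streamed as it is generated.",
    # The SDK has no `stream` argument here; passing it through extra_body
    # is an assumption about how to set SpeakEasy's `stream` flag.
    extra_body={"stream": True},
) as response:
    response.stream_to_file("speech.mp3")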

Streaming with JavaScript

const response = await fetch("https://www.tryspeakeasy.io/api/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    input: "This audio is streamed as it is generated.",
    voice: "nova",
    stream: true,
  }),
});

// Process chunks as they arrive
const reader = response.body.getReader();
const chunks = [];

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}

const audioBlob = new Blob(chunks, { type: "audio/mpeg" });

Available Voices

SpeakEasy offers 54 voices across 9 languages. Set language to the matching code from the table below, or omit it to let SpeakEasy auto-derive the language from the chosen voice.

English (American) `en-us`

Voice	Gender
heart	female
bella	female
michael	male
alloy	female
aoede	female
kore	female
jessica	female
nicole	female
nova	female
river	female
sarah	female
sky	female
echo	male
eric	male
fenrir	male
liam	male
onyx	male
puck	male
adam	male
santa	male

English (British) `en-gb`

Voice	Gender
alice	female
emma	female
isabella	female
lily	female
daniel	male
fable	male
george	male
lewis	male

Japanese (beta) `ja`

Voice	Gender
sakura	female
gongitsune	female
nezumi	female
tebukuro	female
kumo	male

Mandarin Chinese (beta) `zh`

Voice	Gender
xiaobei	female
xiaoni	female
xiaoxiao	female
xiaoyi	female
yunjian	male
yunxi	male
yunxia	male
yunyang	male

Spanish (beta) `es`

Voice	Gender
dora	female
alex	male
noel	male

French (beta) `fr`

Voice	Gender
siwis	female

Hindi (beta) `hi`

Voice	Gender
alpha	female
beta	female
omega	male
psi	male

Italian (beta) `it`

Voice	Gender
sara	female
nicola	male

Portuguese (Brazil) (beta) `pt-br`

Voice	Gender
clara	female
tiago	male
papai	male
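
Since `language` must match the voice's language, a client can keep a local copy of the tables above and check the pairing before sending a request. This is a sketch only: the names `VOICES_BY_LANGUAGE` and `resolve_language` are hypothetical, and the lists need updating whenever the voice catalog changes.

```python
from typing import Optional

# Transcribed from the voice tables on this page.
VOICES_BY_LANGUAGE = {
    "en-us": ["heart", "bella", "michael", "alloy", "aoede", "kore", "jessica",
              "nicole", "nova", "river", "sarah", "sky", "echo", "eric",
              "fenrir", "liam", "onyx", "puck", "adam", "santa"],
    "en-gb": ["alice", "emma", "isabella", "lily", "daniel", "fable", "george", "lewis"],
    "ja": ["sakura", "gongitsune", "nezumi", "tebukuro", "kumo"],
    "zh": ["xiaobei", "xiaoni", "xiaoxiao", "xiaoyi", "yunjian", "yunxi", "yunxia", "yunyang"],
    "es": ["dora", "alex", "noel"],
    "fr": ["siwis"],
    "hi": ["alpha", "beta", "omega", "psi"],
    "it": ["sara", "nicola"],
    "pt-br": ["clara", "tiago", "papai"],
}

# Invert to a voice -> language lookup.
VOICE_LANGUAGE = {
    voice: lang for lang, voices in VOICES_BY_LANGUAGE.items() for voice in voices
}


def resolve_language(voice: str, language: Optional[str] = None) -> str:
    """Return the language code for a voice, rejecting a mismatched explicit code."""
    expected = VOICE_LANGUAGE.get(voice)
    if expected is None:
        raise ValueError(f"unknown voice: {voice!r}")
    if language is not None and language != expected:
        raise ValueError(f"voice {voice!r} requires language {expected!r}, got {language!r}")
    return expected
```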

Error Responses

The API returns standard HTTP error codes with a JSON body describing the issue. Common errors for this endpoint include:

400 Bad Request: Missing or invalid parameters (e.g., input exceeds 4096 characters, unsupported voice, or speed outside the allowed range).
401 Unauthorized: Missing or invalid API key.
429 Too Many Requests: Rate limit exceeded. See Rate Limits.
500 Internal Server Error: An unexpected error occurred on the server.
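
A client typically treats these codes differently: 400 and 401 will not succeed on retry, while 429 and possibly 500 may. The sketch below shows one common policy, exponential backoff with jitter. The retry set, the delays, and the function names are assumptions rather than documented SpeakEasy behavior; check the Rate Limits page for actual guidance.

```python
import random

# Assumption: only rate-limit and server errors are worth retrying.
RETRYABLE_STATUS = {429, 500}


def should_retry(status: int) -> bool:
    """Return True for status codes that may succeed on a later attempt."""
    return status in RETRYABLE_STATUS


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Pick a random delay in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```

A request loop would call `should_retry` on each failed response and sleep for `backoff_delay(attempt)` before trying again, giving up after a fixed number of attempts.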

For the full list of error codes and troubleshooting guidance, see the Error Codes reference.