Text-to-Speech API

The Text-to-Speech (TTS) API converts written text into lifelike spoken audio. Choose from 54 voices across 9 languages, control speed and output format, and optionally stream the audio in real time.

Endpoint

POST https://www.tryspeakeasy.io/api/v1/audio/speech

Authentication

All requests must include a valid API key in the Authorization header using the Bearer scheme:

Authorization: Bearer YOUR_API_KEY

See the Authentication guide for details on creating and managing API keys.
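
As a minimal Python sketch, the headers can be built as shown here. It assumes the key is stored in the SPEAKEASY_API_KEY environment variable, the name used in the curl example in this guide. The helper name build_headers is illustrative and not part of any SpeakEasy SDK.

```python
import os


def build_headers(api_key=None):
    """Return HTTP headers for a SpeakEasy request (illustrative helper)."""
    # Assumption: the key lives in SPEAKEASY_API_KEY when not passed explicitly.
    key = api_key or os.environ.get("SPEAKEASY_API_KEY")
    if not key:
        raise ValueError("No API key: pass api_key or set SPEAKEASY_API_KEY")
    return {
        "Authorization": f"Bearer {key}",   # Bearer scheme, as described above
        "Content-Type": "application/json",  # request bodies are JSON
    }
```

Reading the key from the environment keeps it out of source control.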

Request Parameters

Send a JSON body with Content-Type: application/json. The following parameters are supported:

Parameter | Type | Required | Default | Description
input | string | Yes | - | The text to generate audio for. Maximum length is 4096 characters.
voice | string | No | "heart" | The voice to use. 54 voices available across 9 languages. See Available Voices for the full list.
response_format | string | No | "mp3" | The audio format of the output. Supported formats: mp3, opus, aac, flac, pcm, ogg, wav.
language | string | No | auto | Language of the input text. Auto-derived from the chosen voice if omitted. Must match the voice's language. Valid codes: en-us, en-gb, ja, zh, es, fr, hi, it, pt-br. See Available Voices for supported languages per voice.
speed | number | No | 1.0 | The speed of the generated audio. Accepted range: 0.5 to 4.0.
stream | boolean | No | false | When true, audio is returned using chunked transfer encoding as it is generated. See Streaming Response below.
word_timestamps | boolean | No | false | When true, the response includes a JSON header with per-word timing information before the audio bytes.
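
The constraints in this table can be checked on the client before a request is sent, which avoids a round trip that would end in a 400 response. The sketch below is illustrative only. The function name build_payload is hypothetical, and it assumes that the range bounds are inclusive and that the 4096-character limit corresponds to Python's len(); the server may count differently.

```python
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "pcm", "ogg", "wav"}
LANGUAGE_CODES = {"en-us", "en-gb", "ja", "zh", "es", "fr", "hi", "it", "pt-br"}
MAX_INPUT_CHARS = 4096  # assumption: counted the same way as len()


def build_payload(text, voice="heart", response_format="mp3", speed=1.0,
                  language=None, stream=False, word_timestamps=False):
    """Validate parameters locally and return a JSON-serializable request body."""
    if not text:
        raise ValueError("input is required")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    # Assumption: both ends of the documented 0.5 to 4.0 range are accepted.
    if not 0.5 <= speed <= 4.0:
        raise ValueError("speed must be between 0.5 and 4.0")
    if language is not None and language not in LANGUAGE_CODES:
        raise ValueError(f"unknown language code: {language}")

    payload = {
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
        "stream": stream,
        "word_timestamps": word_timestamps,
    }
    # Omit language so the API can derive it from the voice.
    if language is not None:
        payload["language"] = language
    return payload
```

The returned dictionary can be passed to json.dumps and sent as the request body.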

Code Examples

curl

curl -X POST https://www.tryspeakeasy.io/api/v1/audio/speech \
  -H "Authorization: Bearer $SPEAKEASY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, welcome to SpeakEasy!",
    "voice": "heart",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3

Python (OpenAI SDK)

SpeakEasy is fully compatible with the OpenAI Python SDK. Just set the base_url to https://www.tryspeakeasy.io/api/v1:

from openai import OpenAI

client = OpenAI(
    api_key="sk-se-your-api-key",
    base_url="https://www.tryspeakeasy.io/api/v1",
)

response = client.audio.speech.create(
    model="tts-1",
    voice="heart",
    input="Hello, welcome to SpeakEasy!",
    response_format="mp3",
    speed=1.0,
)

# Save the returned audio bytes to disk
response.write_to_file("speech.mp3")

JavaScript

const response = await fetch("https://www.tryspeakeasy.io/api/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    input: "Hello, welcome to SpeakEasy!",
    voice: "heart",
    response_format: "mp3",
    speed: 1.0,
  }),
});

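// Stop here on an error status so a JSON error body is not saved as audio
if (!response.ok) {
  throw new Error(`TTS request failed with status ${response.status}`);
}

const audioBlob = await response.blob();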

// Save to file (Node.js)
const fs = await import("node:fs");
const buffer = Buffer.from(await audioBlob.arrayBuffer());
fs.writeFileSync("speech.mp3", buffer);

Response

On success the API returns the raw binary audio data with a Content-Type header matching the requested format:

Format | Content-Type
mp3 | audio/mpeg
opus | audio/opus
aac | audio/aac
flac | audio/flac
pcm | audio/pcm
ogg | audio/ogg
wav | audio/wav

The response also includes a Content-Length header (except when streaming) so you can display a progress bar or pre-allocate a buffer.
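
One practical use of the format mapping is to confirm that a response really contains audio before writing it to disk, since error responses carry a JSON body instead. The following sketch is a hypothetical helper, not part of the API. It assumes the header may carry parameters after a semicolon, which it ignores.

```python
CONTENT_TYPES = {
    "mp3": "audio/mpeg",
    "opus": "audio/opus",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "pcm": "audio/pcm",
    "ogg": "audio/ogg",
    "wav": "audio/wav",
}


def is_expected_audio(requested_format, content_type_header):
    """Return True if the Content-Type header matches the requested format."""
    expected = CONTENT_TYPES[requested_format]
    # Drop any parameters such as "; charset=utf-8" and normalize case.
    actual = content_type_header.split(";", 1)[0].strip().lower()
    return actual == expected
```

For example, is_expected_audio("mp3", response_headers["Content-Type"]) can guard the file write, where response_headers stands for whatever header mapping your HTTP client exposes.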

Streaming Response

When stream is set to true, the API returns audio using HTTP chunked transfer encoding. Audio chunks are sent as they are generated, allowing your application to begin playback before the entire response is ready. This is especially useful for long inputs or real-time applications.

The response uses Transfer-Encoding: chunked and the same Content-Type header as a non-streaming response. No Content-Length header is included.
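
Because no Content-Length is sent, a client has to read until the stream ends rather than pre-allocating a buffer. The sketch below shows one way to do that with only the Python standard library, whose HTTP client decodes chunked transfer encoding transparently. The function names copy_chunks and stream_speech and the 4096-byte read size are illustrative choices, not part of the API.

```python
import json
import urllib.request

API_URL = "https://www.tryspeakeasy.io/api/v1/audio/speech"


def copy_chunks(source, sink, chunk_size=4096):
    """Copy a byte stream to sink piece by piece; return the bytes written."""
    total = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:  # an empty read marks the end of the stream
            break
        sink.write(chunk)
        total += len(chunk)
    return total


def stream_speech(api_key, text, path, voice="heart"):
    """Request streamed audio and write it to path as it arrives."""
    body = json.dumps({"input": text, "voice": voice, "stream": True})
    request = urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for 4xx and 5xx responses.
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        return copy_chunks(response, out)
```

A player could consume the chunks inside the loop instead of writing them to a file, so playback starts before generation finishes.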

Streaming with curl

curl -X POST https://www.tryspeakeasy.io/api/v1/audio/speech \
  -H "Authorization: Bearer $SPEAKEASY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "This audio is streamed as it is generated.",
    "voice": "nova",
    "stream": true
  }' \
  --output speech.mp3

Streaming with Python

from openai import OpenAI

client = OpenAI(
    api_key="sk-se-your-api-key",
    base_url="https://www.tryspeakeasy.io/api/v1",
)

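# with_streaming_response reads the body incrementally instead of buffering it.
# Assumption: the SDK has no `stream` argument for speech, so the SpeakEasy
# flag is passed through extra_body.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="nova",
    input="This audio is streamed as it is generated.",
    extra_body={"stream": True},
) as response:
    # Write chunks to the file as they arrive
    response.stream_to_file("speech.mp3")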

Streaming with JavaScript

const response = await fetch("https://www.tryspeakeasy.io/api/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    input: "This audio is streamed as it is generated.",
    voice: "nova",
    stream: true,
  }),
});

// Process chunks as they arrive
const reader = response.body.getReader();
const chunks = [];

while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}

const audioBlob = new Blob(chunks, { type: "audio/mpeg" });

Available Voices

SpeakEasy offers 54 voices across 9 languages. Each heading below gives a language followed by its code. Set language to the code that matches your voice, or omit it to let SpeakEasy auto-derive the language from the chosen voice.

English (American) en-us

Voice | Gender
heart | female
bella | female
michael | male
alloy | female
aoede | female
kore | female
jessica | female
nicole | female
nova | female
river | female
sarah | female
sky | female
echo | male
eric | male
fenrir | male
liam | male
onyx | male
puck | male
adam | male
santa | male

English (British) en-gb

Voice | Gender
alice | female
emma | female
isabella | female
lily | female
daniel | male
fable | male
george | male
lewis | male

Japanese (beta) ja

Voice | Gender
sakura | female
gongitsune | female
nezumi | female
tebukuro | female
kumo | male

Mandarin Chinese (beta) zh

Voice | Gender
xiaobei | female
xiaoni | female
xiaoxiao | female
xiaoyi | female
yunjian | male
yunxi | male
yunxia | male
yunyang | male

Spanish (beta) es

Voice | Gender
dora | female
alex | male
noel | male

French (beta) fr

Voice | Gender
siwis | female

Hindi (beta) hi

Voice | Gender
alpha | female
beta | female
omega | male
psi | male

Italian (beta) it

Voice | Gender
sara | female
nicola | male

Portuguese (Brazil) (beta) pt-br

Voice | Gender
clara | female
tiago | male
papai | male
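
The rule that language must match the voice can be mirrored on the client with a lookup built from the voice lists in this section. This is a sketch under the assumption that those lists are complete and current; the name resolve_language is hypothetical, and the server remains the authority on which voices exist.

```python
VOICES_BY_LANGUAGE = {
    "en-us": ["heart", "bella", "michael", "alloy", "aoede", "kore", "jessica",
              "nicole", "nova", "river", "sarah", "sky", "echo", "eric",
              "fenrir", "liam", "onyx", "puck", "adam", "santa"],
    "en-gb": ["alice", "emma", "isabella", "lily", "daniel", "fable",
              "george", "lewis"],
    "ja": ["sakura", "gongitsune", "nezumi", "tebukuro", "kumo"],
    "zh": ["xiaobei", "xiaoni", "xiaoxiao", "xiaoyi", "yunjian", "yunxi",
           "yunxia", "yunyang"],
    "es": ["dora", "alex", "noel"],
    "fr": ["siwis"],
    "hi": ["alpha", "beta", "omega", "psi"],
    "it": ["sara", "nicola"],
    "pt-br": ["clara", "tiago", "papai"],
}

# Invert the mapping so a voice name leads directly to its language code.
LANGUAGE_BY_VOICE = {
    voice: code
    for code, voices in VOICES_BY_LANGUAGE.items()
    for voice in voices
}


def resolve_language(voice, language=None):
    """Return the language code for voice, rejecting a mismatched language."""
    expected = LANGUAGE_BY_VOICE.get(voice)
    if expected is None:
        raise ValueError(f"unknown voice: {voice}")
    if language is not None and language != expected:
        raise ValueError(f"voice {voice!r} requires language {expected!r}")
    return expected
```

Calling resolve_language("sakura") returns "ja", while resolve_language("sakura", "en-us") raises ValueError before any request is made.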

Error Responses

The API returns standard HTTP error codes with a JSON body describing the issue. Common errors for this endpoint include:

  • 400 Bad Request: Missing or invalid parameters (e.g., input exceeds 4096 characters, unsupported voice, or speed outside the allowed range).
  • 401 Unauthorized: Missing or invalid API key.
  • 429 Too Many Requests: Rate limit exceeded. See Rate Limits.
  • 500 Internal Server Error: An unexpected error occurred on the server.

For the full list of error codes and troubleshooting guidance, see the Error Codes reference.
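
A client will typically treat these codes differently: 400 and 401 need a change to the request or the key, while 429 and 500 may succeed on a later attempt. The sketch below retries only the latter two with exponential backoff. The retry policy, the delays, and the send_with_retry helper are assumptions for illustration; this page does not state a recommended backoff, so consult Rate Limits for actual guidance.

```python
import time

RETRYABLE_STATUSES = {429, 500}  # assumption: worth retrying; 400/401 are not


def send_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-retryable status or attempts run out.

    send is any zero-argument callable returning (status_code, body).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or last_attempt:
            return status, body
        # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
        sleep(base_delay * 2 ** attempt)
```

Passing the HTTP call as a callable keeps the retry logic independent of the HTTP client, and the sleep parameter lets tests run without real delays.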
