Text-to-Speech API
The Text-to-Speech (TTS) API converts written text into lifelike spoken audio. Choose from 54 voices across 9 languages, control speed and output format, and optionally stream the audio in real time.
Endpoint
POST https://www.tryspeakeasy.io/api/v1/audio/speech
Authentication
All requests must include a valid API key in the Authorization header using the Bearer scheme:
Authorization: Bearer YOUR_API_KEY
See the Authentication guide for details on creating and managing API keys.
Request Parameters
Send a JSON body with Content-Type: application/json. The following parameters are supported:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| input | string | Yes | — | The text to generate audio for. Maximum length is 4096 characters. |
| voice | string | No | "heart" | The voice to use. 54 voices available across 9 languages. See Available Voices for the full list. |
| response_format | string | No | "mp3" | The audio format of the output. Supported formats: mp3, opus, aac, flac, pcm, ogg, wav. |
| language | string | No | auto | Language of the input text. Auto-derived from the chosen voice if omitted. Must match the voice's language. Valid codes: en-us, en-gb, ja, zh, es, fr, hi, it, pt-br. See Available Voices for supported languages per voice. |
| speed | number | No | 1.0 | The speed of the generated audio. Accepted range: 0.5 to 4.0. |
| stream | boolean | No | false | When true, audio is returned using chunked transfer encoding as it is generated. See Streaming Response below. |
| word_timestamps | boolean | No | false | When true, the response includes a JSON header with per-word timing information before the audio bytes. |
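The limits in the table can be checked on the client before a request is sent, which avoids a round trip that would end in a 400. The following is a minimal sketch, not part of the SpeakEasy API: the helper name `build_speech_payload` is hypothetical, the range bounds are assumed to be inclusive, and `len()` counts Unicode code points, which may differ from how the server counts characters.

```python
SUPPORTED_FORMATS = {"mp3", "opus", "aac", "flac", "pcm", "ogg", "wav"}
LANGUAGE_CODES = {"en-us", "en-gb", "ja", "zh", "es", "fr", "hi", "it", "pt-br"}
MAX_INPUT_CHARS = 4096


def build_speech_payload(text, voice="heart", response_format="mp3",
                         speed=1.0, language=None, stream=False):
    """Validate arguments against the documented limits and return the JSON body.

    Hypothetical helper: the name and signature are illustrative only.
    """
    if not text:
        raise ValueError("input is required")
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if response_format not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported response_format: {response_format}")
    if not 0.5 <= speed <= 4.0:  # assumes both bounds are inclusive
        raise ValueError("speed must be between 0.5 and 4.0")
    if language is not None and language not in LANGUAGE_CODES:
        raise ValueError(f"unknown language code: {language}")

    payload = {
        "input": text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
    }
    if language is not None:
        # Omitting the key lets the API derive the language from the voice.
        payload["language"] = language
    if stream:
        payload["stream"] = True
    return payload
```

The returned dictionary can be serialized with `json.dumps` and sent as the request body. Voice names are not validated here because that requires the full voice list.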
Code Examples
curl
curl -X POST https://www.tryspeakeasy.io/api/v1/audio/speech \
  -H "Authorization: Bearer $SPEAKEASY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "Hello, welcome to SpeakEasy!",
    "voice": "heart",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3
Python (OpenAI SDK)
SpeakEasy is fully compatible with the OpenAI Python SDK. Just set the base_url to https://www.tryspeakeasy.io/api/v1:
from openai import OpenAI

client = OpenAI(
    api_key="sk-se-your-api-key",
    base_url="https://www.tryspeakeasy.io/api/v1",
)

response = client.audio.speech.create(
    model="tts-1",
    voice="heart",
    input="Hello, welcome to SpeakEasy!",
    response_format="mp3",
    speed=1.0,
)

# Write the complete audio body to disk
# (stream_to_file is deprecated on non-streaming responses in the OpenAI SDK)
response.write_to_file("speech.mp3")
JavaScript
const response = await fetch("https://www.tryspeakeasy.io/api/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    input: "Hello, welcome to SpeakEasy!",
    voice: "heart",
    response_format: "mp3",
    speed: 1.0,
  }),
});
const audioBlob = await response.blob();

// Save to file (Node.js)
const fs = await import("node:fs");
const buffer = Buffer.from(await audioBlob.arrayBuffer());
fs.writeFileSync("speech.mp3", buffer);
Response
On success the API returns the raw binary audio data with a Content-Type header matching the requested format:
| Format | Content-Type |
|---|---|
| mp3 | audio/mpeg |
| opus | audio/opus |
| aac | audio/aac |
| flac | audio/flac |
| pcm | audio/pcm |
| ogg | audio/ogg |
| wav | audio/wav |
The response also includes a Content-Length header (except when streaming) so you can display a progress bar or pre-allocate a buffer.
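Because the Content-Type header identifies the format, a client can derive a filename from the response rather than from the request it sent. The sketch below mirrors the table above. The helper `filename_for_response` is hypothetical, and using the format name as the file extension (for example `.pcm`) is an assumption, not something the API prescribes.

```python
# Format name -> Content-Type, copied from the table above.
CONTENT_TYPES = {
    "mp3": "audio/mpeg",
    "opus": "audio/opus",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "pcm": "audio/pcm",
    "ogg": "audio/ogg",
    "wav": "audio/wav",
}
FORMAT_BY_CONTENT_TYPE = {ctype: fmt for fmt, ctype in CONTENT_TYPES.items()}


def filename_for_response(stem, content_type_header):
    """Return '<stem>.<format>' for a Content-Type header value (hypothetical helper)."""
    # Drop any parameters after ';' and normalize case before the lookup.
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    fmt = FORMAT_BY_CONTENT_TYPE.get(media_type)
    if fmt is None:
        raise ValueError(f"unexpected Content-Type: {content_type_header!r}")
    return f"{stem}.{fmt}"
```

For example, `filename_for_response("speech", "audio/mpeg")` returns `"speech.mp3"`. An unexpected media type raises, which also catches the case where a JSON error body is about to be saved as audio.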
Streaming Response
When stream is set to true, the API returns audio using HTTP chunked transfer encoding. Audio chunks are sent as they are generated, allowing your application to begin playback before the entire response is ready. This is especially useful for long inputs or real-time applications.
The response uses Transfer-Encoding: chunked and the same Content-Type header as a non-streaming response. No Content-Length header is included.
Streaming with curl
curl -X POST https://www.tryspeakeasy.io/api/v1/audio/speech \
  -H "Authorization: Bearer $SPEAKEASY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input": "This audio is streamed as it is generated.",
    "voice": "nova",
    "stream": true
  }' \
  --output speech.mp3
Streaming with Python
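from openai import OpenAI

client = OpenAI(
    api_key="sk-se-your-api-key",
    base_url="https://www.tryspeakeasy.io/api/v1",
)

# with_streaming_response reads the body as it arrives instead of buffering it.
# stream is not part of the SDK's method signature, so it is sent via extra_body.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="nova",
    input="This audio is streamed as it is generated.",
    extra_body={"stream": True},
) as response:
    # Stream directly to a file
    response.stream_to_file("speech.mp3")
Streaming with JavaScript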
const response = await fetch("https://www.tryspeakeasy.io/api/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    input: "This audio is streamed as it is generated.",
    voice: "nova",
    stream: true,
  }),
});

// Process chunks as they arrive
const reader = response.body.getReader();
const chunks = [];
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}
const audioBlob = new Blob(chunks, { type: "audio/mpeg" });
Available Voices
SpeakEasy offers 54 voices across 9 languages. Set language to the matching code from the table below, or omit it to let SpeakEasy auto-derive the language from the chosen voice.
English (American) en-us
| Voice | Gender |
|---|---|
| heart | female |
| bella | female |
| michael | male |
| alloy | female |
| aoede | female |
| kore | female |
| jessica | female |
| nicole | female |
| nova | female |
| river | female |
| sarah | female |
| sky | female |
| echo | male |
| eric | male |
| fenrir | male |
| liam | male |
| onyx | male |
| puck | male |
| adam | male |
| santa | male |
English (British) en-gb
| Voice | Gender |
|---|---|
| alice | female |
| emma | female |
| isabella | female |
| lily | female |
| daniel | male |
| fable | male |
| george | male |
| lewis | male |
Japanese (beta) ja
| Voice | Gender |
|---|---|
| sakura | female |
| gongitsune | female |
| nezumi | female |
| tebukuro | female |
| kumo | male |
Mandarin Chinese (beta) zh
| Voice | Gender |
|---|---|
| xiaobei | female |
| xiaoni | female |
| xiaoxiao | female |
| xiaoyi | female |
| yunjian | male |
| yunxi | male |
| yunxia | male |
| yunyang | male |
Spanish (beta) es
| Voice | Gender |
|---|---|
| dora | female |
| alex | male |
| noel | male |
French (beta) fr
| Voice | Gender |
|---|---|
| siwis | female |
Hindi (beta) hi
| Voice | Gender |
|---|---|
| alpha | female |
| beta | female |
| omega | male |
| psi | male |
Italian (beta) it
| Voice | Gender |
|---|---|
| sara | female |
| nicola | male |
Portuguese (Brazil) (beta) pt-br
| Voice | Gender |
|---|---|
| clara | female |
| tiago | male |
| papai | male |
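Since language must match the voice's language, a client that accepts both values from user input may want to check them together before sending. The sketch below uses a small excerpt of the tables above, not the full list of 54 voices. The `VOICE_LANGUAGES` mapping and the `resolve_language` helper are hypothetical names for illustration.

```python
# Excerpt only: a few voices per language, copied from the tables above.
VOICE_LANGUAGES = {
    "heart": "en-us", "nova": "en-us", "michael": "en-us",
    "alice": "en-gb", "george": "en-gb",
    "sakura": "ja",
    "xiaoxiao": "zh",
    "dora": "es",
    "siwis": "fr",
    "alpha": "hi",
    "sara": "it",
    "clara": "pt-br",
}


def resolve_language(voice, language=None):
    """Return the language code for a voice, rejecting a mismatched language."""
    expected = VOICE_LANGUAGES.get(voice)
    if expected is None:
        raise ValueError(f"voice not in local table: {voice}")
    if language is not None and language != expected:
        raise ValueError(
            f"language {language!r} does not match voice {voice!r} ({expected})"
        )
    return expected
```

Calling `resolve_language("sakura")` returns `"ja"`, while `resolve_language("sakura", "en-us")` raises before any request is made.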
Error Responses
The API returns standard HTTP error codes with a JSON body describing the issue. Common errors for this endpoint include:
- 400 Bad Request: Missing or invalid parameters (e.g., input exceeds 4096 characters, unsupported voice, or speed outside the allowed range).
- 401 Unauthorized: Missing or invalid API key.
- 429 Too Many Requests: Rate limit exceeded. See Rate Limits.
- 500 Internal Server Error: An unexpected error occurred on the server.
For the full list of error codes and troubleshooting guidance, see the Error Codes reference.
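One way to act on these status codes is to retry 429 and 500 with exponential backoff and fail immediately on 400 and 401, which will not succeed on a second attempt. The sketch below assumes success is signaled by HTTP 200 and that retrying a 500 is safe; neither is stated above. The helper `send_with_retry` and its `send` callback are hypothetical, so any HTTP client can be plugged in.

```python
import time

RETRYABLE_STATUSES = {429, 500}


def send_with_retry(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it succeeds, hits a non-retryable status, or runs out of attempts.

    send is a zero-argument callable returning (status_code, body).
    Assumes HTTP 200 means success.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or last_attempt:
            raise RuntimeError(f"request failed with HTTP {status}: {body!r}")
        # Wait base_delay, then 2x, then 4x, ... before the next attempt.
        sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists so the delay can be replaced in tests. If the Rate Limits page documents a retry hint, that value would be a better delay than a fixed schedule.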