Text-to-Speech API
The Text-to-Speech (TTS) API converts written text into lifelike spoken audio. Choose from 50+ voices across 9 languages, control speed and output format, and optionally stream the audio in real time.
Endpoint
POST https://api.tryspeakeasy.io/v1/audio/speech
Authentication
All requests must include a valid API key in the Authorization header using the Bearer scheme:
Authorization: Bearer YOUR_API_KEY
See the Authentication guide for details on creating and managing API keys.
Request Parameters
Send a JSON body with Content-Type: application/json. The following parameters are supported:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| model | string | Yes | — | The TTS model to use. Options: tts-1 (optimised for speed) or tts-1-hd (optimised for quality). |
| input | string | Yes | — | The text to generate audio for. Maximum length is 4096 characters. |
| voice | string | Yes | — | The voice to use. See Available Voices for the full list of 50+ voices across 9 languages. |
| response_format | string | No | "mp3" | The audio format of the output. Supported formats: mp3, opus, aac, flac, pcm, ogg, wav. |
| language | string | No | — | Hint the language of the input text. The voice must support the specified language. Valid codes: en-us, en-gb, ja, zh, es, fr, hi, it, pt-br. See Available Voices for supported languages per voice. |
| speed | number | No | 1.0 | The speed of the generated audio. Accepted range: 0.5 to 4.0. |
| stream | boolean | No | false | When true, audio is returned using chunked transfer encoding as it is generated. See Streaming Response below. |
| word_timestamps | boolean | No | false | When true, the response includes a JSON header with per-word timing information before the audio bytes. |
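The limits in the table can be checked on the client before a request is sent. The following is a minimal sketch rather than part of any SpeakEasy SDK: `build_speech_payload` is a hypothetical helper name, the allowed values are copied from the table above, and voice names are left for the server to validate.

```python
from typing import Any, Dict, Optional

# Allowed values copied from the parameter table above.
MODELS = {"tts-1", "tts-1-hd"}
FORMATS = {"mp3", "opus", "aac", "flac", "pcm", "ogg", "wav"}
LANGUAGES = {"en-us", "en-gb", "ja", "zh", "es", "fr", "hi", "it", "pt-br"}
MAX_INPUT_CHARS = 4096
MIN_SPEED, MAX_SPEED = 0.5, 4.0


def build_speech_payload(
    model: str,
    input_text: str,
    voice: str,
    response_format: str = "mp3",
    speed: float = 1.0,
    language: Optional[str] = None,
    stream: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: validate arguments and return the JSON body as a dict."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if len(input_text) > MAX_INPUT_CHARS:
        raise ValueError(f"input exceeds {MAX_INPUT_CHARS} characters")
    if response_format not in FORMATS:
        raise ValueError(f"unsupported response_format: {response_format!r}")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        raise ValueError(f"speed must be between {MIN_SPEED} and {MAX_SPEED}")
    if language is not None and language not in LANGUAGES:
        raise ValueError(f"unknown language code: {language!r}")

    payload: Dict[str, Any] = {
        "model": model,
        "input": input_text,
        "voice": voice,
        "response_format": response_format,
        "speed": speed,
        "stream": stream,
    }
    # Only send the optional language hint when the caller provided one.
    if language is not None:
        payload["language"] = language
    return payload


payload = build_speech_payload("tts-1", "Hello, welcome to SpeakEasy!", "alloy")
```

The returned dictionary can be serialized with `json.dumps` and sent as the request body. Failing early on the client saves a round trip that would otherwise end in a 400 response.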
Code Examples
curl
curl -X POST https://api.tryspeakeasy.io/v1/audio/speech \
  -H "Authorization: Bearer $SPEAKEASY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "Hello, welcome to SpeakEasy!",
    "voice": "alloy",
    "response_format": "mp3",
    "speed": 1.0
  }' \
  --output speech.mp3
Python (OpenAI SDK)
SpeakEasy is fully compatible with the OpenAI Python SDK. Just set the base_url to https://api.tryspeakeasy.io/v1:
from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.tryspeakeasy.io/v1",
)

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello, welcome to SpeakEasy!",
    response_format="mp3",
    speed=1.0,
)

response.stream_to_file("speech.mp3")
JavaScript
const response = await fetch("https://api.tryspeakeasy.io/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "tts-1",
    input: "Hello, welcome to SpeakEasy!",
    voice: "alloy",
    response_format: "mp3",
    speed: 1.0,
  }),
});

const audioBlob = await response.blob();

// Save to file (Node.js)
const fs = await import("node:fs");
const buffer = Buffer.from(await audioBlob.arrayBuffer());
fs.writeFileSync("speech.mp3", buffer);
Response
On success the API returns the raw binary audio data with a Content-Type header matching the requested format:
| Format | Content-Type |
|---|---|
| mp3 | audio/mpeg |
| opus | audio/opus |
| aac | audio/aac |
| flac | audio/flac |
| pcm | audio/pcm |
| ogg | audio/ogg |
| wav | audio/wav |
The response also includes a Content-Length header (except when streaming) so you can display a progress bar or pre-allocate a buffer.
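Because a successful response carries audio bytes while an error carries JSON, it can help to inspect the Content-Type header before writing anything to disk. The sketch below inverts the table above; `format_from_content_type` is a hypothetical helper name, and tolerating parameters after a semicolon is a defensive assumption, not documented API behaviour.

```python
from typing import Optional

# Mapping copied from the format table above.
CONTENT_TYPES = {
    "mp3": "audio/mpeg",
    "opus": "audio/opus",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "pcm": "audio/pcm",
    "ogg": "audio/ogg",
    "wav": "audio/wav",
}


def format_from_content_type(header_value: str) -> Optional[str]:
    """Return the audio format for a Content-Type header, or None if it is not audio."""
    # Drop any parameters (e.g. "; charset=...") and normalise case.
    media_type = header_value.split(";", 1)[0].strip().lower()
    for fmt, content_type in CONTENT_TYPES.items():
        if content_type == media_type:
            return fmt
    return None


print(format_from_content_type("audio/mpeg"))        # mp3
print(format_from_content_type("application/json"))  # None
```

A `None` result is a reasonable signal to parse the body as a JSON error instead of saving it as audio.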
Streaming Response
When stream is set to true, the API returns audio using HTTP chunked transfer encoding. Audio chunks are sent as they are generated, allowing your application to begin playback before the entire response is ready. This is especially useful for long inputs or real-time applications.
The response uses Transfer-Encoding: chunked and the same Content-Type header as a non-streaming response. No Content-Length header is included.
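Since a streamed response has no Content-Length, the client only learns the total size once the last chunk has arrived. The following sketch shows one way to consume chunks incrementally; `save_stream` is a hypothetical helper, and the byte strings below stand in for a real HTTP response body.

```python
import io
from typing import BinaryIO, Iterable


def save_stream(chunks: Iterable[bytes], out: BinaryIO) -> int:
    """Write each chunk as soon as it arrives and return the total byte count."""
    total = 0
    for chunk in chunks:
        if not chunk:  # ignore empty chunks
            continue
        out.write(chunk)
        total += len(chunk)
    return total


# Simulated chunks; a real client would pass the HTTP library's chunk iterator.
buffer = io.BytesIO()
written = save_stream([b"ID3", b"", b"\x00\x01\x02"], buffer)
print(written)  # 6
```

With the `requests` library, for example, a response obtained with `stream=True` exposes `iter_content(chunk_size=8192)`, which yields chunks in the shape this helper expects.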
Streaming with curl
curl -X POST https://api.tryspeakeasy.io/v1/audio/speech \
  -H "Authorization: Bearer $SPEAKEASY_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "This audio is streamed as it is generated.",
    "voice": "nova",
    "stream": true
  }' \
  --output speech.mp3
Streaming with Python
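from openai import OpenAI

client = OpenAI(
    api_key="sk-your-api-key",
    base_url="https://api.tryspeakeasy.io/v1",
)

# The OpenAI SDK has no `stream` argument on this method, so the flag is
# passed through extra_body. with_streaming_response avoids buffering the
# whole response in memory.
with client.audio.speech.with_streaming_response.create(
    model="tts-1",
    voice="nova",
    input="This audio is streamed as it is generated.",
    extra_body={"stream": True},
) as response:
    # Write chunks to the file as they arrive
    response.stream_to_file("speech.mp3")
Streaming with JavaScript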
const response = await fetch("https://api.tryspeakeasy.io/v1/audio/speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${apiKey}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "tts-1",
    input: "This audio is streamed as it is generated.",
    voice: "nova",
    stream: true,
  }),
});

// Process chunks as they arrive
const reader = response.body.getReader();
const chunks = [];
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  chunks.push(value);
}
const audioBlob = new Blob(chunks, { type: "audio/mpeg" });
Available Voices
SpeakEasy offers 50+ voices across 9 languages. Each voice is available with both the tts-1 and tts-1-hd models.
English (American)
| Voice | Description |
|---|---|
| heart | Warm, expressive — the default voice. |
| bella | Friendly, approachable female voice. |
| michael | Clear, professional male voice. |
| alloy | Neutral, balanced — versatile for most content. |
| aoede | Melodic, engaging female voice. |
| kore | Youthful, energetic female voice. |
| jessica | Natural, conversational female voice. |
| nicole | Polished, articulate female voice. |
| nova | Bright, friendly — perfect for upbeat content. |
| river | Smooth, flowing narration voice. |
| sarah | Warm, trustworthy female voice. |
| sky | Light, airy female voice. |
| echo | Warm, conversational — ideal for dialogue. |
| eric | Strong, confident male voice. |
| fenrir | Deep, powerful male voice. |
| liam | Youthful, energetic male voice. |
| onyx | Deep, authoritative — professional and formal. |
| puck | Playful, dynamic male voice. |
| adam | Steady, reliable male voice. |
| santa | Jolly, festive character voice. |
English (British)
Set language to en-gb when using British English voices.
| Voice | Description |
|---|---|
| alice | Refined, classic British female voice. |
| emma | Modern, friendly British female voice. |
| isabella | Elegant, articulate British female voice. |
| lily | Bright, cheerful British female voice. |
| daniel | Composed, professional British male voice. |
| fable | Expressive, dynamic — great for storytelling. |
| george | Warm, distinguished British male voice. |
| lewis | Clear, authoritative British male voice. |
Japanese
| Voice | Description |
|---|---|
| haruto | Natural Japanese male voice. |
| yuki | Natural Japanese female voice. |
Chinese (Mandarin)
| Voice | Description |
|---|---|
| xiaobei | Clear Mandarin female voice. |
| yunjian | Natural Mandarin male voice. |
Spanish
| Voice | Description |
|---|---|
| carlos | Natural Spanish male voice. |
| maria | Natural Spanish female voice. |
French
| Voice | Description |
|---|---|
| pierre | Natural French male voice. |
| amelie | Natural French female voice. |
Hindi
| Voice | Description |
|---|---|
| arjun | Natural Hindi male voice. |
| priya | Natural Hindi female voice. |
Italian
| Voice | Description |
|---|---|
| luca | Natural Italian male voice. |
| giulia | Natural Italian female voice. |
Portuguese (Brazilian)
| Voice | Description |
|---|---|
| pedro | Natural Brazilian Portuguese male voice. |
| ana | Natural Brazilian Portuguese female voice. |
OpenAI Compatible
| Voice | Description |
|---|---|
| shimmer | Soft, calming — meditation and gentle narration. |
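Since a voice must support the language passed in the language parameter, a client-side lookup can catch mismatches early. This is a sketch under assumptions: the voice names are copied from the tables above, but pairing each section heading with a language code (for example, "English (British)" with en-gb) is inferred, `language_for_voice` is a hypothetical helper, and shimmer is omitted because its language is not stated.

```python
from typing import Optional

# Voice names copied from the tables above; the heading-to-code pairing is assumed.
VOICES_BY_LANGUAGE = {
    "en-us": [
        "heart", "bella", "michael", "alloy", "aoede", "kore", "jessica",
        "nicole", "nova", "river", "sarah", "sky", "echo", "eric",
        "fenrir", "liam", "onyx", "puck", "adam", "santa",
    ],
    "en-gb": [
        "alice", "emma", "isabella", "lily",
        "daniel", "fable", "george", "lewis",
    ],
    "ja": ["haruto", "yuki"],
    "zh": ["xiaobei", "yunjian"],
    "es": ["carlos", "maria"],
    "fr": ["pierre", "amelie"],
    "hi": ["arjun", "priya"],
    "it": ["luca", "giulia"],
    "pt-br": ["pedro", "ana"],
}


def language_for_voice(voice: str) -> Optional[str]:
    """Return the assumed language code for a voice, or None if it is not listed."""
    for code, voices in VOICES_BY_LANGUAGE.items():
        if voice in voices:
            return code
    return None


print(language_for_voice("fable"))  # en-gb
print(language_for_voice("yuki"))   # ja
```

A `None` result means the voice is not in this local table, so the request is best sent without a language hint and left to the server to validate.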
Error Responses
The API returns standard HTTP error codes with a JSON body describing the issue. Common errors for this endpoint include:
- 400 Bad Request: Missing or invalid parameters (e.g., input exceeds 4096 characters, unsupported voice, or speed outside the allowed range).
- 401 Unauthorized: Missing or invalid API key.
- 429 Too Many Requests: Rate limit exceeded. See Rate Limits.
- 500 Internal Server Error: An unexpected error occurred on the server.
For the full list of error codes and troubleshooting guidance, see the Error Codes reference.
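One common way to handle these codes is to retry only the errors that may succeed on a later attempt. The sketch below treats 429 and 500 as retryable and 400 and 401 as permanent. That split, the helper names, and the delay values are assumptions for illustration, not documented SpeakEasy behaviour.

```python
from typing import List

# Assumption: rate limits and server errors may clear up; client errors will not.
RETRYABLE_STATUS_CODES = {429, 500}


def is_retryable(status_code: int) -> bool:
    """Return True if a request with this status is worth sending again."""
    return status_code in RETRYABLE_STATUS_CODES


def backoff_delays(max_attempts: int, base: float = 1.0, cap: float = 30.0) -> List[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_attempts)]


print(is_retryable(429))  # True
print(is_retryable(401))  # False
print(backoff_delays(4))  # [1.0, 2.0, 4.0, 8.0]
```

A caller would sleep for each delay in turn between attempts and give up once the schedule is exhausted or a non-retryable status is returned.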