# Rate Limits
SpeakEasy enforces concurrency limits to protect service stability, ensure fair usage across all customers, and maintain consistent low-latency responses. Every account is limited to a maximum number of requests that can be processed at the same time.
## Concurrency Limit
Each account can have up to 10 concurrent requests in flight at any time. This applies equally to all plans. If an 11th request arrives while 10 are still processing, it will be rejected with a 429 status code.
| Limit | Value |
|---|---|
| Max concurrent requests per account | 10 |
Need higher limits? Contact hello@sian-agency.online for details.
## Response Headers
Every API response includes headers that report your current concurrency status. Use these to monitor usage and manage parallel requests.
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum number of concurrent requests allowed (currently 10). |
| `X-RateLimit-Remaining` | Number of additional concurrent requests you can make right now. |
| `Retry-After` | Included on 429 responses. Suggested wait time in seconds before retrying. |
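As a quick illustration, these headers can be parsed from any response's header map. This is a minimal sketch; `concurrency_status` is a hypothetical helper, not part of the API, and the fallback defaults below are assumptions rather than documented behavior:

```python
def concurrency_status(headers: dict) -> dict:
    """Extract the concurrency info reported in the table above.

    `headers` is any mapping of response headers (e.g. response.headers
    from the requests library). The defaults are local fallbacks only.
    """
    return {
        "limit": int(headers.get("X-RateLimit-Limit", 10)),
        "remaining": int(headers.get("X-RateLimit-Remaining", 0)),
    }

# e.g. pause dispatching new work when no slots remain
status = concurrency_status({"X-RateLimit-Limit": "10", "X-RateLimit-Remaining": "3"})
```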
## Exceeding the Limit
When you exceed the concurrency limit, the API immediately returns a `429 Too Many Requests` status code. The request is not queued, so your client should retry after a short wait.
```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 1
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 0

{
  "error": {
    "type": "rate_limit_error",
    "message": "Too many concurrent requests. Max 10 allowed. Please retry."
  }
}
```

## Best Practices
Follow these guidelines to make the most of the concurrency limit and build resilient integrations:
- **Await previous requests** — The simplest approach is to process files sequentially. With 10–15 second processing times, sequential requests are often fast enough.
- **Use a client-side semaphore** — If you need parallelism, limit your client to 10 concurrent requests using a semaphore or connection pool. This avoids 429 errors entirely.
- **Implement retry with backoff** — When you receive a 429, wait the `Retry-After` duration (typically 1 second), then retry. Add jitter to avoid thundering herds.
- **Cache results when possible** — If you are transcribing or synthesizing the same content repeatedly, store the result and serve it from cache instead of making a new API call.
- **Use callback URLs for long audio files** — For large audio transcriptions, supply a `callback_url` parameter. The API will process the file asynchronously and POST the result to your endpoint, freeing up a concurrency slot sooner.
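The semaphore approach above can be sketched as follows. This is a sketch, not a definitive implementation: `transcribe_file` is a hypothetical stand-in for your real API call, and the worker-pool sizing is illustrative:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 10  # matches the account-wide concurrency limit

_slots = threading.BoundedSemaphore(MAX_CONCURRENT)

def transcribe_file(file_path: str) -> dict:
    # Hypothetical placeholder: replace with your actual requests.post
    # call to the transcriptions endpoint.
    return {"file": file_path}

def transcribe(file_path: str) -> dict:
    with _slots:  # blocks once 10 calls are already in flight
        return transcribe_file(file_path)

def transcribe_all(paths: list) -> list:
    # More worker threads than slots is fine; the semaphore gates the API calls.
    with ThreadPoolExecutor(max_workers=20) as pool:
        return list(pool.map(transcribe, paths))
```

Because the semaphore caps in-flight calls at 10 on the client side, the server never sees an 11th concurrent request and never returns a 429.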
## Retry Logic Example
The following Python snippet demonstrates retry with backoff when the API returns a 429 status code:
```python
import time
import requests

API_KEY = "sk-your-api-key"
URL = "https://api.tryspeakeasy.io/v1/audio/transcriptions"
MAX_RETRIES = 5

def transcribe_with_retry(file_path: str) -> dict:
    """Transcribe an audio file with retry on concurrency limits."""
    for attempt in range(MAX_RETRIES):
        with open(file_path, "rb") as f:
            response = requests.post(
                URL,
                headers={"Authorization": f"Bearer {API_KEY}"},
                files={"file": f},
                data={"model": "whisper-large-v3", "response_format": "json"},
            )
        if response.status_code == 200:
            return response.json()
        if response.status_code == 429:
            # Prefer the server-provided wait time, fall back to exponential backoff
            retry_after = int(response.headers.get("Retry-After", 2 ** attempt))
            print(f"Concurrency limit hit. Retrying in {retry_after}s (attempt {attempt + 1}/{MAX_RETRIES})")
            time.sleep(retry_after)
        else:
            response.raise_for_status()
    raise Exception("Max retries exceeded. Please try again later.")
```

The key points in this example:
- The `Retry-After` header is respected when available, giving you the shortest possible wait.
- If the header is missing, the code falls back to exponential backoff (`2 ** attempt` seconds).
- A maximum retry count prevents infinite loops in the event of persistent concurrency limits.
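The jitter suggested under Best Practices can be added with a "full jitter" scheme, one common choice. The base and cap values here are illustrative defaults, not API requirements:

```python
import random

def backoff_with_jitter(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Return a random sleep in [0, min(cap, base * 2**attempt)] seconds.

    Randomizing over the full window spreads out retries from many
    clients that all hit the limit at the same moment.
    """
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

In the retry loop above, you would pass the larger of this value and the server's `Retry-After` to `time.sleep`.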