# Rate Limits
SpeakEasy enforces concurrency limits to protect service stability, ensure fair usage across all customers, and maintain consistent low-latency responses. Every account is limited to a maximum number of requests that can be processed at the same time.
## Concurrency Limit
Each account can have up to 10 concurrent requests in flight at any time. This applies equally to all plans. If an 11th request arrives while 10 are still processing, it will be rejected with a 429 status code.
| Limit | Value |
|---|---|
| Max concurrent requests per account | 10 |
Need higher limits? Contact hello@sian-agency.online for details.
## Response Headers
Every API response includes headers that report your current concurrency status. Use these to monitor usage and manage parallel requests.
| Header | Description |
|---|---|
| `X-RateLimit-Limit` | Maximum number of concurrent requests allowed (currently 10). |
| `X-RateLimit-Remaining` | Number of additional concurrent requests you can make right now. |
| `Retry-After` | Included on 429 responses. Suggested wait time in seconds before retrying. |
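As a quick illustration, these headers can be parsed from any response's header map. This is a minimal sketch; `concurrency_status` is a hypothetical helper, not part of the API, and the fallback defaults below are assumptions rather than documented behavior:

```python
def concurrency_status(headers: dict) -> dict:
    """Extract the concurrency info reported in the table above.

    `headers` is any mapping of response headers (e.g. response.headers
    from the requests library). The defaults are local fallbacks only.
    """
    return {
        "limit": int(headers.get("X-RateLimit-Limit", 10)),
        "remaining": int(headers.get("X-RateLimit-Remaining", 0)),
    }

# e.g. pause dispatching new work when no slots remain
status = concurrency_status({"X-RateLimit-Limit": "10", "X-RateLimit-Remaining": "3"})
```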
## Exceeding the Limit
When you exceed the concurrency limit, the API immediately returns a `429 Too Many Requests` status code. The request is not queued, so your client should retry after a short wait.
```http
HTTP/1.1 429 Too Many Requests
Content-Type: application/json
Retry-After: 1
X-RateLimit-Limit: 10
X-RateLimit-Remaining: 0

{
  "error": {
    "type": "rate_limit_error",
    "message": "Too many concurrent requests. Max 10 allowed. Please retry."
  }
}
```

## Best Practices
Follow these guidelines to make the most of the concurrency limit and build resilient integrations:
- **Await previous requests** — The simplest approach is to process files sequentially. With 10–15 second processing times, sequential requests are often fast enough.
- **Use a client-side semaphore** — If you need parallelism, limit your client to 10 concurrent requests using a semaphore or connection pool. This avoids 429 errors entirely.
- **Implement retry with backoff** — When you receive a 429, wait the `Retry-After` duration (typically 1 second), then retry. Add jitter to avoid thundering herds.
- **Cache results when possible** — If you are transcribing or synthesizing the same content repeatedly, store the result and serve it from cache instead of making a new API call.
- **Use callback URLs for long audio files** — For large audio transcriptions, supply a `callback_url` parameter. The API will process the file asynchronously and POST the result to your endpoint, freeing up a concurrency slot sooner.
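The semaphore approach above can be sketched as follows. This is a sketch, not a definitive implementation: `transcribe_file` is a hypothetical stand-in for your real API call, and the worker-pool sizing is illustrative:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 10  # matches the account-wide concurrency limit

_slots = threading.BoundedSemaphore(MAX_CONCURRENT)

def transcribe_file(file_path: str) -> dict:
    # Hypothetical placeholder: replace with your actual requests.post
    # call to the transcriptions endpoint.
    return {"file": file_path}

def transcribe(file_path: str) -> dict:
    with _slots:  # blocks once 10 calls are already in flight
        return transcribe_file(file_path)

def transcribe_all(paths: list) -> list:
    # More worker threads than slots is fine; the semaphore gates the API calls.
    with ThreadPoolExecutor(max_workers=20) as pool:
        return list(pool.map(transcribe, paths))
```

Because the semaphore caps in-flight calls at 10 on the client side, the server never sees an 11th concurrent request and never returns a 429.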
## Retry Logic Example
The following Python snippet demonstrates retry with backoff when the API returns a 429 status code:
```python
import time
import requests

API_KEY = "sk-your-api-key"
URL = "https://api.tryspeakeasy.io/v1/audio/transcriptions"
MAX_RETRIES = 5

def transcribe_with_retry(file_path: str) -> dict:
    """Transcribe an audio file with retry on concurrency limits."""
    for attempt in range(MAX_RETRIES):
        with open(file_path, "rb") as f:
            response = requests.post(
                URL,
                headers={"Authorization": f"Bearer {API_KEY}"},
                files={"file": f},
                data={"model": "whisper-large-v3", "response_format": "json"},
            )
        if response.status_code == 200:
            return response.json()
        if response.status_code == 429:
            # Prefer the server-provided wait time, fall back to exponential backoff
            retry_after = int(response.headers.get("Retry-After", 2 ** attempt))
            print(f"Concurrency limit hit. Retrying in {retry_after}s (attempt {attempt + 1}/{MAX_RETRIES})")
            time.sleep(retry_after)
        else:
            response.raise_for_status()
    raise Exception("Max retries exceeded. Please try again later.")
```

The key points in this example:
- The `Retry-After` header is respected when available, giving you the shortest possible wait.
- If the header is missing, the code falls back to exponential backoff (`2 ** attempt` seconds).
- A maximum retry count prevents infinite loops in the event of persistent concurrency limits.
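The jitter suggested under Best Practices can be added with a "full jitter" scheme, one common choice. The base and cap values here are illustrative defaults, not API requirements:

```python
import random

def backoff_with_jitter(attempt: int, base: float = 1.0, cap: float = 30.0) -> float:
    """Return a random sleep in [0, min(cap, base * 2**attempt)] seconds.

    Randomizing over the full window spreads out retries from many
    clients that all hit the limit at the same moment.
    """
    return random.uniform(0.0, min(cap, base * 2 ** attempt))
```

In the retry loop above, you would pass the larger of this value and the server's `Retry-After` to `time.sleep`.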