Async Transcription API with Callback URLs
Learn how to transcribe long audio files asynchronously using a callback URL. Fire-and-forget transcription for podcasts, meetings, and large video files.
Short answer: Pass a callback_url on any SpeakEasy transcription request and the API accepts the file immediately, processes it in the background, and POSTs the finished transcript to your URL. No queue service to run, no polling loop to write — one parameter, one webhook.
When you're transcribing a 90-minute podcast or a 2-hour meeting recording, you don't want your server sitting idle waiting for the response. That's what the callback_url parameter is for: fire the request and get the result delivered to your endpoint when it's ready.
How async transcription works
Pass a callback_url in your transcription request. SpeakEasy will accept the file immediately (HTTP 200), process it in the background, and POST the completed transcript to your URL when done.
curl -X POST https://www.tryspeakeasy.io/api/v1/audio/transcriptions \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@long-meeting.mp3" \
-F "callback_url=https://yourapp.com/webhooks/transcript"
The immediate response is a 200 with a job acknowledgement. Your webhook receives the full result later:
{
  "text": "Welcome everyone, let's get started...",
  "duration": 5432.1,
  "language": "en",
  "segments": [
    { "id": 0, "start": 0.0, "end": 4.2, "text": "Welcome everyone, let's get started." },
    ...
  ]
}
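On the receiving side it can help to turn that JSON into typed objects before storing it. The following is a minimal sketch, assuming the payload carries exactly the fields shown above; the `Segment` and `Transcript` class names and the `parse_transcript` helper are illustrative, not part of the SpeakEasy API.

```python
import json
from dataclasses import dataclass


@dataclass
class Segment:
    id: int
    start: float  # seconds from the start of the audio
    end: float
    text: str


@dataclass
class Transcript:
    text: str
    duration: float  # total audio length in seconds
    language: str
    segments: list


def parse_transcript(raw: str) -> Transcript:
    """Parse a webhook body with the fields shown in the example payload."""
    payload = json.loads(raw)
    segments = [
        Segment(id=s["id"], start=s["start"], end=s["end"], text=s["text"])
        for s in payload.get("segments", [])
    ]
    return Transcript(
        text=payload["text"],
        duration=payload["duration"],
        language=payload["language"],
        segments=segments,
    )


# Sample body built from the example payload, with the elided segments left out
sample = json.dumps({
    "text": "Welcome everyone, let's get started...",
    "duration": 5432.1,
    "language": "en",
    "segments": [
        {"id": 0, "start": 0.0, "end": 4.2, "text": "Welcome everyone, let's get started."}
    ],
})
transcript = parse_transcript(sample)
print(f"{transcript.duration / 60:.1f} min, {len(transcript.segments)} segment(s)")
```

If your requests use other parameters, the payload may carry extra or different fields, so treat the required keys here as an assumption to verify against a real callback.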
Why use async instead of sync?
Synchronous transcription holds an open HTTP connection for the full duration of processing. For long files this means:
- Connection timeouts on your load balancer or reverse proxy (most default to 60–120 seconds)
- Blocking a worker thread in your server for minutes at a time
- No retry path if the connection drops mid-way
Async solves all three. Your server is free immediately, and the result arrives as a standard webhook POST.
URL-based transcription (no upload needed)
For files already hosted somewhere, pass a URL instead of uploading:
curl -X POST https://www.tryspeakeasy.io/api/v1/audio/transcriptions \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "url=https://your-bucket.s3.amazonaws.com/recording.mp4" \
-F "callback_url=https://yourapp.com/webhooks/transcript"
SpeakEasy fetches the file directly. No need to download it first and re-upload. Supports files up to 1 GB via URL.
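The same request can be built from Python without extra dependencies. This sketch hand-encodes the two form fields as multipart/form-data, which is the encoding `curl -F` produces; the `encode_form` and `build_url_job` helpers are hypothetical names, and the request is only constructed here, not sent.

```python
import urllib.request
import uuid

API_URL = "https://www.tryspeakeasy.io/api/v1/audio/transcriptions"


def encode_form(fields: dict) -> tuple:
    """Encode plain text fields as multipart/form-data, like curl -F."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n"
        )
    parts.append(f"--{boundary}--\r\n")
    body = "".join(parts).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_url_job(audio_url: str, callback_url: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a transcription request for a hosted file."""
    body, content_type = encode_form({"url": audio_url, "callback_url": callback_url})
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {api_key}", "Content-Type": content_type},
    )


request = build_url_job(
    "https://your-bucket.s3.amazonaws.com/recording.mp4",
    "https://yourapp.com/webhooks/transcript",
    "YOUR_API_KEY",
)
# To submit for real: urllib.request.urlopen(request, timeout=30)
print(request.get_method(), request.full_url)
```

Because no file bytes pass through your server, this request stays small however large the recording is.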
Implementing the webhook receiver
Here's a minimal Express handler for the callback:
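import express from "express";

const app = express();
// express.json() rejects bodies over 100kb by default; long transcripts can exceed that.
// The 10mb limit is an assumption, so size it to your longest recordings.
app.use(express.json({ limit: "10mb" }));

app.post("/webhooks/transcript", (req, res) => {
  const { text, duration, segments, language } = req.body;

  // Store in your database, trigger downstream tasks, etc.
  console.log(`Transcription complete: ${duration}s, language: ${language}`);
  // Guard against a missing text field so the handler never throws and returns a 5xx
  console.log((text ?? "").slice(0, 100));

  // Acknowledge immediately: SpeakEasy won't retry on 2xx
  res.sendStatus(200);
});

// Port 3000 is an arbitrary choice for local testing
app.listen(3000);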
Return a 2xx within a few seconds. A 5xx response or a timeout triggers a retry with exponential backoff, so acknowledge first and do any slow processing afterwards.
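Since a slow handler can trigger a retry, one common pattern is to acknowledge right away and hand the payload to a background worker. The sketch below shows that idea in plain Python, independent of any web framework. It assumes a retried delivery carries a byte-identical body, which is why it de-duplicates on a hash. That assumption, the 400 response for malformed JSON, and all function names are illustrative rather than documented SpeakEasy behavior.

```python
import hashlib
import json
import queue
import threading

jobs = queue.Queue()   # payloads waiting for slow processing
seen_digests = set()   # hashes of bodies already accepted
processed = []         # stand-in for your database


def handle_callback(body: bytes) -> int:
    """Return the HTTP status to send back; keep this path fast."""
    try:
        payload = json.loads(body)
    except ValueError:
        return 400  # assumption: a malformed body is rejected outright
    digest = hashlib.sha256(body).hexdigest()
    if digest not in seen_digests:  # skip duplicates caused by retries
        seen_digests.add(digest)
        jobs.put(payload)
    return 200


def worker() -> None:
    """Drain the queue in the background, away from the request path."""
    while True:
        payload = jobs.get()
        processed.append(payload.get("text", "")[:100])  # placeholder for slow work
        jobs.task_done()


threading.Thread(target=worker, daemon=True).start()

body = b'{"text": "Welcome everyone", "duration": 4.2, "language": "en"}'
status = handle_callback(body)
retry_status = handle_callback(body)  # simulated retry of the same delivery
jobs.join()  # demo only: wait for the worker to finish
print(status, retry_status, processed)
```

An in-memory queue loses pending work on restart, so a production receiver would more likely write the payload to durable storage before returning.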
Combining async with other parameters
Async works with every transcription parameter:
curl -X POST https://www.tryspeakeasy.io/api/v1/audio/transcriptions \
-H "Authorization: Bearer YOUR_API_KEY" \
-F "file=@interview.mp3" \
-F "callback_url=https://yourapp.com/webhooks/transcript" \
-F "diarize=true" \
-F "response_format=srt" \
-F "language=en"
The webhook payload will include the speaker labels (when diarize=true) or the full SRT file content (when response_format=srt).
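If you request SRT, you will often want the cues as structured data again. This is a minimal sketch of a parser for standard SubRip text. It assumes you have already pulled the SRT content out of the callback as a string (which field carries it is not specified here), and the second cue in the sample is made up for illustration.

```python
import re
from dataclasses import dataclass

# Matches a standard SRT timing line such as 00:00:04,200 --> 00:00:09,000
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3}) --> (\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt: str) -> list:
    """Split SRT text into cues, skipping blocks that are not well formed."""
    cues = []
    for block in re.split(r"\n\s*\n", srt.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        if len(lines) < 3 or not lines[0].strip().isdigit():
            continue
        match = TIMING.match(lines[1].strip())
        if not match:
            continue
        g = match.groups()
        cues.append(
            Cue(int(lines[0]), _seconds(*g[:4]), _seconds(*g[4:]), "\n".join(lines[2:]))
        )
    return cues


sample = """1
00:00:00,000 --> 00:00:04,200
Welcome everyone, let's get started.

2
00:00:04,200 --> 00:00:09,000
First item on the agenda.
"""
cues = parse_srt(sample)
print(len(cues), cues[0].text)
```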
Python example with async/await
import asyncio

import httpx


async def submit_transcription(file_path: str, callback_url: str):
    async with httpx.AsyncClient() as client:
        with open(file_path, "rb") as f:
            response = await client.post(
                "https://www.tryspeakeasy.io/api/v1/audio/transcriptions",
                headers={"Authorization": "Bearer YOUR_API_KEY"},
                data={"callback_url": callback_url},
                files={"file": f},
            )
    response.raise_for_status()
    return response.json()


# Fire and forget: returns once the upload is accepted, not when transcription finishes
result = asyncio.run(submit_transcription(
    "long-recording.mp3",
    "https://yourapp.com/webhooks/transcript"
))
print("Job submitted:", result)
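For a folder of recordings, the same coroutine can be run for many files at once, with a cap on how many uploads are in flight. The sketch below shows one way to do that with `asyncio.Semaphore`. So that it runs offline, it uses a `fake_submit` stand-in whose return value is invented and is not the API's response shape; `submit_all` and `max_in_flight` are likewise hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable


async def submit_all(
    paths: list,
    submit: Callable[[str], Awaitable[dict]],
    max_in_flight: int = 4,
) -> list:
    """Submit every path, never running more than max_in_flight uploads at once."""
    gate = asyncio.Semaphore(max_in_flight)

    async def guarded(path: str) -> dict:
        async with gate:
            return await submit(path)

    # gather returns results in the same order as the input paths
    return await asyncio.gather(*(guarded(p) for p in paths))


async def fake_submit(path: str) -> dict:
    """Stand-in for the real upload; the returned dict is invented for the demo."""
    await asyncio.sleep(0.01)
    return {"file": path, "status": "accepted"}


results = asyncio.run(
    submit_all(["a.mp3", "b.mp3", "c.mp3"], fake_submit, max_in_flight=2)
)
print(results)
```

In real use you would pass something like `lambda p: submit_transcription(p, callback_url)` in place of `fake_submit`, and each finished transcript would still arrive separately at your webhook.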
When to use sync vs async
| Scenario | Use |
|---|---|
| Short clips (< 2 min) in a user-facing flow | Sync (simpler) |
| Long recordings (podcast, meeting, lecture) | Async + callback |
| Batch processing a folder of files | Async + callback |
| Background job queue | Async + callback |
| Serverless function with 30s timeout | Async + callback |
The rule of thumb: if your file is longer than a few minutes or your infrastructure has tight timeout limits, use async.
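The table can be encoded as a small helper if you need to make the choice in code. This is only a sketch: the 120-second cutoff comes from the "< 2 min" row, while the `choose_mode` function and its `user_facing` and `tight_timeout` flags are hypothetical names for the other rows.

```python
SYNC_MAX_SECONDS = 120  # the "< 2 min" row of the table


def choose_mode(
    audio_seconds: float,
    *,
    user_facing: bool = True,
    tight_timeout: bool = False,
) -> str:
    """Return "sync" or "async" following the scenarios in the table."""
    # Batch jobs, background queues, and tight timeouts all go async
    if tight_timeout or not user_facing:
        return "async"
    return "sync" if audio_seconds < SYNC_MAX_SECONDS else "async"


print(choose_mode(45))                      # sync
print(choose_mode(90 * 60))                 # async
print(choose_mode(45, tight_timeout=True))  # async
```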
Related reading
- Python Speech-to-Text API: Transcribe Audio in 5 Lines of Code (full Python walkthrough, including the async callback_url pattern)
- How to Build an AI Voice Agent with a Speech-to-Text API (when async matters for real-time pipelines)
- Generate SRT and VTT Subtitle Files from Audio (pair async with response_format=srt for long-form video)
Get started with a $1 first month trial. 50 hours included.