Generate SRT and VTT Subtitle Files Directly from Audio
SpeakEasy is the only affordable speech API that returns SRT and VTT subtitle files natively. No post-processing, no custom formatter — one API call.
Adding subtitles to video used to mean: transcribe → build an SRT formatter → handle edge cases → debug timing offsets. SpeakEasy cuts that to a single API call. Pass response_format=srt or response_format=vtt and you get a ready-to-use subtitle file back.
What SRT and VTT look like
SRT (SubRip Subtitle) is the most widely supported subtitle format — used by YouTube, VLC, Premiere Pro, DaVinci Resolve, and virtually every other video tool:
```text
1
00:00:00,000 --> 00:00:03,240
Welcome to this week's product update.

2
00:00:03,500 --> 00:00:07,120
We're shipping three new features today.

3
00:00:07,400 --> 00:00:11,800
First, async transcription with callback URLs.
```
VTT (WebVTT) is the web standard — used with the HTML5 <track> element for in-browser subtitles:
```text
WEBVTT

00:00:00.000 --> 00:00:03.240
Welcome to this week's product update.

00:00:03.500 --> 00:00:07.120
We're shipping three new features today.
```
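The two formats carry the same cues and differ in only a few details: VTT starts with a WEBVTT header, makes the numeric cue index optional, and uses a dot instead of a comma before the milliseconds. To make those differences concrete, here is a minimal Python sketch that converts SRT text to VTT. The helper name srt_to_vtt is hypothetical, and the sketch ignores VTT-only features such as cue settings and styling.

```python
import re

# SRT writes 00:00:03,240; WebVTT writes 00:00:03.240.
_SRT_TIME = re.compile(r"(\d{2}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt_text: str) -> str:
    """Convert SRT subtitle text to WebVTT (minimal sketch)."""
    out = ["WEBVTT", ""]
    # Normalise Windows line endings, then split on the blank lines between cues.
    blocks = re.split(r"\n\s*\n", srt_text.replace("\r\n", "\n").strip())
    for block in blocks:
        lines = block.split("\n")
        # Drop the numeric cue index: SRT requires it, WebVTT does not.
        if lines and lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines:
            continue
        # Only the timing line needs its millisecond separator changed.
        lines[0] = _SRT_TIME.sub(r"\1.\2", lines[0])
        out.extend(lines)
        out.append("")  # blank line terminates each cue
    return "\n".join(out)
```

One practical use: if you already have an SRT file, you can derive the VTT locally instead of transcribing the same audio a second time.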
Generating an SRT file
```bash
curl -X POST https://api.tryspeakeasy.io/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@video.mp4" \
  -F "response_format=srt" \
  > subtitles.srt
```
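That's it. The response body is the raw SRT content, so you can redirect it straight to a file. If the request fails, curl writes the error body into that file instead; adding --fail makes curl exit with an error and write nothing.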
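Because the body is plain SRT rather than JSON, a quick structural check before publishing can catch a truncated or unexpected response. A minimal Python sketch, with hypothetical names parse_srt and Cue, that parses SRT text into cues with millisecond timings and raises on anything malformed:

```python
import re
from dataclasses import dataclass

# Timing line such as "00:00:03,500 --> 00:00:07,120".
_TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(srt_text: str) -> list[Cue]:
    """Parse SRT text into cues; raise ValueError on a malformed block."""
    text = srt_text.replace("\r\n", "\n").strip()
    if not text:
        return []  # nothing to parse
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", text):
        lines = block.split("\n")
        match = _TIMING.match(lines[1].strip()) if len(lines) >= 2 else None
        if not match or not lines[0].strip().isdigit():
            raise ValueError(f"Malformed SRT block: {block!r}")
        g = match.groups()
        cues.append(
            Cue(int(lines[0]), _to_ms(*g[:4]), _to_ms(*g[4:]), "\n".join(lines[2:]))
        )
    return cues
```

For example, the end_ms of the last cue gives the time of the final subtitle, which you can compare against the video's length.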
Generating a VTT file
```bash
curl -X POST https://api.tryspeakeasy.io/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@video.mp4" \
  -F "response_format=vtt" \
  > subtitles.vtt
```
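Timing offsets usually come from editing the video after the subtitles were generated, for example trimming an intro or adding a title card. Rather than re-transcribing, you can shift every timestamp by a fixed amount. A minimal Python sketch; shift_timestamps is a hypothetical helper, not part of the SpeakEasy API, and it works on either format because it only rewrites lines that contain -->.

```python
import re

# HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (WebVTT). The hour-less VTT form
# (MM:SS.mmm) is not handled by this sketch.
_TS = re.compile(r"(\d{2,}):(\d{2}):(\d{2})([,.])(\d{3})")


def shift_timestamps(subtitle_text: str, offset_ms: int) -> str:
    """Shift every cue timing in SRT or VTT text by offset_ms (may be negative)."""

    def shift(match: re.Match) -> str:
        hh, mm, ss, sep, mmm = match.groups()
        total = ((int(hh) * 60 + int(mm)) * 60 + int(ss)) * 1000 + int(mmm)
        total = max(total + offset_ms, 0)  # clamp so nothing starts before zero
        hours, rem = divmod(total, 3_600_000)
        minutes, rem = divmod(rem, 60_000)
        seconds, millis = divmod(rem, 1000)
        # Keep the original separator so SRT stays SRT and VTT stays VTT.
        return f"{hours:02d}:{minutes:02d}:{seconds:02d}{sep}{millis:03d}"

    lines = subtitle_text.split("\n")
    # Only timing lines contain "-->"; the subtitle text itself is left untouched.
    return "\n".join(_TS.sub(shift, line) if "-->" in line else line for line in lines)
```

Pass a negative offset after trimming from the start. Cues that fall entirely inside the trimmed section are clamped to 00:00:00 and are best deleted afterwards.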
Using subtitles in a web video player
```html
<video controls>
  <source src="/videos/demo.mp4" type="video/mp4" />
  <track
    src="/subtitles/demo.vtt"
    kind="subtitles"
    srclang="en"
    label="English"
    default
  />
</video>
```
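Generate the VTT file, save it alongside your video, and reference it from the <track> element. No JavaScript is needed. If the .vtt file is served from a different origin than the page, browsers only load it when the server sends CORS headers and the <video> element has a crossorigin attribute.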
Python: subtitle generation pipeline
```python
import httpx
from pathlib import Path


def generate_subtitles(
    video_path: str,
    output_format: str = "srt",
    language: str | None = None,  # `str | None` needs Python 3.10+
) -> str:
    """Transcribe video and return SRT or VTT subtitle content."""
    params = {"response_format": output_format}
    if language:
        # Optional language hint; omit it to let the API auto-detect.
        params["language"] = language

    with open(video_path, "rb") as f:
        # Multipart upload: form fields go in `data`, the file in `files`.
        response = httpx.post(
            "https://api.tryspeakeasy.io/v1/audio/transcriptions",
            headers={"Authorization": "Bearer YOUR_API_KEY"},
            data=params,
            files={"file": f},
            timeout=300,  # allow up to 5 minutes for long files
        )
    response.raise_for_status()  # fail loudly instead of saving an error body
    return response.text  # raw subtitle text, not JSON


# Generate SRT
srt_content = generate_subtitles("lecture.mp4", output_format="srt")
Path("lecture.srt").write_text(srt_content, encoding="utf-8")

# Generate VTT for web player (this uploads and transcribes the file a second time)
vtt_content = generate_subtitles("lecture.mp4", output_format="vtt")
Path("lecture.vtt").write_text(vtt_content, encoding="utf-8")
```
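Long uploads are the step most likely to fail transiently. A minimal sketch of a retry wrapper with exponential backoff, assuming (this is not documented here) that SpeakEasy signals rate limiting with HTTP 429 and temporary outages with 5xx status codes. with_retries is a hypothetical helper; it accepts any zero-argument function that performs the request, so it does not depend on httpx itself.

```python
import random
import time
from typing import Callable, Protocol


class HasStatus(Protocol):
    status_code: int


# Assumed transient status codes; not taken from SpeakEasy documentation.
RETRYABLE = {429, 500, 502, 503, 504}


def with_retries(
    send: Callable[[], HasStatus],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> HasStatus:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    response = send()
    for attempt in range(1, max_attempts):
        if response.status_code not in RETRYABLE:
            break
        # Exponential backoff (1s, 2s, 4s, ...) plus jitter so parallel
        # workers do not all retry at the same instant.
        sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, 0.5))
        response = send()
    return response
```

To use it with generate_subtitles, move the open(video_path, "rb") block and the httpx.post call into a small zero-argument function and pass that as send. Opening the file inside that function ensures every attempt uploads the file from the beginning. Call raise_for_status() on the returned response as before.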
Multilingual subtitles
Generate subtitles in the source language, or use translate=true to get English subtitles from any audio:
```bash
# Subtitles in the original language (auto-detected)
curl -X POST https://api.tryspeakeasy.io/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@french-lecture.mp4" \
  -F "response_format=srt"

# English subtitles from any language
curl -X POST https://api.tryspeakeasy.io/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@french-lecture.mp4" \
  -F "response_format=srt" \
  -F "translate=true"
```
Batch subtitle generation for a folder of videos
```python
import asyncio
from pathlib import Path

import httpx

API_KEY = "YOUR_API_KEY"


async def subtitle_file(client: httpx.AsyncClient, video_path: Path) -> None:
    srt_path = video_path.with_suffix(".srt")
    if srt_path.exists():
        print(f"Skipping {video_path.name} (already done)")
        return
    with open(video_path, "rb") as f:
        response = await client.post(
            "https://api.tryspeakeasy.io/v1/audio/transcriptions",
            headers={"Authorization": f"Bearer {API_KEY}"},
            data={"response_format": "srt"},
            files={"file": f},
            timeout=600,
        )
    response.raise_for_status()
    # Written only after success, so a rerun retries failed files.
    srt_path.write_text(response.text, encoding="utf-8")
    print(f"Done: {srt_path}")


async def main():
    videos = sorted(Path("./videos").glob("*.mp4"))
    async with httpx.AsyncClient() as client:
        # Process up to 5 at a time to respect concurrency limits
        semaphore = asyncio.Semaphore(5)

        async def bounded(path: Path) -> None:
            async with semaphore:
                await subtitle_file(client, path)

        # return_exceptions=True lets the other files finish if one fails
        results = await asyncio.gather(
            *[bounded(v) for v in videos], return_exceptions=True
        )
    for video, result in zip(videos, results):
        if isinstance(result, Exception):
            print(f"Failed: {video.name}: {result}")


asyncio.run(main())
```
Available response formats
| Format | Use case |
|--------|----------|
| json | Default. Structured JSON with text, segments, timestamps |
| verbose_json | JSON with word-level timestamps and speaker data |
| text | Plain text, no timestamps |
| srt | SubRip subtitle file |
| vtt | WebVTT for HTML5 video |
SRT and VTT are unique to SpeakEasy among affordable speech APIs. Lemonfox, for example, returns JSON only — you'd need to write your own SRT formatter and handle edge cases like overlapping segments and line-length limits.
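Those edge cases are straightforward to detect, even if fixing them well takes more work. A minimal Python sketch that flags overlapping cues, non-positive durations, and over-long lines. It assumes segments are dicts with start and end in seconds and a text field (an assumed shape), and the 42-character default is a common subtitle style-guide limit, not a SpeakEasy setting. subtitle_issues is a hypothetical helper.

```python
def subtitle_issues(segments: list[dict], max_line_chars: int = 42) -> list[str]:
    """Return human-readable problems found in timestamped subtitle segments."""
    issues = []
    ordered = sorted(segments, key=lambda seg: seg["start"])
    # Overlap: a cue starts before the previous one has ended.
    for prev, cur in zip(ordered, ordered[1:]):
        if cur["start"] < prev["end"]:
            issues.append(
                f"overlap at {cur['start']:.3f}s: previous cue ends at {prev['end']:.3f}s"
            )
    for seg in ordered:
        if seg["end"] <= seg["start"]:
            issues.append(f"non-positive duration at {seg['start']:.3f}s")
        # Check each displayed line separately; cues may already contain line breaks.
        for line in seg["text"].strip().split("\n"):
            if len(line) > max_line_chars:
                issues.append(f"line too long at {seg['start']:.3f}s: {len(line)} chars")
    return issues
```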
Start generating subtitles today — $1 for your first month, 50 hours included.