Generate SRT and VTT Subtitle Files Directly from Audio

Short answer: Set response_format=srt or response_format=vtt on SpeakEasy's /api/v1/audio/transcriptions endpoint and you get a ready-to-use subtitle file back — correctly numbered, correctly timestamped. No post-processing, no custom formatter, no timing math. YouTube, VLC, Premiere Pro, DaVinci Resolve all accept the output as-is.

Want to generate one without writing any code first? audiotranscribe.app/audio-to-srt is a free playground that runs the same call and gives you a one-click .srt download. Useful for sanity-checking timing on a real file before wiring the API into your pipeline.

Adding subtitles to video used to mean: transcribe → build an SRT formatter → handle edge cases → debug timing offsets. SpeakEasy cuts that to a single API call.

What SRT and VTT look like

SRT (SubRip Subtitle) is the most widely supported subtitle format — used by YouTube, VLC, Premiere Pro, DaVinci Resolve, and virtually every other video tool:

1
00:00:00,000 --> 00:00:03,240
Welcome to this week's product update.

2
00:00:03,500 --> 00:00:07,120
We're shipping three new features today.

3
00:00:07,400 --> 00:00:11,800
First, async transcription with callback URLs.
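Each SRT cue is a sequence number, a `start --> end` line with a comma before the milliseconds, and one or more lines of text, with a blank line between cues. If you want to sanity-check timing in a file you got back, a minimal stdlib-only parser is enough. The `parse_srt` and `check_timing` names below are illustrative, not part of any SpeakEasy SDK:

```python
import re
from dataclasses import dataclass

# "HH:MM:SS,mmm --> HH:MM:SS,mmm"; SRT puts a comma before the milliseconds.
TIMING = re.compile(
    r"(\d{2,}):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d{2,}):(\d{2}):(\d{2}),(\d{3})"
)


@dataclass
class Cue:
    index: int
    start_ms: int
    end_ms: int
    text: str


def _to_ms(h: str, m: str, s: str, ms: str) -> int:
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues, raising ValueError on a malformed block."""
    # Drop a UTF-8 byte-order mark and normalize Windows line endings.
    content = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    if not content:
        return []
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content):
        lines = block.split("\n")
        match = TIMING.match(lines[1].strip()) if len(lines) >= 3 else None
        if not match or not lines[0].strip().isdigit():
            raise ValueError(f"Malformed cue: {block!r}")
        g = match.groups()
        cues.append(Cue(int(lines[0]), _to_ms(*g[:4]), _to_ms(*g[4:]), "\n".join(lines[2:])))
    return cues


def check_timing(cues: list[Cue]) -> list[str]:
    """List cues with a non-positive duration or that overlap the previous cue."""
    problems = []
    for i, cue in enumerate(cues):
        if cue.end_ms <= cue.start_ms:
            problems.append(f"cue {cue.index} has non-positive duration")
        if i > 0 and cue.start_ms < cues[i - 1].end_ms:
            problems.append(f"cue {cue.index} starts before cue {cues[i - 1].index} ends")
    return problems
```

Run it as `check_timing(parse_srt(Path("subtitles.srt").read_text(encoding="utf-8")))`; an empty list means every cue has a positive duration and none overlap.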

VTT (WebVTT) is the web standard — used with the HTML5 <track> element for in-browser subtitles:

WEBVTT

00:00:00.000 --> 00:00:03.240
Welcome to this week's product update.

00:00:03.500 --> 00:00:07.120
We're shipping three new features today.
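The two formats differ in only a few details: VTT opens with a `WEBVTT` line, uses a period instead of a comma before the milliseconds, and makes the cue number optional. A minimal conversion sketch (the `srt_to_vtt` name is illustrative, and it handles plain-text cues only, not SRT styling tags or positioning):

```python
import re

# SRT writes "00:00:03,240"; WebVTT writes "00:00:03.240".
SRT_TIMESTAMP = re.compile(r"(\d{2,}:\d{2}:\d{2}),(\d{3})")


def srt_to_vtt(srt: str) -> str:
    """Convert SRT subtitle text to WebVTT (plain-text cues only)."""
    srt = srt.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    for block in re.split(r"\n\s*\n", srt):
        lines = block.split("\n")
        # WebVTT cue identifiers are optional, so drop SRT's numeric index.
        if lines[0].strip().isdigit():
            lines = lines[1:]
        if not lines or not lines[0].strip():
            continue
        # Only the timing line needs the comma-to-period change.
        lines[0] = SRT_TIMESTAMP.sub(r"\1.\2", lines[0])
        cues.append("\n".join(lines))
    # A WebVTT file must start with the "WEBVTT" signature line.
    return ("WEBVTT\n\n" + "\n\n".join(cues) + "\n") if cues else "WEBVTT\n"
```

Converting locally like this lets you get both formats from one transcription request instead of two.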

Generating an SRT file

curl -X POST https://www.tryspeakeasy.io/api/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@video.mp4" \
  -F "response_format=srt" \
  > subtitles.srt

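That's it. The response body is the raw SRT content, so you can redirect it straight to a file. Add curl's --fail flag so an HTTP error makes curl exit non-zero instead of saving the error response into subtitles.srt.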

Generating a VTT file

curl -X POST https://www.tryspeakeasy.io/api/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@video.mp4" \
  -F "response_format=vtt" \
  > subtitles.vtt

Using subtitles in a web video player

<video controls>
  <source src="/videos/demo.mp4" type="video/mp4" />
  <track
    src="/subtitles/demo.vtt"
    kind="subtitles"
    srclang="en"
    label="English"
    default
  />
</video>

Generate the VTT, save it alongside your video, and reference it in the <track> tag. No JavaScript needed. Browsers only load WebVTT through <track>, so use response_format=vtt here rather than srt.
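A page that offers several subtitle languages gets one <track> element per VTT file, with `default` on at most one of them. A small helper that builds that markup is sketched below; the `track_tags` name and the `/subtitles/<name>.<lang>.vtt` path scheme are hypothetical, loosely following the example above:

```python
from html import escape


def track_tags(base_name: str, languages: dict[str, str], default: str) -> str:
    """Build one <track> element per subtitle language.

    `languages` maps a language code (e.g. "en") to its menu label
    (e.g. "English"). Paths follow a hypothetical
    /subtitles/<base_name>.<code>.vtt naming scheme.
    """
    tags = []
    for code, label in languages.items():
        attrs = [
            f'src="/subtitles/{escape(base_name)}.{escape(code)}.vtt"',
            'kind="subtitles"',
            f'srclang="{escape(code)}"',
            f'label="{escape(label)}"',
        ]
        if code == default:
            attrs.append("default")  # only one track should be the default
        tags.append("<track " + " ".join(attrs) + " />")
    return "\n".join(tags)
```

For example, `track_tags("demo", {"en": "English", "fr": "Français"}, default="en")` yields one English track marked `default` and one French track without it.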

Python: subtitle generation pipeline

import httpx
from pathlib import Path

API_URL = "https://www.tryspeakeasy.io/api/v1/audio/transcriptions"

def generate_subtitles(
    video_path: str,
    output_format: str = "srt",
    language: str | None = None,
) -> str:
    """Transcribe a video and return SRT or VTT subtitle content."""
    # Sent as multipart form fields alongside the file, like curl's -F flags
    form = {"response_format": output_format}
    if language:
        form["language"] = language  # optional; omit it to auto-detect

    with open(video_path, "rb") as f:
        response = httpx.post(
            API_URL,
            headers={"Authorization": "Bearer YOUR_API_KEY"},
            data=form,
            files={"file": f},
            timeout=300,  # seconds; long files need a generous timeout
        )
    response.raise_for_status()  # raise on 4xx/5xx instead of saving an error body
    return response.text

# Generate SRT (written as UTF-8 so non-English subtitles survive on any OS)
srt_content = generate_subtitles("lecture.mp4", output_format="srt")
Path("lecture.srt").write_text(srt_content, encoding="utf-8")

# Generate VTT for web player (this uploads and transcribes the file a second time)
vtt_content = generate_subtitles("lecture.mp4", output_format="vtt")
Path("lecture.vtt").write_text(vtt_content, encoding="utf-8")

Multilingual subtitles

Generate subtitles in the source language, or use translate=true to get English subtitles from any audio:

# Subtitles in the original language (auto-detected)
curl -X POST https://www.tryspeakeasy.io/api/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@french-lecture.mp4" \
  -F "response_format=srt"

# English subtitles from any language
curl -X POST https://www.tryspeakeasy.io/api/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@french-lecture.mp4" \
  -F "response_format=srt" \
  -F "translate=true"

Batch subtitle generation for a folder of videos

import asyncio
from pathlib import Path

import httpx

API_KEY = "YOUR_API_KEY"
API_URL = "https://www.tryspeakeasy.io/api/v1/audio/transcriptions"

async def subtitle_file(client: httpx.AsyncClient, video_path: Path) -> None:
    srt_path = video_path.with_suffix(".srt")
    if srt_path.exists():
        print(f"Skipping {video_path.name} (already done)")
        return

    with open(video_path, "rb") as f:
        response = await client.post(
            API_URL,
            headers={"Authorization": f"Bearer {API_KEY}"},
            data={"response_format": "srt"},
            files={"file": f},
            timeout=600,
        )
    response.raise_for_status()
    srt_path.write_text(response.text, encoding="utf-8")
    print(f"Done: {srt_path}")

async def main() -> None:
    videos = sorted(Path("./videos").glob("*.mp4"))
    # Process up to 5 at a time to respect concurrency limits
    semaphore = asyncio.Semaphore(5)

    async with httpx.AsyncClient() as client:
        async def bounded(path: Path) -> None:
            async with semaphore:
                await subtitle_file(client, path)

        # return_exceptions=True keeps one failed upload from aborting the whole batch
        results = await asyncio.gather(*(bounded(v) for v in videos), return_exceptions=True)

    for video, result in zip(videos, results):
        if isinstance(result, Exception):
            print(f"Failed: {video.name}: {result}")

if __name__ == "__main__":
    asyncio.run(main())
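A semaphore caps how many uploads are in flight, but a long batch can still run into throttling or a transient server error. A common remedy is retrying with exponential backoff. The sketch below assumes the API signals throttling with HTTP 429 (this post doesn't document SpeakEasy's exact behavior), and `with_retries` is a hypothetical helper that accepts any async callable returning an object with a `status_code`, such as an httpx response:

```python
import asyncio
import random
from typing import Awaitable, Callable, Protocol, TypeVar


class HasStatus(Protocol):
    status_code: int


R = TypeVar("R", bound=HasStatus)

# Assumption: 429 signals throttling and these 5xx codes are transient.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


async def with_retries(
    send: Callable[[], Awaitable[R]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
) -> R:
    """Await send() until the status is not retryable or attempts run out.

    The last response is returned either way, so the caller can still
    call raise_for_status() on it.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    attempt = 1
    while True:
        response = await send()
        if response.status_code not in RETRYABLE_STATUS or attempt >= max_attempts:
            return response
        # Exponential backoff (base, 2x, 4x, ...) plus random jitter so that
        # concurrent workers don't all retry at the same instant.
        await asyncio.sleep(base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay))
        attempt += 1
```

To use it inside subtitle_file, wrap the `open(...)` and `client.post(...)` calls in a small inner `async def send()` and await `with_retries(send)`. The file must be reopened on every attempt, because a file object that has already been uploaded is positioned at its end.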

Available response formats

| Format | Use case |
|---|---|
| `json` | Default. Structured JSON with text, segments, timestamps |
| `verbose_json` | JSON with word-level timestamps and speaker data |
| `text` | Plain text, no timestamps |
| `srt` | SubRip subtitle file |
| `vtt` | WebVTT for HTML5 video |

SRT and VTT are unique to SpeakEasy among affordable speech APIs. Lemonfox, for example, returns JSON only — you'd need to write your own SRT formatter and handle edge cases like overlapping segments and line-length limits.
