
Generate SRT and VTT Subtitle Files Directly from Audio

SpeakEasy is the only affordable speech API that returns SRT and VTT subtitle files natively. No post-processing, no custom formatter — one API call.

Tags: speech-to-text, subtitles, srt, vtt, tutorial

Adding subtitles to video used to mean: transcribe → build an SRT formatter → handle edge cases → debug timing offsets. SpeakEasy cuts that to a single API call. Pass response_format=srt or response_format=vtt and you get a ready-to-use subtitle file back.

What SRT and VTT look like

SRT (SubRip Subtitle) is the most widely supported subtitle format — used by YouTube, VLC, Premiere Pro, DaVinci Resolve, and virtually every other video tool:

1
00:00:00,000 --> 00:00:03,240
Welcome to this week's product update.

2
00:00:03,500 --> 00:00:07,120
We're shipping three new features today.

3
00:00:07,400 --> 00:00:11,800
First, async transcription with callback URLs.

VTT (WebVTT) is the web standard — used with the HTML5 <track> element for in-browser subtitles:

WEBVTT

00:00:00.000 --> 00:00:03.240
Welcome to this week's product update.

00:00:03.500 --> 00:00:07.120
We're shipping three new features today.
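
The two samples differ in only three ways: VTT starts with a WEBVTT header, drops the numeric cue index, and uses a period instead of a comma before the milliseconds. A minimal sketch of that mapping, assuming plain cues with no styling or positioning (the function name `srt_to_vtt` is illustrative, not part of the SpeakEasy API):

```python
import re

# Matches an SRT timing line such as "00:00:03,500 --> 00:00:07,120".
_TIMING = re.compile(r"^(\d{2}:\d{2}:\d{2}),(\d{3}) --> (\d{2}:\d{2}:\d{2}),(\d{3})$")


def srt_to_vtt(srt_text: str) -> str:
    """Convert simple SRT content to WebVTT (no styling or positioning)."""
    lines = ["WEBVTT", ""]
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        cue = block.splitlines()
        # Drop the numeric cue index that SRT requires and VTT does not.
        if cue and cue[0].strip().isdigit():
            cue = cue[1:]
        if not cue:
            continue
        # VTT uses "." rather than "," before the milliseconds.
        cue[0] = _TIMING.sub(r"\1.\2 --> \3.\4", cue[0].strip())
        lines.extend(cue)
        lines.append("")
    return "\n".join(lines)
```

In practice you would request response_format=vtt directly; the sketch is only meant to make the format differences concrete.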

Generating an SRT file

curl -X POST https://api.tryspeakeasy.io/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@video.mp4" \
  -F "response_format=srt" \
  > subtitles.srt

That's it. The response body is the raw SRT content, so you can redirect it straight to a file.
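
Because the shell redirect saves whatever the server returns, a failed request (a bad API key, for example) would leave an error body in subtitles.srt. A small sanity check along these lines can catch that before the file reaches a video editor. This is a sketch under the assumption that valid output follows the standard SRT cue layout shown earlier; `parse_srt` and `looks_like_srt` are hypothetical helper names.

```python
import re

_CUE_TIME = re.compile(
    r"(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})"
)


def _seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text: str) -> list[tuple[float, float, str]]:
    """Return (start_seconds, end_seconds, text) for each cue in srt_text."""
    cues = []
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.splitlines()
        # Find the timing line; in valid SRT it follows the numeric index.
        for i, line in enumerate(lines):
            match = _CUE_TIME.search(line)
            if match:
                g = match.groups()
                text = "\n".join(lines[i + 1:])
                cues.append((_seconds(*g[:4]), _seconds(*g[4:]), text))
                break
    return cues


def looks_like_srt(srt_text: str) -> bool:
    """True if there is at least one cue and every cue ends after it starts."""
    cues = parse_srt(srt_text)
    return bool(cues) and all(end > start for start, end, _ in cues)
```

Calling `looks_like_srt(Path("subtitles.srt").read_text(encoding="utf-8"))` after the curl command is one way to use it.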

Generating a VTT file

curl -X POST https://api.tryspeakeasy.io/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@video.mp4" \
  -F "response_format=vtt" \
  > subtitles.vtt

Using subtitles in a web video player

<video controls>
  <source src="/videos/demo.mp4" type="video/mp4" />
  <track
    src="/subtitles/demo.vtt"
    kind="subtitles"
    srclang="en"
    label="English"
    default
  />
</video>

Generate the VTT, save it alongside your video, reference it in the <track> tag. No JavaScript needed.

Python: subtitle generation pipeline

import httpx
from pathlib import Path

def generate_subtitles(
    video_path: str,
    output_format: str = "srt",
    language: str | None = None,
) -> str:
    """Transcribe video and return SRT or VTT subtitle content."""
    params = {"response_format": output_format}
    if language:
        params["language"] = language

    with open(video_path, "rb") as f:
        response = httpx.post(
            "https://api.tryspeakeasy.io/v1/audio/transcriptions",
            headers={"Authorization": "Bearer YOUR_API_KEY"},
            data=params,
            files={"file": f},
            timeout=300,
        )
    response.raise_for_status()
    return response.text

# Generate SRT (write as UTF-8 so non-ASCII characters survive on every platform)
srt_content = generate_subtitles("lecture.mp4", output_format="srt")
Path("lecture.srt").write_text(srt_content, encoding="utf-8")

# Generate VTT for web player (WebVTT files must be UTF-8)
vtt_content = generate_subtitles("lecture.mp4", output_format="vtt")
Path("lecture.vtt").write_text(vtt_content, encoding="utf-8")
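
The pipeline above hardcodes the API key and the output filenames. One possible refinement, sketched here with hypothetical names, reads the key from an environment variable and derives the subtitle path from the video path. The variable name `SPEAKEASY_API_KEY` is an assumption for illustration, not something the API defines.

```python
import os
from pathlib import Path

# The two subtitle formats this post covers.
SUBTITLE_FORMATS = {"srt", "vtt"}


def load_api_key(env_var: str = "SPEAKEASY_API_KEY") -> str:
    """Read the API key from the environment (the variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before running the pipeline")
    return key


def subtitle_path(video_path: str, output_format: str) -> Path:
    """Return the subtitle path next to the video, e.g. lecture.mp4 -> lecture.srt."""
    if output_format not in SUBTITLE_FORMATS:
        raise ValueError(f"Unsupported subtitle format: {output_format!r}")
    return Path(video_path).with_suffix(f".{output_format}")
```

With these in place, the header becomes `{"Authorization": f"Bearer {load_api_key()}"}` and the output path becomes `subtitle_path("lecture.mp4", "vtt")`.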

Multilingual subtitles

Generate subtitles in the source language, or use translate=true to get English subtitles from any audio:

# Subtitles in the original language (auto-detected)
curl -X POST https://api.tryspeakeasy.io/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@french-lecture.mp4" \
  -F "response_format=srt"

# English subtitles from any language
curl -X POST https://api.tryspeakeasy.io/v1/audio/transcriptions \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "file=@french-lecture.mp4" \
  -F "response_format=srt" \
  -F "translate=true"

Batch subtitle generation for a folder of videos

import httpx
import asyncio
from pathlib import Path

API_KEY = "YOUR_API_KEY"

async def subtitle_file(client: httpx.AsyncClient, video_path: Path) -> None:
    srt_path = video_path.with_suffix(".srt")
    if srt_path.exists():
        print(f"Skipping {video_path.name} (already done)")
        return

    with open(video_path, "rb") as f:
        response = await client.post(
            "https://api.tryspeakeasy.io/v1/audio/transcriptions",
            headers={"Authorization": f"Bearer {API_KEY}"},
            data={"response_format": "srt"},
            files={"file": f},
            timeout=600,
        )
    response.raise_for_status()
    srt_path.write_text(response.text, encoding="utf-8")
    print(f"Done: {srt_path}")

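async def main():
    videos = sorted(Path("./videos").glob("*.mp4"))
    async with httpx.AsyncClient() as client:
        # Process up to 5 at a time to respect concurrency limits
        semaphore = asyncio.Semaphore(5)

        async def bounded(path):
            async with semaphore:
                await subtitle_file(client, path)

        # Collect failures instead of raising, so one bad upload does not
        # close the client while other requests are still in flight
        results = await asyncio.gather(
            *[bounded(v) for v in videos], return_exceptions=True
        )

    for video, result in zip(videos, results):
        if isinstance(result, Exception):
            print(f"Failed: {video.name}: {result}")

asyncio.run(main())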
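
To check that the semaphore really caps concurrency before pointing the script at a large folder, the same pattern can be exercised offline. The sketch below swaps the HTTP call for a short sleep and records the peak number of simultaneous "uploads"; `run_bounded` and `fake_upload` are illustrative names, not part of httpx or the SpeakEasy API.

```python
import asyncio


async def run_bounded(items, worker, limit: int = 5):
    """Run worker(item) for every item with at most `limit` in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(item):
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(bounded(i) for i in items))


async def demo() -> int:
    in_flight = 0
    peak = 0

    async def fake_upload(name: str) -> str:
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stands in for the HTTP request
        in_flight -= 1
        return name

    await run_bounded([f"video{i}.mp4" for i in range(20)], fake_upload, limit=5)
    return peak


print(asyncio.run(demo()))  # 5
```

The limit of 5 mirrors the script above; the right value depends on the concurrency your plan allows.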

Available response formats

| Format | Use case |
|--------|----------|
| json | Default. Structured JSON with text, segments, timestamps |
| verbose_json | JSON with word-level timestamps and speaker data |
| text | Plain text, no timestamps |
| srt | SubRip subtitle file |
| vtt | WebVTT for HTML5 video |

SRT and VTT are unique to SpeakEasy among affordable speech APIs. Lemonfox, for example, returns JSON only — you'd need to write your own SRT formatter and handle edge cases like overlapping segments and line-length limits.
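
For a sense of what that hand-written formatter involves, here is a minimal sketch. It assumes each JSON segment is a dict with `start` and `end` in seconds and a `text` string; that shape is an assumption for illustration, not a documented schema. It resolves overlaps by clamping each cue's start to the previous cue's end and does not attempt line-length wrapping.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Build SRT text from segments shaped like {"start", "end", "text"} (assumed)."""
    blocks = []
    previous_end = 0.0
    for index, seg in enumerate(segments, start=1):
        # Clamp the start so a cue never overlaps the one before it.
        start = max(seg["start"], previous_end)
        end = max(seg["end"], start)
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{seg['text'].strip()}\n"
        )
        previous_end = end
    # A blank line separates consecutive cues.
    return "\n".join(blocks)
```

Requesting response_format=srt avoids maintaining code like this.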


Start generating subtitles today — $1 for your first month, 50 hours included.
