SpeakEasy Team

Speech-to-Text API in JavaScript: Complete Guide

Complete guide to using a speech-to-text API in JavaScript and Node.js. Learn file uploads, URL-based transcription, and streaming with the SpeakEasy API.

JavaScript · Speech-to-Text · Tutorial


Building a speech-to-text integration in JavaScript? This guide walks through file uploads, timestamps, URL-based transcription, streaming, and error handling with SpeakEasy, using both the OpenAI Node SDK and the native fetch API.

Prerequisites

  • Node.js 18+ (for native fetch support) or any modern browser
  • A SpeakEasy API key
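Node.js 18 is the cutoff because that is where fetch, FormData, and Blob became available as globals by default, and every fetch-based example here relies on them. If you would rather fail fast than hit a confusing ReferenceError later, a minimal check like the one below works in both Node.js and browsers. The function name is our own; nothing here is part of the SpeakEasy API.

```javascript
// Returns the names of the web APIs this guide relies on that are missing
// from the given global object (an empty array means the runtime is fine).
function missingRuntimeFeatures(globalObj = globalThis) {
  const required = ["fetch", "FormData", "Blob", "TextDecoder"];
  return required.filter((name) => typeof globalObj[name] !== "function");
}

// Fail fast on older runtimes instead of crashing mid-request
const missing = missingRuntimeFeatures();
if (missing.length > 0) {
  throw new Error(
    `Missing ${missing.join(", ")}: upgrade to Node.js 18+ or use a modern browser.`
  );
}
```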

Option 1: Using the OpenAI Node SDK

This is the simplest path. SpeakEasy is a drop-in replacement for the OpenAI API, so the official SDK works as soon as you point its baseURL at SpeakEasy.

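npm install openai

// transcribe.mjs: an ES module, so import and top-level await both work
import OpenAI from "openai";
import fs from "fs";

const client = new OpenAI({
  apiKey: "YOUR_API_KEY",
  // Point the SDK at SpeakEasy instead of OpenAI
  baseURL: "https://api.tryspeakeasy.io/v1",
});

// Upload a local file and wait for the full transcript
const transcript = await client.audio.transcriptions.create({
  model: "whisper-large-v3",
  file: fs.createReadStream("audio.mp3"),
});

console.log(transcript.text);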

That's it: configure the client, call transcriptions.create, and read transcript.text.
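Hard-coding the key is fine for a quick test, but in a real deployment you would normally read it from the environment. Here is a minimal sketch that builds the options object for new OpenAI(...); the SPEAKEASY_API_KEY variable name is an assumption, so substitute whatever your environment uses.

```javascript
// Build the options for `new OpenAI(...)` so the SDK talks to SpeakEasy.
// SPEAKEASY_API_KEY is a hypothetical variable name chosen for this sketch.
function speakEasyClientOptions(env = process.env) {
  const apiKey = env.SPEAKEASY_API_KEY;
  if (!apiKey) {
    // Fail early with a clear message instead of a 401 from the API later
    throw new Error("Set SPEAKEASY_API_KEY before creating the client.");
  }
  return { apiKey, baseURL: "https://api.tryspeakeasy.io/v1" };
}
```

Usage: const client = new OpenAI(speakEasyClientOptions());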

Option 2: Using Fetch

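If you prefer not to install a dependency, call the endpoint with the native fetch API and a multipart FormData body. This example reads the file from disk, so it runs in Node.js; in a browser you would pass a File object instead.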

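import fs from "fs";

const formData = new FormData();
formData.append("model", "whisper-large-v3");
// Give the Blob a filename: without one it is sent as "blob", leaving the
// server no file extension to infer the audio format from
formData.append(
  "file",
  new Blob([await fs.promises.readFile("audio.mp3")], { type: "audio/mpeg" }),
  "audio.mp3"
);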

const response = await fetch(
  "https://api.tryspeakeasy.io/v1/audio/transcriptions",
  {
    method: "POST",
    headers: {
      Authorization: "Bearer YOUR_API_KEY",
    },
    body: formData,
  }
);

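// Surface HTTP errors instead of trying to read a transcript from an error body
if (!response.ok) {
  throw new Error(`Transcription failed (${response.status}): ${await response.text()}`);
}

const result = await response.json();
console.log(result.text);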
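The same request works in the browser, where the audio usually comes from a file input rather than the filesystem. The helper below is a hypothetical sketch that builds the multipart body from any Blob or File, so the fetch call stays identical in Node.js and the browser.

```javascript
// Build the multipart body for POST /v1/audio/transcriptions from a Blob or File.
// buildTranscriptionForm is a name chosen for this sketch, not part of any SDK.
function buildTranscriptionForm(audio, filename, model = "whisper-large-v3") {
  const form = new FormData();
  form.append("model", model);
  // The third argument sets the uploaded filename (and thus its extension)
  form.append("file", audio, filename);
  return form;
}
```

In a browser, pass the selected file and its name: buildTranscriptionForm(file, file.name), where file is input.files[0]. Keep in mind that any API key shipped to the browser is visible to your users, so on public pages you would typically send the audio to your own backend and call SpeakEasy from there.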

Getting Timestamps

For subtitle generation or audio search, request verbose JSON output with segment timestamps. This snippet reuses client and fs from Option 1:

const transcript = await client.audio.transcriptions.create({
  model: "whisper-large-v3",
  file: fs.createReadStream("interview.mp3"),
  response_format: "verbose_json",
  timestamp_granularities: ["segment"],
});

for (const segment of transcript.segments) {
  console.log(`[${segment.start}s] ${segment.text}`);
}
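Since each segment carries its timing in seconds along with its text, turning the response into an SRT subtitle file is a small formatting step. A minimal sketch, assuming each segment also has an end field next to start and text, as in OpenAI's verbose_json format; the function names are our own.

```javascript
// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function formatSrtTime(seconds) {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)},${pad(ms, 3)}`;
}

// Convert segments ({ start, end, text }) into a numbered SRT document
function segmentsToSrt(segments) {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${formatSrtTime(seg.start)} --> ${formatSrtTime(seg.end)}\n${seg.text.trim()}\n`
    )
    .join("\n"); // a blank line separates SRT entries
}
```

For example: fs.writeFileSync("interview.srt", segmentsToSrt(transcript.segments));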

URL-Based Transcription

Already have audio hosted somewhere? Skip the upload and send a JSON body with a url field instead:

const response = await fetch(
  "https://api.tryspeakeasy.io/v1/audio/transcriptions",
  {
    method: "POST",
    headers: {
      Authorization: "Bearer YOUR_API_KEY",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "whisper-large-v3",
      url: "https://example.com/recording.mp3",
    }),
  }
);

const result = await response.json();
console.log(result.text);

This is particularly useful for files already stored in S3, GCS, or anywhere else reachable over HTTP(S), as long as the URL is public or presigned so the API can download it.
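Because the API downloads the file itself, the URL has to be an http(s) address it can reach: a public object or a presigned URL should work, while bucket URIs such as s3://... generally cannot be fetched without credentials. The hypothetical helper below rejects non-HTTP URLs before building the same fetch arguments as the request above.

```javascript
// Build [endpoint, init] for fetch() in URL-based transcription.
// buildUrlTranscriptionRequest is a name chosen for this sketch.
function buildUrlTranscriptionRequest(audioUrl, apiKey, model = "whisper-large-v3") {
  const parsed = new URL(audioUrl); // throws a TypeError on malformed input
  // The API fetches the file itself, so only http(s) URLs make sense here
  if (parsed.protocol !== "https:" && parsed.protocol !== "http:") {
    throw new Error(`Unsupported URL scheme: ${parsed.protocol}`);
  }
  return [
    "https://api.tryspeakeasy.io/v1/audio/transcriptions",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ model, url: parsed.href }),
    },
  ];
}
```

Usage: const response = await fetch(...buildUrlTranscriptionRequest("https://example.com/recording.mp3", "YOUR_API_KEY"));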

Streaming Transcription

For latency-sensitive use cases like live captioning, SpeakEasy can stream results instead of returning a single JSON response. Set stream: true and the results arrive as server-sent events (SSE):

const response = await fetch(
  "https://api.tryspeakeasy.io/v1/audio/transcriptions",
  {
    method: "POST",
    headers: {
      Authorization: "Bearer YOUR_API_KEY",
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "whisper-large-v3",
      url: "https://example.com/recording.mp3",
      stream: true,
    }),
  }
);

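const reader = response.body.getReader();
const decoder = new TextDecoder();

// Each chunk holds raw SSE text ("data: ..." lines); print it as it arrives
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters that span two chunks intact
  process.stdout.write(decoder.decode(value, { stream: true }));
}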
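The loop above prints the raw event stream, data: prefixes included. To work with individual events, buffer the text and split it on the blank lines that separate SSE events. Here is a minimal parser sketch; it handles only data: fields with \n or \r\n line endings, and it makes no assumption about what each event's payload contains.

```javascript
// Incremental SSE parser: feed it decoded text chunks in order, and it calls
// onEvent(data) once for every complete event, even if an event spans chunks.
function createSseParser(onEvent) {
  let buffer = "";
  return function push(chunk) {
    // Normalize CRLF so events are always separated by "\n\n"
    buffer = (buffer + chunk).replace(/\r\n/g, "\n");
    let boundary;
    while ((boundary = buffer.indexOf("\n\n")) !== -1) {
      const rawEvent = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);
      // Join all data: lines of the event, dropping one optional leading space
      const data = rawEvent
        .split("\n")
        .filter((line) => line.startsWith("data:"))
        .map((line) => line.slice(5).replace(/^ /, ""))
        .join("\n");
      if (data) onEvent(data);
    }
    // Anything left in buffer is an incomplete event awaiting more chunks
  };
}
```

Inside the read loop, create const push = createSseParser((data) => { ... }) once and call push(decoder.decode(value, { stream: true })) for each chunk. If SpeakEasy's events carry JSON (check the API reference for the exact shape), JSON.parse(data) inside the callback gives you each partial result.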

Error Handling

Always handle API errors in production. The OpenAI SDK throws an error carrying the HTTP status code, so you can branch on it:

try {
  const transcript = await client.audio.transcriptions.create({
    model: "whisper-large-v3",
    file: fs.createReadStream("audio.mp3"),
  });
  console.log(transcript.text);
} catch (error) {
  if (error.status === 401) {
    console.error("Invalid API key. Check your credentials.");
  } else if (error.status === 413) {
    console.error("File too large. Maximum size is 25 MB.");
  } else {
    console.error("Transcription failed:", error.message);
  }
}
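Errors like 401 and 413 need a fix on your side, but rate limits (429) and server errors (5xx) are often transient and worth retrying. The OpenAI SDK already retries these automatically (2 retries by default, configurable with its maxRetries client option); with plain fetch you have to do it yourself. A minimal sketch with exponential backoff; the helper names and delay values are our own choices, not SpeakEasy recommendations.

```javascript
// Statuses that are usually transient: timeouts, rate limits, server errors
function isRetryableStatus(status) {
  return status === 408 || status === 429 || status >= 500;
}

// Exponential backoff: 500 ms, 1 s, 2 s, ... capped at maxMs
function backoffDelayMs(attempt, baseMs = 500, maxMs = 8000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// makeRequest must return a fetch() promise. It is a function so every
// attempt sends a fresh request (a streamed body cannot be sent twice).
async function fetchWithRetry(makeRequest, maxAttempts = 3) {
  for (let attempt = 0; ; attempt++) {
    const response = await makeRequest();
    if (response.ok) return response;
    if (!isRetryableStatus(response.status) || attempt + 1 >= maxAttempts) {
      throw new Error(`Request failed (${response.status}): ${await response.text()}`);
    }
    await sleep(backoffDelayMs(attempt));
  }
}
```

Wrap the request from Option 2 as fetchWithRetry(() => fetch(url, options)). Network failures, where fetch itself rejects, are not retried by this sketch.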

What's Next?

You're now equipped to integrate speech-to-text into any JavaScript application, whether it's a Node.js backend, a serverless function, or a browser-based tool. Explore speaker diarization to identify individual speakers, or check out text-to-speech to generate audio from your content.

Sign up for SpeakEasy and start building with a generous free tier today.
