
Package detail

assemblyai

AssemblyAI · 188.1k · MIT · 4.9.0 · TypeScript support: included

The AssemblyAI JavaScript SDK provides an easy-to-use interface for interacting with the AssemblyAI API, which supports async and real-time transcription, as well as the latest LeMUR models.

Keywords: AssemblyAI, Speech-to-text, Transcription, Audio, LLM

readme



AssemblyAI JavaScript SDK

The AssemblyAI JavaScript SDK provides an easy-to-use interface for interacting with the AssemblyAI API, which supports async and real-time transcription, as well as the latest LeMUR models. It is written primarily for Node.js in TypeScript with all types exported, but is also compatible with other runtimes.

Documentation

Visit the AssemblyAI documentation for step-by-step instructions and a lot more details about our AI models and API. Explore the SDK API reference for more details on the SDK types, functions, and classes.

Quickstart

Install the AssemblyAI SDK using your preferred package manager:

npm install assemblyai
yarn add assemblyai
pnpm add assemblyai
bun add assemblyai

Then, import the assemblyai module and create an AssemblyAI object with your API key:

import { AssemblyAI } from "assemblyai";

const client = new AssemblyAI({
  apiKey: process.env.ASSEMBLYAI_API_KEY,
});

You can now use the client object to interact with the AssemblyAI API.

Using a CDN

You can use automatic CDNs like UNPKG to load the library from a script tag.

  • Replace :version with the desired version or latest.
  • Remove .min to load the non-minified version.
  • Remove .streaming to load the entire SDK. Keep .streaming to load the Streaming STT specific version.
<!-- Unminified full SDK -->
<script src="https://www.unpkg.com/assemblyai@:version/dist/assemblyai.umd.js"></script>
<!-- Minified full SDK -->
<script src="https://www.unpkg.com/assemblyai@:version/dist/assemblyai.umd.min.js"></script>
<!-- Unminified Streaming STT only -->
<script src="https://www.unpkg.com/assemblyai@:version/dist/assemblyai.streaming.umd.js"></script>
<!-- Minified Streaming STT only -->
<script src="https://www.unpkg.com/assemblyai@:version/dist/assemblyai.streaming.umd.min.js"></script>

The script creates a global assemblyai variable containing all the services. Here's how you create a RealtimeTranscriber object.

const { RealtimeTranscriber } = assemblyai;
const transcriber = new RealtimeTranscriber({
  token: "[GENERATE TEMPORARY AUTH TOKEN IN YOUR API]",
  // ...other options
});

For type support in your IDE, see Reference types from JavaScript.
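
For example, here's a minimal sketch using JSDoc to type the UMD global in a plain JavaScript file (assuming the SDK was loaded with one of the script tags above; the token value is a placeholder):

/** @type {import("assemblyai").RealtimeTranscriber} */
const transcriber = new assemblyai.RealtimeTranscriber({
  token: "<temporary auth token>",
});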

Speech-To-Text

Transcribe audio and video files

Transcribe an audio file with a public URL

When you create a transcript, you can either pass in a URL to an audio file or upload a file directly.

// Transcribe file at remote URL
let transcript = await client.transcripts.transcribe({
  audio: "https://assembly.ai/espn.m4a",
});

Note: You can also pass a local file path, a stream, or a buffer as the audio property.

transcribe queues a transcription job and polls it until the status is completed or error.

If you don't want to wait until the transcript is ready, you can use submit:

let transcript = await client.transcripts.submit({
  audio: "https://assembly.ai/espn.m4a",
});
Transcribe a local audio file

When you create a transcript, you can either pass in a URL to an audio file or upload a file directly.

// Upload a file via local path and transcribe
let transcript = await client.transcripts.transcribe({
  audio: "./news.mp4",
});

Note: You can also pass a file URL, a stream, or a buffer as the audio property.

transcribe queues a transcription job and polls it until the status is completed or error.

If you don't want to wait until the transcript is ready, you can use submit:

let transcript = await client.transcripts.submit({
  audio: "./news.mp4",
});
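
As a sketch, you can also stream the file instead of passing a path (assuming Node.js and its built-in fs module):

import fs from "fs";

let transcript = await client.transcripts.transcribe({
  audio: fs.createReadStream("./news.mp4"),
});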
Enable additional AI models

You can extract even more insights from the audio by enabling any of our AI models using transcription options. For example, here's how to enable the Speaker Diarization model to detect who said what.

let transcript = await client.transcripts.transcribe({
  audio: "https://assembly.ai/espn.m4a",
  speaker_labels: true,
});
for (let utterance of transcript.utterances) {
  console.log(`Speaker ${utterance.speaker}: ${utterance.text}`);
}
Get a transcript

This will return the transcript object in its current state. If the transcript is still processing, the status field will be queued or processing. Once the transcript is complete, the status field will be completed.

const transcript = await client.transcripts.get(transcript.id);
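
For example, here's a minimal sketch that branches on the status field (transcriptId is a placeholder for the transcript ID you stored earlier):

const transcript = await client.transcripts.get(transcriptId);
if (transcript.status === "completed") {
  console.log(transcript.text);
} else if (transcript.status === "error") {
  console.error(transcript.error);
} else {
  // status is "queued" or "processing"
}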

If you created a transcript using .submit(), you can still poll until the transcript status is completed or error using .waitUntilReady():

const transcript = await client.transcripts.waitUntilReady(transcript.id, {
  // How frequently the transcript is polled in ms. Defaults to 3000.
  pollingInterval: 1000,
  // How long to wait in ms until the "Polling timeout" error is thrown. Defaults to infinite (-1).
  pollingTimeout: 5000,
});
Get sentences and paragraphs

const sentences = await client.transcripts.sentences(transcript.id);
const paragraphs = await client.transcripts.paragraphs(transcript.id);
Get subtitles

const charsPerCaption = 32;
let srt = await client.transcripts.subtitles(transcript.id, "srt");
srt = await client.transcripts.subtitles(transcript.id, "srt", charsPerCaption);

let vtt = await client.transcripts.subtitles(transcript.id, "vtt");
vtt = await client.transcripts.subtitles(transcript.id, "vtt", charsPerCaption);
List transcripts

This will return a page of transcripts you created.

const page = await client.transcripts.list();

You can also paginate over all pages.

let previousPageUrl: string | null = null;
do {
  const page = await client.transcripts.list(previousPageUrl);
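  // Use page.transcripts here; it contains the transcripts for this page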
  previousPageUrl = page.page_details.prev_url;
} while (previousPageUrl !== null);

[!NOTE] To paginate over all pages, you need to use page.page_details.prev_url because the transcripts are returned in descending order by creation date and time. The first page contains the most recent transcripts, and each "previous" page contains older transcripts.

Delete a transcript

const res = await client.transcripts.delete(transcript.id);

Transcribe in real-time

Create the real-time transcriber.

const rt = client.realtime.transcriber();

You can also pass in the following options.

const rt = client.realtime.transcriber({
  realtimeUrl: 'wss://localhost/override',
  // The API key passed to `AssemblyAI` is used by default
  apiKey: process.env.ASSEMBLYAI_API_KEY,
  sampleRate: 16_000,
  wordBoost: ['foo', 'bar']
});

[!WARNING] Storing your API key in client-facing applications exposes your API key. Generate a temporary auth token on the server and pass it to your client. Server code:

const token = await client.realtime.createTemporaryToken({ expires_in: 60 });
// TODO: return token to client

Client code:

import { RealtimeTranscriber } from "assemblyai"; // or "assemblyai/streaming"
// TODO: implement getToken to retrieve token from server
const token = await getToken();
const rt = new RealtimeTranscriber({
  token,
});

You can configure the following events.

rt.on("open", ({ sessionId, expiresAt }) => console.log('Session ID:', sessionId, 'Expires at:', expiresAt));
rt.on("close", (code: number, reason: string) => console.log('Closed', code, reason));
rt.on("transcript", (transcript: TranscriptMessage) => console.log('Transcript:', transcript));
rt.on("transcript.partial", (transcript: PartialTranscriptMessage) => console.log('Partial transcript:', transcript));
rt.on("transcript.final", (transcript: FinalTranscriptMessage) => console.log('Final transcript:', transcript));
rt.on("error", (error: Error) => console.error('Error', error));

After configuring your events, connect to the server.

await rt.connect();

Send audio data via chunks.

// Pseudo code for getting audio
getAudio((chunk) => {
  rt.sendAudio(chunk);
});
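
For example, here's a minimal sketch that reads a raw audio file in chunks (the file name is a placeholder; the audio must match the configured sample rate and encoding):

import fs from "fs";

const audioFile = fs.createReadStream("./speech.pcm", { highWaterMark: 4096 });
audioFile.on("data", (chunk) => {
  // sendAudio accepts ArrayBufferLike, so pass only the chunk's own bytes
  rt.sendAudio(chunk.buffer.slice(chunk.byteOffset, chunk.byteOffset + chunk.byteLength));
});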

Or send audio data via a stream by piping to the real-time stream.

audioStream.pipeTo(rt.stream());

Close the connection when you're finished.

await rt.close();

Apply LLMs to your audio with LeMUR

Call LeMUR endpoints to apply LLMs to your transcript.

Prompt your audio with LeMUR

const { response } = await client.lemur.task({
  transcript_ids: ["0d295578-8c75-421a-885a-2c487f188927"],
  prompt: "Write a haiku about this conversation.",
});
Summarize with LeMUR

const { response } = await client.lemur.summary({
  transcript_ids: ["0d295578-8c75-421a-885a-2c487f188927"],
  answer_format: "one sentence",
  context: {
    speakers: ["Alex", "Bob"],
  },
});
Ask questions

const { response } = await client.lemur.questionAnswer({
  transcript_ids: ["0d295578-8c75-421a-885a-2c487f188927"],
  questions: [
    {
      question: "What are they discussing?",
      answer_format: "text",
    },
  ],
});
Generate action items

const { response } = await client.lemur.actionItems({
  transcript_ids: ["0d295578-8c75-421a-885a-2c487f188927"],
});
Delete LeMUR request

const response = await client.lemur.purgeRequestData(lemurResponse.request_id);

Contributing

If you want to contribute to the JavaScript SDK, follow the guidelines in CONTRIBUTING.md.

changelog

Changelog

[4.8.0]

  • Add multichannel property to TranscriptParams
  • Add multichannel and audio_channels property to Transcript
  • Add channel property to TranscriptWord, TranscriptUtterance, TranscriptSentence, and SentimentAnalysisResult

[4.7.1]

  • Log a warning when a user tries to use API key authentication in the browser to connect to the real-time Streaming STT API.
  • Update dependencies
  • Use assembly.ai short URL for sample files

[4.7.0]

  • Add language_confidence_threshold to Transcript, TranscriptParams, and TranscriptOptionalParams.

    The confidence threshold for the automatically detected language. An error will be returned if the language confidence is below this threshold.

  • Add language_confidence to Transcript

    The confidence score for the detected language, between 0.0 (low confidence) and 1.0 (high confidence)

Using these new fields, you can determine the confidence of the language detection model (enabled by setting language_detection to true) and fail the transcript if it doesn't meet your desired threshold.
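
For example, here's a minimal sketch combining these fields (the threshold value is illustrative):

const transcript = await client.transcripts.transcribe({
  audio: "https://assembly.ai/espn.m4a",
  language_detection: true,
  language_confidence_threshold: 0.8,
});
console.log(transcript.language_code, transcript.language_confidence);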

Learn more about the new automatic language detection model and feature improvements on our blog.

[4.6.2]

  • Change RealtimeErrorType from enum to const object.
  • Add RealtimeErrorTypeCodes which is a union of RealtimeErrorType values

[4.6.1]

  • Remove conformer-2 from SpeechModel union type.
  • Remove conformer-2 deprecation warning

[4.6.0]

  • Add more TSDoc comments for RealtimeService documentation
  • Add new LeMUR models
  • Add TranscriptWebhookNotification which is a union of TranscriptReadyNotification or RedactedAudioNotification
  • Add RedactedAudioNotification which represents the body of the PII redacted audio webhook notification.

[4.5.0]

  • You can now retrieve previous LeMUR responses using client.lemur.getResponse<LemurTask>("YOUR_REQUEST_ID").
  • LeMUR functions now return usage with the number of input_tokens and output_tokens.

[4.4.7]

  • Rename TranscriptService.redactions function to TranscriptService.redactedAudio.
  • Add TranscriptService.redactedAudioFile function.
  • Add workerd export to fix cache issue with fetch on Cloudflare Workers.

[4.4.6]

  • Fix Rollup exports so __SDK_VERSION__ is properly replaced with the version of the SDK.

[4.4.5]

  • Add new PiiPolicy enum values

[4.4.4]

  • Add an export that only includes the Streaming STT code. You can use the export
    • by importing assemblyai/streaming,
    • or by loading the assemblyai.streaming.umd.js or assemblyai.streaming.umd.min.js file in a script tag.
  • Add new EntityType enum values

[4.4.3] - 2024-05-09

  • Add react-native exports that resolve to the browser version of the library.

[4.4.2] - 2024-05-03

Changed

  • Caching is disabled for all HTTP requests made by the SDK
  • Accept data-URIs in client.files.upload(dataUri), client.transcripts.submit(audio: dataUri), client.transcripts.transcribe(audio: dataUri).
  • Change how the WebSocket libraries are imported for better compatibility across frameworks and runtimes. The library no longer relies on an internal #ws import, and instead compiles the imports into the dist bundles. Browser builds use the native WebSocket; other builds use the ws package.

[4.4.1] - 2024-04-16

Changed

  • Deprecate enableExtraSessionInformation parameter in CreateRealtimeTranscriberParams type

[4.4.0] - 2024-04-12

Added

  • Add disablePartialTranscripts parameter to CreateRealtimeTranscriberParams
  • Add enableExtraSessionInformation parameter to CreateRealtimeTranscriberParams
  • Add session_information event to RealtimeTranscriber.on()

Changed

  • ⚠️ Deprecate conformer-2 literal for TranscriptParams.speech_model property

Fixed

  • Add missing status property to AutoHighlightsResult

[4.3.4] - 2024-04-02

Added

  • SpeechModel.Best enum
  • TranscriptListItem.error property

Changed

  • Make PageDetails.prev_url nullable
  • Rename Realtime to Streaming inside code documentation
  • More inline code documentation

Fixed

  • Rename SubstitutionPolicy literal "entity_type" to "entity_name"
  • Fix the pagination example in "List transcripts" sample on README

[4.3.3] - 2024-03-18

Added

  • GitHub action to generate API reference
  • Generate API reference with Typedoc and host on GitHub Pages

Changed

  • Add conformer-2 to SpeechModel type
  • Change language_code field to accept any string
  • Move from JSDoc to TSDoc
  • Update ws to 8.13.0
  • Update dev dependencies (no public facing changes)

[4.3.2] - 2024-03-08

Added

  • Add audio_url property to TranscribeParams in addition to the audio property. You can use one or the other. audio_url only accepts a URL string.
  • Add TranscriptReadyNotification type for the transcript webhook body.

Changed

  • Update codebase to use TSDoc
  • Update README.md with more samples

[4.3.0] - 2024-02-15

Added

  • Add RealtimeTranscriber.configureEndUtteranceSilenceThreshold function
  • Add RealtimeTranscriber.forceEndUtterance function
  • Add end_utterance_silence_threshold property to CreateRealtimeTranscriberParams and RealtimeTranscriberParams types.

[4.2.3] - 2024-02-13

Added

  • Add speech_model field to TranscriptParams and add SpeechModel type.

[4.2.2] - 2024-01-29

Changed

  • Windows paths passed to client.transcripts.transcribe and client.transcripts.submit will work as expected.

[4.2.1] - 2024-01-23

Added

  • Add answer_format to LemurActionItemsParams type

Changed

  • Rename RealtimeService to RealtimeTranscriber, RealtimeServiceFactory to RealtimeTranscriberFactory, RealtimeTranscriberFactory.createService() to RealtimeTranscriberFactory.transcriber(). Deprecated aliases are provided for all old types and functions for backwards compatibility.
  • Restrict the type for redact_pii_audio_quality from string to RedactPiiAudioQuality, an enum string.

[4.2.0] - 2024-01-11

Added

  • Add content_safety_confidence to TranscriptParams & TranscriptOptionalParams.

Changed

  • The RealtimeService now sends audio as binary instead of a base64-encoded JSON object.

[4.1.0] - 2023-12-22

Added

  • Add "anthropic/claude-2-1" to LemurModel type
  • Add encoding option to the real-time service and factory. encoding can be "pcm_s16le" or "pcm_mulaw".
  • "pcm_mulaw" is a newly supported audio encoding for the real-time service.

Changed

  • Allow any string into final_model for LeMUR requests

[4.0.1] - 2023-12-14

Added

  • Add "assemblyai/mistral-7b" to LemurModel type

Changed

  • Update types with @example
  • Update types with Format: uuid if applicable

[4.0.0] - 2023-12-08

Added

  • Add node, deno, bun, browser, and workerd (Cloudflare Workers) exports to package.json. These exports are compatible versions of the SDK, with a few limitations in some cases. For more details, consult the SDK Compatibility document.
  • Add dist/assemblyai.umd.js and dist/assemblyai.umd.min.js. You can reference these script files directly in the browser and the SDK will be available at the global assemblyai variable.

Changed

  • RealtimeService.sendAudio accepts audio via type ArrayBufferLike.
  • Breaking: RealtimeService.stream returns a WHATWG Streams Standard stream, instead of a Node stream. In the browser, the native web standard stream will be used.
  • ws is used as the WebSocket client as before, but in the browser, the native WebSocket client is used.
  • Rename Node SDK to JavaScript SDK as the SDK is compatible with more runtimes now.

[3.1.1] - 2023-11-21

Added

  • Add client.transcripts.transcribe function to transcribe an audio file with polling until transcript status is completed or error. This function takes an audio option which can be an audio file URL, path, stream, or buffer.
  • Add client.transcripts.submit function to queue a transcript. You can use client.transcripts.waitUntilReady to poll the transcript returned by submit. This function also takes an audio option which can be an audio file URL, path, stream, or buffer.

Changed

  • Deprecated client.transcripts.create in favor of transcribe and submit, to be more consistent with other AssemblyAI SDKs.
  • Renamed types
    • Renamed the Parameters type suffix to the Params type suffix
    • Renamed CreateTranscriptParameters to TranscriptParams
    • Renamed CreateTranscriptOptionalParameters to TranscriptOptionalParams.
  • Added deprecated aliases for the aforementioned types
  • Improved type docs

[3.1.0] - 2023-11-16

Added

  • Add AssemblyAI.transcripts.waitUntilReady function to wait until a transcript is ready, meaning status is completed or error.
  • Add chars_per_caption parameter to AssemblyAI.transcripts.subtitles function.
  • Add input_text property to LeMUR functions. Instead of using transcript_ids, you can use input_text to provide custom formatted transcripts as input to LeMUR.

Changed

  • Change default timeout from 3 minutes to infinite (-1). Fixes #17

Fixed

  • Correctly serialize the keywords for client.transcripts.wordSearch.
  • Use more widely compatible syntax for wildcard exporting types. Fixes #18.

[3.0.1] - 2023-10-30

Changed

  • The SDK uses fetch instead of Axios. This removes the Axios dependency. Axios relies on XMLHttpRequest, which isn't supported in Cloudflare Workers, Deno, Bun, etc. By using fetch, the SDK is now more compatible with the aforementioned runtimes.

Fixed

  • The SDK uses relative imports instead of using path aliases, to make the library transpilable with tsc for consumers. Fixes #14.
  • Added speaker property to the TranscriptUtterance type, and removed channel property.

[3.0.0] - 2023-10-24

Changed

  • AssemblyAI.files.upload accepts streams and buffers, in addition to a string (path to file).

Removed

  • Breaking: The module does not have a default export anymore, because of inconsistent functionality across module systems. Instead, use AssemblyAI as a named import like this: import { AssemblyAI } from 'assemblyai'.

[2.0.2] - 2023-10-13

Added

  • AssemblyAI.transcripts.wordSearch searches for keywords in the transcript.
  • AssemblyAI.lemur.purgeRequestData deletes data related to your LeMUR request.
  • RealtimeService.stream creates a writable stream that you can write audio data to instead of using RealtimeService.sendAudio.

Fixed

  • In certain module systems, the AssemblyAI class was exported as a default export instead of a named export.

[2.0.1] - 2023-10-10

Re-implement the Node SDK in TypeScript and add all AssemblyAI APIs.

Added

  • Transcript API client
  • LeMUR API client
  • Real-time transcript client