File: generate-speech.md | Updated: 11/15/2025
AI SDK 5.x
generateSpeech is an experimental feature.
Generates speech audio from text.
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { openai } from '@ai-sdk/openai';

const { audio } = await generateSpeech({
  model: openai.speech('tts-1'),
  text: 'Hello from the AI SDK!',
  voice: 'alloy',
});

console.log(audio);

import { experimental_generateSpeech as generateSpeech } from 'ai';
import { elevenlabs } from '@ai-sdk/elevenlabs';

const { audio } = await generateSpeech({
  model: elevenlabs.speech('eleven_multilingual_v2'),
  text: 'Hello from the AI SDK!',
  voice: 'your-voice-id', // Required: get this from your ElevenLabs account
});

import { experimental_generateSpeech as generateSpeech } from "ai"
- `SpeechModelV2`: The speech model to use.
- `string`: The text to generate the speech from.
- `string`: The voice to use for the speech.
- `string`: The output format to use for the speech, e.g. "mp3" or "wav".
- `string`: Instructions for the speech generation.
- `number`: The speed of the speech generation.
- `string`: The language for speech generation. This should be an ISO 639-1 language code (e.g. "en", "es", "fr") or "auto" for automatic language detection. Provider support varies.
- `Record<string, Record<string, JSONValue>>`: Additional provider-specific options.
- `number`: Maximum number of retries. Default: 2.
- `AbortSignal`: An optional abort signal to cancel the call.
- `Record<string, string>`: Additional HTTP headers for the request.
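
The options described above can be sanity-checked before a call is made. The following is a minimal sketch, not part of the AI SDK: `SpeechOptionsSketch` and `checkSpeechOptions` are hypothetical, and the option names `language` and `speed` are assumptions, since only `model`, `text`, and `voice` appear by name in the examples. It checks only that the language looks like a two-letter code or "auto", and it assumes that a usable speed is a positive, finite number.

```typescript
// Hypothetical local type for illustration. Only `text` and `voice` appear
// by name in the examples; `language` and `speed` are assumed names.
interface SpeechOptionsSketch {
  text: string;
  voice?: string;
  language?: string; // assumed name for the language option
  speed?: number; // assumed name for the speed option
}

// Returns a list of human-readable problems. An empty list means the
// options passed these basic checks.
function checkSpeechOptions(options: SpeechOptionsSketch): string[] {
  const problems: string[] = [];

  if (options.text.trim().length === 0) {
    problems.push('text must not be empty');
  }

  // ISO 639-1 codes are two letters; "auto" requests automatic detection.
  // This checks only the shape, not whether the code is actually assigned.
  if (
    options.language !== undefined &&
    options.language !== 'auto' &&
    !/^[a-z]{2}$/.test(options.language)
  ) {
    problems.push(
      `language "${options.language}" is not a two-letter code or "auto"`,
    );
  }

  // Assumption: a usable speed is a positive, finite number.
  if (
    options.speed !== undefined &&
    !(Number.isFinite(options.speed) && options.speed > 0)
  ) {
    problems.push('speed must be a positive, finite number');
  }

  return problems;
}
```

Because provider support varies, a check like this can only catch malformed input; whether a given provider honors a particular language or speed is still reported by the provider itself.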
- `GeneratedAudioFile`: The generated audio.
  - `string`: Audio as a base64 encoded string.
  - `Uint8Array`: Audio as a Uint8Array.
  - `string`: MIME type of the audio (e.g. "audio/mpeg").
  - `string`: Format of the audio (e.g. "mp3").
- `SpeechWarning[]`: Warnings from the model provider (e.g. unsupported settings).
- `Array<SpeechModelResponseMetadata>`: Response metadata from the provider. There may be multiple responses if multiple calls were made to the model.
  - `Date`: Timestamp for the start of the generated response.
  - `string`: The ID of the response model that was used to generate the response.
  - `unknown`: Optional response body.
  - `Record<string, string>`: Response headers.
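
As a sketch of how the returned fields described above might be consumed, the helper below derives a file name and a data URL from the audio, serializes the warnings, and picks the model ID of the last response. Everything here is hypothetical: `SpeechResultSketch` and `describeSpeechResult` are not SDK exports, and apart from `audio` the field names (`base64`, `mimeType`, `format`, `warnings`, `responses`, `timestamp`, `modelId`) are assumptions standing in for the fields listed above. Warnings are treated as opaque values because their shape is not given here.

```typescript
// Hypothetical local types. Field names are assumptions for illustration.
interface AudioSketch {
  base64: string;
  mimeType: string; // assumed name for the MIME type field
  format: string;
}

interface ResponseSketch {
  timestamp: Date;
  modelId: string;
}

interface SpeechResultSketch {
  audio: AudioSketch;
  warnings: unknown[]; // shape not specified here, so kept opaque
  responses: ResponseSketch[];
}

interface SpeechSummary {
  fileName: string;
  dataUrl: string;
  warnings: string[];
  lastModelId: string | undefined;
}

function describeSpeechResult(
  result: SpeechResultSketch,
  baseName: string,
): SpeechSummary {
  const { audio, warnings, responses } = result;

  // There may be several responses; take the most recent one, if any.
  const last =
    responses.length > 0 ? responses[responses.length - 1] : undefined;

  return {
    // Use the reported format as the file extension, e.g. "hello.mp3".
    fileName: `${baseName}.${audio.format}`,
    // A base64 data URL can be used as the source of an audio element.
    dataUrl: `data:${audio.mimeType};base64,${audio.base64}`,
    // Serialize each warning so it can be logged without knowing its shape.
    warnings: warnings.map((warning) => JSON.stringify(warning)),
    lastModelId: last?.modelId,
  };
}
```

Keeping the helper free of file-system or network calls makes it easy to exercise with hand-built objects; writing the bytes to disk would instead use the `Uint8Array` form of the audio.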