📄 ai-sdk/docs/advanced/caching

File: caching.md | Updated: 11/15/2025

Source: https://ai-sdk.dev/docs/advanced/caching


Caching Responses
================================================================================

Depending on the type of application you're building, you may want to cache the responses you receive from your AI provider, at least temporarily.

Using Language Model Middleware (Recommended)


The recommended approach to caching responses is using language model middleware and the simulateReadableStream function.

Language model middleware enhances the behavior of language models by intercepting and modifying calls to the model. Let's see how you can use it to cache responses.

ai/middleware.ts

```ts
import { Redis } from '@upstash/redis';
import {
  type LanguageModelV2,
  type LanguageModelV2Middleware,
  type LanguageModelV2StreamPart,
  simulateReadableStream,
} from 'ai';

const redis = new Redis({
  url: process.env.KV_URL,
  token: process.env.KV_TOKEN,
});

export const cacheMiddleware: LanguageModelV2Middleware = {
  wrapGenerate: async ({ doGenerate, params }) => {
    const cacheKey = JSON.stringify(params);

    const cached = (await redis.get(cacheKey)) as Awaited<
      ReturnType<LanguageModelV2['doGenerate']>
    > | null;

    if (cached !== null) {
      return {
        ...cached,
        response: {
          ...cached.response,
          timestamp: cached?.response?.timestamp
            ? new Date(cached?.response?.timestamp)
            : undefined,
        },
      };
    }

    const result = await doGenerate();

    redis.set(cacheKey, result);

    return result;
  },
  wrapStream: async ({ doStream, params }) => {
    const cacheKey = JSON.stringify(params);

    // Check if the result is in the cache
    const cached = await redis.get(cacheKey);

    // If cached, return a simulated ReadableStream that yields the cached result
    if (cached !== null) {
      // Format the timestamps in the cached response
      const formattedChunks = (cached as LanguageModelV2StreamPart[]).map(p => {
        if (p.type === 'response-metadata' && p.timestamp) {
          return { ...p, timestamp: new Date(p.timestamp) };
        } else return p;
      });
      return {
        stream: simulateReadableStream({
          initialDelayInMs: 0,
          chunkDelayInMs: 10,
          chunks: formattedChunks,
        }),
      };
    }

    // If not cached, proceed with streaming
    const { stream, ...rest } = await doStream();

    const fullResponse: LanguageModelV2StreamPart[] = [];

    const transformStream = new TransformStream<
      LanguageModelV2StreamPart,
      LanguageModelV2StreamPart
    >({
      transform(chunk, controller) {
        fullResponse.push(chunk);
        controller.enqueue(chunk);
      },
      flush() {
        // Store the full response in the cache after streaming is complete
        redis.set(cacheKey, fullResponse);
      },
    });

    return {
      stream: stream.pipeThrough(transformStream),
      ...rest,
    };
  },
};
```

This example uses @upstash/redis to store and retrieve the assistant's responses, but you can use any KV storage provider you like.

LanguageModelV2Middleware has two methods: wrapGenerate and wrapStream. wrapGenerate is called when using generateText and generateObject, while wrapStream is called when using streamText and streamObject.

For wrapGenerate, you can cache the response object directly. For wrapStream, in contrast, you cache an array of the stream parts, which can then be passed to the simulateReadableStream function to create a simulated ReadableStream that replays the cached response. This way, the cached response is returned chunk by chunk, as if it were being generated by the model. You can control the initial delay and the delay between chunks with the initialDelayInMs and chunkDelayInMs parameters of simulateReadableStream.
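The middleware isn't active until you attach it to a model with the wrapLanguageModel function. Here is a minimal sketch of wiring it up (the model choice and the @/ai/middleware import path are assumptions based on the file name above):

```ts
import { openai } from '@ai-sdk/openai';
import { streamText, wrapLanguageModel } from 'ai';
import { cacheMiddleware } from '@/ai/middleware'; // assumed path alias for ai/middleware.ts

// Wrap the base model so every generate/stream call goes through the cache middleware.
const cachedModel = wrapLanguageModel({
  model: openai('gpt-4o'),
  middleware: cacheMiddleware,
});

// Use the wrapped model exactly like the original:
const result = streamText({
  model: cachedModel,
  prompt: 'Write a haiku about caching.',
});
```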

You can see a full example of caching with Redis in a Next.js application in our Caching Middleware Recipe.
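Note that the middleware above caches entries indefinitely. If you want cached responses to expire, Upstash Redis accepts an expiry option on set; a minimal sketch, assuming a one-hour TTL (tune it for your application):

```ts
// Variant of the cache writes above: expire entries after one hour.
// The `ex` option sets a TTL in seconds on the key.
redis.set(cacheKey, result, { ex: 60 * 60 });

// ...and likewise in the wrapStream flush():
redis.set(cacheKey, fullResponse, { ex: 60 * 60 });
```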

Using Lifecycle Callbacks


Alternatively, each AI SDK Core function has lifecycle callbacks you can use. The one of interest here is onFinish, which is called when the generation is complete; this is where you can cache the full response.

Here's an example that uses Upstash Redis and Next.js to cache the model's response for one hour:

app/api/chat/route.ts

```ts
import { openai } from '@ai-sdk/openai';
import {
  convertToModelMessages,
  formatDataStreamPart,
  streamText,
  UIMessage,
} from 'ai';
import { Redis } from '@upstash/redis';

// Allow streaming responses up to 30 seconds
export const maxDuration = 30;

const redis = new Redis({
  url: process.env.KV_URL,
  token: process.env.KV_TOKEN,
});

export async function POST(req: Request) {
  const { messages }: { messages: UIMessage[] } = await req.json();

  // Come up with a key based on the request:
  const key = JSON.stringify(messages);

  // Check if we have a cached response
  const cached = await redis.get<string>(key);
  if (cached != null) {
    return new Response(formatDataStreamPart('text', cached), {
      status: 200,
      headers: { 'Content-Type': 'text/plain' },
    });
  }

  // Call the language model:
  const result = streamText({
    model: openai('gpt-4o'),
    messages: convertToModelMessages(messages),
    async onFinish({ text }) {
      // Cache the response text with a one-hour expiry:
      await redis.set(key, text);
      await redis.expire(key, 60 * 60);
    },
  });

  // Respond with the stream
  return result.toUIMessageStreamResponse();
}
```
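One caveat with JSON.stringify(messages) as the key: it grows with the conversation, so long chats produce very large Redis keys. A common alternative, sketched below under the assumption that an exact-match lookup is still what you want, is to hash the serialized messages first (cacheKeyFor is a hypothetical helper, not part of the AI SDK):

```ts
import { createHash } from 'node:crypto';

// Hypothetical helper: derive a fixed-length cache key from the request payload.
function cacheKeyFor(messages: unknown): string {
  return createHash('sha256').update(JSON.stringify(messages)).digest('hex');
}
```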
