📄 ai-sdk/cookbook/next/caching-middleware

File: caching-middleware.md | Updated: 11/15/2025

Source: https://ai-sdk.dev/cookbook/next/caching-middleware


Caching Middleware

=============================================================================================

This example is not yet updated to v5.

Let's create a simple chat interface that uses LanguageModelMiddleware to cache the assistant's responses in fast KV storage.

Client


Let's create a simple chat interface that allows users to send messages to the assistant and receive responses. You will integrate the useChat hook from @ai-sdk/react to stream responses.

app/page.tsx

```tsx
'use client';

import { useChat } from '@ai-sdk/react';

export default function Chat() {
  const { messages, input, handleInputChange, handleSubmit, error } = useChat();

  if (error) return <div>{error.message}</div>;

  return (
    <div className="flex flex-col w-full max-w-md py-24 mx-auto stretch">
      <div className="space-y-4">
        {messages.map(m => (
          <div key={m.id} className="whitespace-pre-wrap">
            <div>
              <div className="font-bold">{m.role}</div>
              {m.toolInvocations ? (
                <pre>{JSON.stringify(m.toolInvocations, null, 2)}</pre>
              ) : (
                <p>{m.content}</p>
              )}
            </div>
          </div>
        ))}
      </div>
      <form onSubmit={handleSubmit}>
        <input
          className="fixed bottom-0 w-full max-w-md p-2 mb-8 border border-gray-300 rounded shadow-xl"
          value={input}
          placeholder="Say something..."
          onChange={handleInputChange}
        />
      </form>
    </div>
  );
}
```

Middleware


Next, you will create a LanguageModelMiddleware that caches the assistant's responses in KV storage. LanguageModelMiddleware has two methods: wrapGenerate and wrapStream. wrapGenerate is called when using generateText and generateObject, while wrapStream is called when using streamText and streamObject.
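Before wiring in the cache, it can help to see the shape of a middleware in isolation. The sketch below uses simplified stand-in types (not the SDK's own) to show that a middleware is just an object whose hooks receive the model call as a `doGenerate` function and may run logic before delegating to it:

```typescript
// Simplified stand-in types; the real ones come from the 'ai' package.
type GenerateResult = { text: string };

type SketchMiddleware = {
  wrapGenerate?: (options: {
    doGenerate: () => Promise<GenerateResult>;
    params: unknown;
  }) => Promise<GenerateResult>;
};

// A pass-through middleware: log the call, then delegate to the model.
const loggingMiddleware: SketchMiddleware = {
  wrapGenerate: async ({ doGenerate, params }) => {
    console.log('calling model with', JSON.stringify(params));
    return doGenerate();
  },
};

// Exercise the hook with a fake model call instead of a real provider.
async function demo(): Promise<string> {
  const result = await loggingMiddleware.wrapGenerate!({
    doGenerate: async () => ({ text: 'hello' }),
    params: { prompt: 'hi' },
  });
  return result.text;
}
```

The caching middleware below follows the same pattern, except that it may skip `doGenerate` entirely when a cached response exists.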

For wrapGenerate, you can cache the response directly. For wrapStream, however, you cache an array of the stream parts, which can then be passed to the simulateReadableStream function to create a simulated ReadableStream that returns the cached response. This way, the cached response is returned chunk by chunk as if it were being generated by the model. You can control the initial delay and the delay between chunks by adjusting the initialDelayInMs and chunkDelayInMs parameters of simulateReadableStream.
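One subtlety worth noting: storing a response in Redis involves a JSON round-trip, and JSON has no Date type, so any timestamp comes back as an ISO string. That is why the middleware below revives timestamps with `new Date(...)`. A self-contained illustration of the round-trip:

```typescript
// A Date survives a JSON round-trip (as happens when caching in Redis)
// only as an ISO string, so cached timestamps must be revived manually.
const original = { timestamp: new Date('2024-01-01T00:00:00.000Z') };

// Simulate the store/retrieve cycle: serialize, then parse.
const cached = JSON.parse(JSON.stringify(original)) as { timestamp: string };
console.log(typeof cached.timestamp); // -> "string", not Date

// Revive the field the way the middleware does.
const revived = { ...cached, timestamp: new Date(cached.timestamp) };
console.log(revived.timestamp instanceof Date); // -> true
```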

ai/middleware.ts

```ts
import { Redis } from '@upstash/redis';
import {
  type LanguageModelV1,
  type LanguageModelV1Middleware,
  type LanguageModelV1StreamPart,
  simulateReadableStream,
} from 'ai';

const redis = new Redis({
  url: process.env.KV_URL,
  token: process.env.KV_TOKEN,
});

export const cacheMiddleware: LanguageModelV1Middleware = {
  wrapGenerate: async ({ doGenerate, params }) => {
    const cacheKey = JSON.stringify(params);

    const cached = (await redis.get(cacheKey)) as Awaited<
      ReturnType<LanguageModelV1['doGenerate']>
    > | null;

    if (cached !== null) {
      return {
        ...cached,
        response: {
          ...cached.response,
          timestamp: cached?.response?.timestamp
            ? new Date(cached?.response?.timestamp)
            : undefined,
        },
      };
    }

    const result = await doGenerate();

    redis.set(cacheKey, result);

    return result;
  },
  wrapStream: async ({ doStream, params }) => {
    const cacheKey = JSON.stringify(params);

    // Check if the result is in the cache
    const cached = await redis.get(cacheKey);

    // If cached, return a simulated ReadableStream that yields the cached result
    if (cached !== null) {
      // Format the timestamps in the cached response
      const formattedChunks = (cached as LanguageModelV1StreamPart[]).map(p => {
        if (p.type === 'response-metadata' && p.timestamp) {
          return { ...p, timestamp: new Date(p.timestamp) };
        } else return p;
      });
      return {
        stream: simulateReadableStream({
          initialDelayInMs: 0,
          chunkDelayInMs: 10,
          chunks: formattedChunks,
        }),
      };
    }

    // If not cached, proceed with streaming
    const { stream, ...rest } = await doStream();

    const fullResponse: LanguageModelV1StreamPart[] = [];

    const transformStream = new TransformStream<
      LanguageModelV1StreamPart,
      LanguageModelV1StreamPart
    >({
      transform(chunk, controller) {
        fullResponse.push(chunk);
        controller.enqueue(chunk);
      },
      flush() {
        // Store the full response in the cache after streaming is complete
        redis.set(cacheKey, fullResponse);
      },
    });

    return {
      stream: stream.pipeThrough(transformStream),
      ...rest,
    };
  },
};
```

This example uses @upstash/redis to store and retrieve the assistant's responses, but you can use any KV storage provider you like.
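The cache-write half of wrapStream relies on a standard Web Streams pattern rather than anything SDK-specific: a pass-through TransformStream records each chunk as it is forwarded, and flush() runs once the stream ends. Here is a self-contained sketch of that pattern using plain strings in place of stream parts (runnable in Node 18+, where these classes are global):

```typescript
// Collect chunks as they flow through, then act on the full list in flush()
// -- the same pattern the middleware uses to cache the complete response.
async function collectWhileStreaming(chunks: string[]): Promise<{
  forwarded: string[];
  recorded: string[];
}> {
  const recorded: string[] = [];

  // A source stream standing in for the model's output.
  const source = new ReadableStream<string>({
    start(controller) {
      for (const c of chunks) controller.enqueue(c);
      controller.close();
    },
  });

  const tap = new TransformStream<string, string>({
    transform(chunk, controller) {
      recorded.push(chunk); // side channel: remember the chunk
      controller.enqueue(chunk); // pass it through untouched
    },
    flush() {
      // In the middleware, this is where the cache write happens.
    },
  });

  // Consume the piped stream, as the SDK would for the client.
  const forwarded: string[] = [];
  const reader = source.pipeThrough(tap).getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    forwarded.push(value);
  }
  return { forwarded, recorded };
}
```

Because the tap only observes chunks, the consumer still receives the stream unmodified while a complete copy accumulates for the cache.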

Server


Finally, you will create an API route for api/chat to handle the assistant's messages and responses. You can use your cache middleware by wrapping the model with wrapLanguageModel and passing the middleware as an argument.

app/api/chat/route.ts

```ts
import { cacheMiddleware } from '@/ai/middleware';
import { openai } from '@ai-sdk/openai';
import { wrapLanguageModel, streamText, tool } from 'ai';
import { z } from 'zod';

const wrappedModel = wrapLanguageModel({
  model: openai('gpt-4o-mini'),
  middleware: cacheMiddleware,
});

export async function POST(req: Request) {
  const { messages } = await req.json();

  const result = streamText({
    model: wrappedModel,
    messages,
    tools: {
      weather: tool({
        description: 'Get the weather in a location',
        inputSchema: z.object({
          location: z.string().describe('The location to get the weather for'),
        }),
        execute: async ({ location }) => ({
          location,
          temperature: 72 + Math.floor(Math.random() * 21) - 10,
        }),
      }),
    },
  });

  return result.toUIMessageStreamResponse();
}
```
