📄 ai-sdk/cookbook/node/local-caching-middleware

File: local-caching-middleware.md | Updated: 11/15/2025

Source: https://ai-sdk.dev/cookbook/node/local-caching-middleware


Local Caching Middleware
===============================================================================================================

This example is not yet updated to v5.

When developing AI applications, you'll often find yourself repeatedly making the same API calls during development. This can lead to increased costs and slower development cycles. A caching middleware allows you to store responses locally and reuse them when the same inputs are provided.

This approach is particularly useful in two scenarios:

  1. Iterating on UI/UX - When you're focused on styling and user experience, you don't want to regenerate AI responses for every code change.
  2. Working on evals - When developing evals, you need to repeatedly test the same prompts, but don't need new generations each time.

Implementation


In this implementation, you create a JSON file to store responses. When a request comes in, you first check whether you have already seen this exact request. If you have, you return the cached response immediately, either as a complete generation or as a stream of chunks. If not, you trigger the generation, save the response, and return it.

Make sure to add the path of your local cache to your .gitignore so you do not commit it.
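For example, with the default .cache/ai-cache.json location used in the middleware below, a single entry is enough:

```
# .gitignore
.cache/
```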

How it works

For regular generations, you store and retrieve complete responses. The streaming implementation, in contrast, captures each chunk as it arrives, stores the full sequence, and on cache hits uses the SDK's simulateReadableStream utility to recreate the chunk-by-chunk streaming experience at a controlled speed (10ms between chunks in this example).

This approach gives you the best of both worlds:

  • Instant responses for repeated queries
  • Preserved streaming behavior for UI development

The middleware handles all transformations needed to make cached responses indistinguishable from fresh ones, including normalizing tool calls and fixing timestamp formats.
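To make the replay step concrete, here is a minimal, standalone sketch of simulateReadableStream (separate from the middleware itself); the chunk values are placeholders:

```ts
import { simulateReadableStream } from 'ai';

// Standalone sketch: replay three pre-recorded chunks with a 10ms pause
// between them, the same mechanism the middleware uses on cache hits.
async function replayDemo() {
  const stream = simulateReadableStream({
    initialDelayInMs: 0, // start emitting immediately
    chunkDelayInMs: 10, // 10ms between chunks
    chunks: ['Hello', ', ', 'world!'], // placeholder chunks
  });

  // ReadableStream is async-iterable in Node.js 18+
  for await (const chunk of stream) {
    process.stdout.write(chunk);
  }
}

replayDemo().catch(console.error);
```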

Middleware

```ts
import {
  type LanguageModelV1,
  type LanguageModelV1Middleware,
  type LanguageModelV1Prompt,
  type LanguageModelV1StreamPart,
  simulateReadableStream,
  wrapLanguageModel,
} from 'ai';
import 'dotenv/config';
import fs from 'fs';
import path from 'path';

const CACHE_FILE = path.join(process.cwd(), '.cache/ai-cache.json');

export const cached = (model: LanguageModelV1) =>
  wrapLanguageModel({
    middleware: cacheMiddleware,
    model,
  });

const ensureCacheFile = () => {
  const cacheDir = path.dirname(CACHE_FILE);
  if (!fs.existsSync(cacheDir)) {
    fs.mkdirSync(cacheDir, { recursive: true });
  }
  if (!fs.existsSync(CACHE_FILE)) {
    fs.writeFileSync(CACHE_FILE, '{}');
  }
};

const getCachedResult = (key: string | object) => {
  ensureCacheFile();
  const cacheKey = typeof key === 'object' ? JSON.stringify(key) : key;
  try {
    const cacheContent = fs.readFileSync(CACHE_FILE, 'utf-8');
    const cache = JSON.parse(cacheContent);
    const result = cache[cacheKey];
    return result ?? null;
  } catch (error) {
    console.error('Cache error:', error);
    return null;
  }
};

const updateCache = (key: string, value: any) => {
  ensureCacheFile();
  try {
    const cache = JSON.parse(fs.readFileSync(CACHE_FILE, 'utf-8'));
    const updatedCache = { ...cache, [key]: value };
    fs.writeFileSync(CACHE_FILE, JSON.stringify(updatedCache, null, 2));
    console.log('Cache updated for key:', key);
  } catch (error) {
    console.error('Failed to update cache:', error);
  }
};

// Normalize volatile values (tool call IDs, tool results) so that otherwise
// identical requests produce the same cache key.
const cleanPrompt = (prompt: LanguageModelV1Prompt) => {
  return prompt.map(m => {
    if (m.role === 'assistant') {
      return m.content.map(part =>
        part.type === 'tool-call' ? { ...part, toolCallId: 'cached' } : part,
      );
    }
    if (m.role === 'tool') {
      return m.content.map(tc => ({
        ...tc,
        toolCallId: 'cached',
        result: {},
      }));
    }
    return m;
  });
};

export const cacheMiddleware: LanguageModelV1Middleware = {
  wrapGenerate: async ({ doGenerate, params }) => {
    const cacheKey = JSON.stringify({
      ...cleanPrompt(params.prompt),
      _function: 'generate',
    });
    console.log('Cache Key:', cacheKey);

    const cached = getCachedResult(cacheKey) as Awaited<
      ReturnType<LanguageModelV1['doGenerate']>
    > | null;

    if (cached !== null) {
      console.log('Cache Hit');
      return {
        ...cached,
        response: {
          ...cached.response,
          timestamp: cached?.response?.timestamp
            ? new Date(cached?.response?.timestamp)
            : undefined,
        },
      };
    }

    console.log('Cache Miss');
    const result = await doGenerate();

    updateCache(cacheKey, result);

    return result;
  },
  wrapStream: async ({ doStream, params }) => {
    const cacheKey = JSON.stringify({
      ...cleanPrompt(params.prompt),
      _function: 'stream',
    });
    console.log('Cache Key:', cacheKey);

    // Check if the result is in the cache
    const cached = getCachedResult(cacheKey);

    // If cached, return a simulated ReadableStream that yields the cached result
    if (cached !== null) {
      console.log('Cache Hit');
      // Format the timestamps in the cached response
      const formattedChunks = (cached as LanguageModelV1StreamPart[]).map(p => {
        if (p.type === 'response-metadata' && p.timestamp) {
          return { ...p, timestamp: new Date(p.timestamp) };
        } else return p;
      });
      return {
        stream: simulateReadableStream({
          initialDelayInMs: 0,
          chunkDelayInMs: 10,
          chunks: formattedChunks,
        }),
      };
    }

    console.log('Cache Miss');
    // If not cached, proceed with streaming
    const { stream, ...rest } = await doStream();

    const fullResponse: LanguageModelV1StreamPart[] = [];

    const transformStream = new TransformStream<
      LanguageModelV1StreamPart,
      LanguageModelV1StreamPart
    >({
      transform(chunk, controller) {
        fullResponse.push(chunk);
        controller.enqueue(chunk);
      },
      flush() {
        // Store the full response in the cache after streaming is complete
        updateCache(cacheKey, fullResponse);
      },
    });

    return {
      stream: stream.pipeThrough(transformStream),
      ...rest,
    };
  },
};
```

Using the Middleware


The middleware can easily be integrated into your existing AI SDK setup:

```ts
import { openai } from '@ai-sdk/openai';
import { streamText } from 'ai';
import 'dotenv/config';
import { cached } from '../middleware/your-cache-middleware';

async function main() {
  const result = streamText({
    model: cached(openai('gpt-4o')),
    maxTokens: 512,
    temperature: 0.3,
    maxRetries: 5,
    prompt: 'Invent a new holiday and describe its traditions.',
  });

  for await (const textPart of result.textStream) {
    process.stdout.write(textPart);
  }

  console.log();
  console.log('Token usage:', await result.usage);
  console.log('Finish reason:', await result.finishReason);
}

main().catch(console.error);
```

Considerations


When using this caching middleware, keep these points in mind:

  1. Development Only - This approach is intended for local development, not production environments
  2. Cache Invalidation - You'll need to clear the cache (delete the cache file) whenever you want fresh responses; see the sketch after this list
  3. Multi-Step Flows - When using maxSteps, caching occurs at the level of individual language model responses, not across the entire execution flow. The model's generations are cached, but tool executions are not, so your tools still run on every pass.
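For cache invalidation, a hypothetical clearCache helper (assuming the same cache file location as the middleware) could delete the file for you:

```ts
import fs from 'fs';
import path from 'path';

// Hypothetical helper, assuming the same cache location as the middleware.
// Deleting the file forces fresh generations on the next run.
const CACHE_FILE = path.join(process.cwd(), '.cache/ai-cache.json');

export const clearCache = () => {
  // force: true makes this a no-op if the cache file does not exist yet
  fs.rmSync(CACHE_FILE, { force: true });
};
```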

