Automatic Caching
Some providers like Anthropic and MiniMax require explicit cache control markers to enable prompt caching, while others like OpenAI, Google, and DeepSeek cache automatically (sometimes called "implicit caching"). Use caching: 'auto' to let AI Gateway handle this for you. It applies the appropriate caching strategy based on the provider.
When you set caching: 'auto' and the request routes to a provider that requires explicit cache markers (Anthropic or MiniMax), AI Gateway adds a cache_control breakpoint at the end of your static content. For providers with implicit caching (OpenAI, Google, DeepSeek), no modification is needed and caching works automatically.
Default behavior: When caching is not set, AI Gateway passes your request through without modification. Providers with implicit caching still cache automatically. For Anthropic, you'll need to set caching: 'auto' or manually add cache markers to your messages.
Supported providers: Automatic caching works with Anthropic (direct and Vertex) and MiniMax. Bedrock is not yet supported for automatic caching.
import { streamText } from 'ai';
export async function POST(request: Request) {
const { prompt } = await request.json();
const result = streamText({
model: 'anthropic/claude-sonnet-4.6',
system: 'You are a helpful assistant with access to a large knowledge base...',
prompt,
providerOptions: {
gateway: {
caching: 'auto',
},
},
});
return result.toUIMessageStreamResponse();
}import OpenAI from 'openai';
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
const openai = new OpenAI({
apiKey,
baseURL: 'https://ai-gateway.vercel.sh/v1',
});
// @ts-expect-error - providerOptions is a gateway extension
const response = await openai.chat.completions.create({
model: 'anthropic/claude-sonnet-4.6',
messages: [
{
role: 'system',
content: 'You are a helpful assistant with access to a large knowledge base...',
},
{
role: 'user',
content: 'What is the capital of France?',
},
],
providerOptions: {
gateway: {
caching: 'auto',
},
},
});
console.log(response.choices[0].message.content);import os
from openai import OpenAI
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
client = OpenAI(
api_key=api_key,
base_url='https://ai-gateway.vercel.sh/v1'
)
response = client.chat.completions.create(
model='anthropic/claude-sonnet-4.6',
messages=[
{
'role': 'system',
'content': 'You are a helpful assistant with access to a large knowledge base...'
},
{
'role': 'user',
'content': 'What is the capital of France?'
}
],
extra_body={
'providerOptions': {
'gateway': {
'caching': 'auto'
}
}
}
)
print(response.choices[0].message.content)const response = await fetch('https://ai-gateway.vercel.sh/v1/responses', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: `Bearer ${apiKey}`,
},
body: JSON.stringify({
model: 'anthropic/claude-sonnet-4.6',
caching: 'auto',
instructions: 'You are a helpful assistant with access to a large knowledge base...',
input: [{ type: 'message', role: 'user', content: 'What is the capital of France?' }],
}),
});import Anthropic from '@anthropic-ai/sdk';
const apiKey = process.env.AI_GATEWAY_API_KEY || process.env.VERCEL_OIDC_TOKEN;
const anthropic = new Anthropic({
apiKey,
baseURL: 'https://ai-gateway.vercel.sh',
});
const message = await anthropic.messages.create({
model: 'anthropic/claude-sonnet-4.6',
max_tokens: 2048,
system: 'You are a helpful assistant with access to a large knowledge base...',
messages: [
{
role: 'user',
content: 'What is the capital of France?',
},
],
// @ts-expect-error - providerOptions is a gateway extension
providerOptions: {
gateway: {
caching: 'auto',
},
},
});
console.log(message.content[0].type === 'text' ? message.content[0].text : '');import os
import anthropic
api_key = os.getenv('AI_GATEWAY_API_KEY') or os.getenv('VERCEL_OIDC_TOKEN')
client = anthropic.Anthropic(
api_key=api_key,
base_url='https://ai-gateway.vercel.sh'
)
message = client.messages.create(
model='anthropic/claude-sonnet-4.6',
max_tokens=2048,
system='You are a helpful assistant with access to a large knowledge base...',
messages=[
{
'role': 'user',
'content': 'What is the capital of France?'
}
],
extra_body={
'providerOptions': {
'gateway': {
'caching': 'auto'
}
}
}
)
print(message.content[0].text)For fine-grained control over what gets cached, you can manually add cache markers instead of using caching: 'auto'. This gives you control over exactly which parts of your prompt are cached.
- Anthropic Messages API: Add
cache_control: { type: 'ephemeral' }to specific messages. See the Anthropic prompt caching docs for details. - AI SDK: Use the
cacheControlproperty on messages. See the AI SDK Anthropic provider docs for details. - OpenAI Chat Completions API: See prompt caching in the advanced guide.
| Provider | Caching type | caching: 'auto' effect |
|---|---|---|
| OpenAI | Implicit | No change needed. Caching happens automatically. |
| Implicit | No change needed. Caching happens automatically. | |
| DeepSeek | Implicit | No change needed. Caching happens automatically. |
| Anthropic | Explicit | Adds cache_control breakpoint to static content |
| Anthropic (via Vertex) | Explicit | Adds cache_control breakpoint to static content |
| MiniMax | Explicit | Adds cache markers to static content |
| Bedrock | Not supported | No effect |
Was this helpful?