GPT-4o mini

GPT-4o mini is OpenAI's cost-efficient multimodal model, priced at $0.15 per million input tokens. It costs less than GPT-3.5 Turbo, outperforms GPT-4 on chat preference benchmarks, and supports vision and function calling.

File Input, Tool Use, Vision (Image), Implicit Caching

index.ts

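import { streamText } from 'ai'

// Start a streaming request to GPT-4o mini through AI Gateway.
const result = streamText({
  model: 'openai/gpt-4o-mini',
  prompt: 'Why is the sky blue?'
})

// Print each text chunk as it arrives.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}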

Playground

Try out GPT-4o mini by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider	Context	Latency	Throughput	Input	Output	Cache	Web Search	Per Query	Release Date
Azure	128K	0.6s	60tps	$0.15/M	$0.60/M	Read: $0.07/M, Write: —	$14.00/K + input costs	—	07/18/2024
OpenAI	128K	0.8s	59tps	$0.15/M	$0.60/M	Read: $0.07/M, Write: —	$10.00/K + input costs	—	07/18/2024
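A provider preference can be expressed as an ordered list of slugs. The sketch below assumes the slugs are `azure` and `openai` (inferred from the provider names in the table, not confirmed here) and that the gateway reads the preference from a `providerOptions.gateway.order` field; check the docs for the exact option name. `preferProviders` is a hypothetical helper, not part of any SDK.

```typescript
// Hypothetical helper: builds a provider-preference object.
// The { gateway: { order } } shape is an assumption; verify it against the docs.
type GatewayOptions = { gateway: { order: string[] } };

function preferProviders(...slugs: string[]): GatewayOptions {
  // Normalize slugs, drop empty entries, and remove duplicates
  // while keeping the first-seen order.
  const normalized = slugs.map((s) => s.trim().toLowerCase()).filter((s) => s.length > 0);
  const order = Array.from(new Set(normalized));
  if (order.length === 0) {
    throw new Error('at least one provider slug is required');
  }
  return { gateway: { order } };
}

// Prefer Azure first, then fall back to OpenAI.
const providerOptions = preferProviders('azure', 'openai');
console.log(JSON.stringify(providerOptions));
```

Under the same assumption, the resulting object would be passed alongside `model` and `prompt` in the `streamText` call shown earlier.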

More models by OpenAI

Model	Context	Latency	Throughput	Input	Output	Cache	Web Search	Per Query	Release Date
openai/gpt-5.5		1.0s	62tps	$5.00/M	$30.00/M	Read: $0.50/M, Write: —	$10.00/K + input costs	—	04/24/2026
openai/gpt-5.4-mini	400K	1.0s	136tps	$0.75/M	$4.50/M	Read: $0.07/M, Write: —	$10.00/K + input costs	—	03/17/2026
openai/gpt-5.4	1.1M	2.4s	91tps	$2.50/M	$15.00/M	Read: $0.25/M, Write: —	$10.00/K + input costs	—	03/05/2026
openai/gpt-5.3-codex	400K	0.6s	55tps	$1.75/M	$14.00/M	Read: $0.17/M, Write: —	$10.00/K + input costs	—	02/24/2026
openai/gpt-5-mini	400K	4.9s	83tps	$0.25/M	$2.00/M	Read: $0.03/M, Write: —	$14.00/K + input costs	—	08/07/2025
openai/gpt-oss-120b	131K	0.2s	832tps	$0.35/M	$0.75/M	Read: $0.25/M, Write: —		—	08/05/2025

About GPT-4o mini

GPT-4o mini launched on July 18, 2024 as OpenAI's cost-efficient model, positioned to replace GPT-3.5 Turbo for cost-sensitive deployments while providing meaningfully higher capability. The pricing stands out: $0.15 per million input tokens and $0.60 per million output tokens, which OpenAI described at launch as more than 60% cheaper than GPT-3.5 Turbo. It scored 82.0% on MMLU (Massive Multitask Language Understanding), exceeding GPT-3.5 Turbo, and outperformed GPT-4 on chat preferences on the LMSYS Chatbot Arena leaderboard at release.
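To make the pricing concrete, here is a minimal cost estimate using the two list prices quoted above. The function name and the example token counts are illustrative, and the sketch ignores cached-read discounts and web search fees.

```typescript
// List prices quoted on this page, in USD per million tokens.
const INPUT_USD_PER_MILLION = 0.15;
const OUTPUT_USD_PER_MILLION = 0.6;

// Illustrative helper: estimates the cost of one request in USD.
function estimateCostUsd(inputTokens: number, outputTokens: number): number {
  const inputCost = (inputTokens / 1e6) * INPUT_USD_PER_MILLION;
  const outputCost = (outputTokens / 1e6) * OUTPUT_USD_PER_MILLION;
  return inputCost + outputCost;
}

// Example: 10,000 input tokens and 1,000 output tokens per request.
// $0.0015 (input) + $0.0006 (output) = $0.0021 per request.
const perRequest = estimateCostUsd(10000, 1000);
console.log(perRequest.toFixed(4)); // prints "0.0021"

// The same request repeated 1,000 times.
console.log((perRequest * 1000).toFixed(2)); // prints "2.10"
```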

GPT-4o mini supports vision alongside text, inheriting GPT-4o's multimodal design at the small-model tier. You can run cost-efficient image analysis, document processing, visual classification, and screenshot interpretation without routing to a larger model. Function calling support makes it viable as the reasoning layer in tool-using agents and API-calling pipelines.
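An image request pairs a text question with an image in a single user message. The sketch below builds such a message using locally defined types that mirror the text-plus-image content parts used by the AI SDK, as far as I understand them; treat the exact field names as an assumption and confirm them in the SDK reference. `imageQuestion` and the example URL are hypothetical.

```typescript
// Local types that approximate a multimodal user message.
// The part shapes are assumptions modeled on the AI SDK, not imported from it.
type TextPart = { type: 'text'; text: string };
type ImagePart = { type: 'image'; image: URL };
type UserMessage = { role: 'user'; content: Array<TextPart | ImagePart> };

// Hypothetical helper: wraps a question and an image URL in one user message.
function imageQuestion(question: string, imageUrl: string): UserMessage[] {
  return [
    {
      role: 'user',
      content: [
        { type: 'text', text: question },
        // new URL() throws on malformed input, which catches typos early.
        { type: 'image', image: new URL(imageUrl) },
      ],
    },
  ];
}

const messages = imageQuestion(
  'What error is shown in this screenshot?',
  'https://example.com/screenshot.png', // placeholder URL
);
console.log(messages[0].content.length);
```

If the shapes match, the array would be passed as `messages` in place of `prompt` in a `streamText` call.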

OpenAI highlighted four patterns where GPT-4o mini excels: chaining or parallelizing multiple model calls, passing large volumes of context such as full codebases or conversation histories, fast real-time text responses for customer-facing interfaces, and workloads previously blocked by GPT-3.5 Turbo's capability ceiling. The context window of 128K tokens gives it substantial headroom for each of these.
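Parallelizing model calls can be sketched with `Promise.all`. To keep the example runnable without network access, the model call is injected as a function; `ModelCall`, `fanOut`, and `echoModel` are hypothetical names, and in practice the injected function would wrap a real request to `openai/gpt-4o-mini`.

```typescript
// A model call is modeled as any async function from prompt to text.
type ModelCall = (prompt: string) => Promise<string>;

// Sends every prompt concurrently and returns the answers in input order.
// Promise.all rejects as soon as any single call fails.
async function fanOut(callModel: ModelCall, prompts: string[]): Promise<string[]> {
  return Promise.all(prompts.map((prompt) => callModel(prompt)));
}

// Stand-in for a real model call, used only for demonstration.
const echoModel: ModelCall = async (prompt) => `echo: ${prompt}`;

fanOut(echoModel, ['Summarize section 1', 'Summarize section 2']).then((answers) => {
  console.log(answers);
});
```

Because the calls run concurrently, total latency is close to that of the slowest call rather than the sum of all calls, while the token cost is unchanged.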

What To Consider When Choosing a Provider

Configuration: For applications that chain multiple model calls (classify, then extract, then format), GPT-4o mini's per-call cost makes it practical to run several sequential inferences per user request without the economics becoming prohibitive.
Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
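The classify, extract, format chain mentioned under Configuration can be sketched as three sequential calls, each feeding the next. The model call is injected so the example runs offline; `handleTicket`, the prompts, and `fakeModel` are hypothetical and only illustrate the shape of such a pipeline.

```typescript
// A model call is modeled as any async function from prompt to text.
type ModelCall = (prompt: string) => Promise<string>;

// Runs three sequential inferences for one user request.
async function handleTicket(callModel: ModelCall, ticket: string): Promise<string> {
  // Step 1: classify the ticket.
  const category = await callModel(
    `Classify this support ticket as billing, bug, or other. Reply with one word.\n\n${ticket}`,
  );
  // Step 2: extract facts, using the category from step 1.
  const facts = await callModel(
    `Extract the key facts from this ${category.trim()} ticket as a short list.\n\n${ticket}`,
  );
  // Step 3: format the extracted facts for a human reader.
  return callModel(`Rewrite these facts as a one-paragraph summary.\n\n${facts}`);
}

// Stand-in for a real model call, used only for demonstration.
const fakeModel: ModelCall = async (prompt) => `[model output for a ${prompt.length}-character prompt]`;

handleTicket(fakeModel, 'I was charged twice for my subscription.').then((summary) => {
  console.log(summary);
});
```

Each request here costs three inferences, which is the pattern where a low per-call price matters most.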

When to Use GPT-4o mini

Best For

Customer support chatbots: Live interaction features requiring fast, affordable multi-turn responses
Multi-call pipelines: Sequential or parallel model calls per user action where per-call cost accumulates quickly
Budget vision workflows: Image description, document OCR assistance, and visual classification at the small-model tier
Function-calling agents: Reliable tool invocation at low cost per call
Large conversation histories: Processing codebases and extended chats within the 128K-token context window at minimal cost
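For long chats, a simple guard keeps the history inside the context window by dropping the oldest messages first. The sketch below is a rough approximation: it assumes about 4 characters per token (a common rule of thumb, not a real tokenizer), treats 128K as 128,000 tokens, and reserves a hypothetical 4,000 tokens for the reply. All names are illustrative.

```typescript
type ChatMessage = { role: 'user' | 'assistant'; content: string };

const CONTEXT_WINDOW_TOKENS = 128000; // "128K" as listed on this page
const RESERVED_FOR_REPLY = 4000; // hypothetical headroom for the model's answer

// Rough estimate only: assumes ~4 characters per token.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Keeps the most recent messages that fit within the token budget.
function trimHistory(
  messages: ChatMessage[],
  budgetTokens: number = CONTEXT_WINDOW_TOKENS - RESERVED_FOR_REPLY,
): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  // Walk backwards so the newest messages are kept first.
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = estimateTokens(messages[i].content);
    if (used + cost > budgetTokens) break;
    kept.unshift(messages[i]);
    used += cost;
  }
  return kept;
}

const history: ChatMessage[] = [
  { role: 'user', content: 'First question' },
  { role: 'assistant', content: 'First answer' },
  { role: 'user', content: 'Follow-up question' },
];
console.log(trimHistory(history).length);
```

A production version would count tokens with the model's actual tokenizer and might summarize dropped messages instead of discarding them.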

Consider Alternatives When

Higher quality ceiling: GPT-4o or GPT-4.1 handle complex reasoning, nuanced writing, or difficult coding tasks better
Advanced multimodal processing: More capable vision or audio workloads require a larger model
Deep chain-of-thought: o1-mini is purpose-built for extended reasoning

Conclusion

GPT-4o mini arrived as the model that made it economically viable to embed language model capability into every layer of an application: not just the final user-facing response, but also the classification, routing, extraction, and tool-use steps throughout a pipeline. Its combination of low price, multimodal input, function calling, and a 128K-token context window covers the majority of high-volume production use cases through AI Gateway.
