GPT-4o mini
GPT-4o mini is OpenAI's cost-efficient multimodal model. It is priced at $0.15 per million input tokens (less than GPT-3.5 Turbo), outperformed GPT-4 on chat preference benchmarks at launch, and supports vision and function calling.
import { streamText } from 'ai'
const result = streamText({ model: 'openai/gpt-4o-mini', prompt: 'Why is the sky blue?' })
Playground
Try out GPT-4o mini by OpenAI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
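`streamText` in the example above returns immediately. The generated text arrives incrementally through `result.textStream`, which the AI SDK exposes as an async iterable of string chunks. Here is a minimal sketch that drains such a stream into one string. It is typed against a plain `AsyncIterable<string>` so it runs with a stand-in generator; `collectText` and `fakeStream` are hypothetical names.

```typescript
// Drain any async iterable of text chunks into a single string.
// In an AI SDK app you would pass `result.textStream` here.
async function collectText(stream: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const chunk of stream) {
    text += chunk; // chunks arrive in generation order
  }
  return text;
}

// Hypothetical stand-in for a model stream, used only for local testing.
async function* fakeStream(chunks: string[]): AsyncGenerator<string> {
  for (const chunk of chunks) {
    yield chunk;
  }
}
```

In a real app, `await collectText(result.textStream)` waits for the full answer. To render tokens as they arrive, loop over `result.textStream` directly instead.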
Providers
AI Gateway can route requests for this model across multiple providers. To set a preference, reference the provider's slug in your request options; see the docs for details. Using a provider means you agree to that provider's terms, listed under Legal.
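Here is a sketch of attaching a provider preference to a request. It assumes AI Gateway's provider options, where `providerOptions.gateway.order` lists provider slugs in the order the gateway should try them. The slugs `azure` and `openai` and the helper `withProviderOrder` are illustrative assumptions. Copy the real slugs from the provider list.

```typescript
// Options object shaped for the AI SDK's streamText/generateText (assumed shape).
type GatewayRequest = {
  model: string;
  prompt: string;
  providerOptions: { gateway: { order: string[] } };
};

// Hypothetical helper: prefer providers in the given order. The gateway is
// assumed to try the next slug when an earlier provider is unavailable.
function withProviderOrder(prompt: string, order: string[]): GatewayRequest {
  if (order.length === 0) {
    throw new Error("order must list at least one provider slug");
  }
  return {
    model: "openai/gpt-4o-mini",
    prompt,
    providerOptions: { gateway: { order: [...order] } }, // copy to avoid aliasing
  };
}

const request = withProviderOrder("Why is the sky blue?", ["azure", "openai"]);
// Pass to the SDK as: streamText(request)
```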
About GPT-4o mini
GPT-4o mini launched on July 18, 2024 as OpenAI's cost-efficient model. It was positioned to replace GPT-3.5 Turbo for cost-sensitive deployments while providing meaningfully higher capability. The pricing stands out: $0.15 per million input tokens and $0.60 per million output tokens, well below GPT-3.5 Turbo's rates. It scored 82.0% on MMLU (Massive Multitask Language Understanding), exceeding GPT-3.5 Turbo. At release, it also ranked above GPT-4 on the LMSYS Chatbot Arena chat preference leaderboard.
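Per-token prices translate into per-request cost with simple arithmetic. This sketch uses the list prices quoted above, which may change, so check current pricing. `requestCostUsd` is a hypothetical helper.

```typescript
// List prices quoted above, in USD per million tokens (check current pricing).
const INPUT_USD_PER_MTOK = 0.15;
const OUTPUT_USD_PER_MTOK = 0.6;

// Estimated cost in USD of one request, given its token counts.
function requestCostUsd(inputTokens: number, outputTokens: number): number {
  return (
    (inputTokens * INPUT_USD_PER_MTOK + outputTokens * OUTPUT_USD_PER_MTOK) /
    1_000_000
  );
}

// 10,000 input + 1,000 output tokens:
// (10,000 × 0.15 + 1,000 × 0.60) / 1,000,000 = 2,100 / 1,000,000, about $0.0021
const cost = requestCostUsd(10_000, 1_000);
```

Multiplying by request volume gives a budget estimate. At that size, one million requests would cost roughly $2,100.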
GPT-4o mini supports vision alongside text, inheriting GPT-4o's multimodal design at the small-model tier. You can run cost-efficient image analysis, document processing, visual classification, and screenshot interpretation without routing to a larger model. Function calling support makes it viable as the reasoning layer in tool-using agents and API-calling pipelines.
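Here is a minimal sketch of a vision request. It assumes the AI SDK's message format, in which a user message's `content` mixes `{ type: 'text' }` and `{ type: 'image' }` parts. The local types and the `visionMessage` helper are hypothetical stand-ins so the example runs without the SDK installed. The image URL is a placeholder.

```typescript
// Local types mirroring the AI SDK's user-message content parts (assumed shape).
type TextPart = { type: "text"; text: string };
type ImagePart = { type: "image"; image: URL };
type UserMessage = { role: "user"; content: Array<TextPart | ImagePart> };

// Hypothetical helper: one question about one or more images.
function visionMessage(question: string, imageUrls: string[]): UserMessage {
  return {
    role: "user",
    content: [
      { type: "text", text: question },
      // new URL() throws on malformed input, so bad links fail early
      ...imageUrls.map((u): ImagePart => ({ type: "image", image: new URL(u) })),
    ],
  };
}

const msg = visionMessage("What is the total on this receipt?", [
  "https://example.com/receipt.png",
]);
// Then, for example: generateText({ model: "openai/gpt-4o-mini", messages: [msg] })
```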
OpenAI highlighted four patterns where GPT-4o mini excels: chaining or parallelizing multiple model calls, passing large volumes of context such as full codebases or conversation histories, fast real-time text responses for customer-facing interfaces, and workloads previously blocked by GPT-3.5 Turbo's capability ceiling. Its 128K-token context window leaves substantial headroom for each of these.
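To make the 128K-token headroom concrete, this sketch keeps the most recent conversation turns that fit the window. It rests on two assumptions. First, tokens are estimated with a rough four-characters-per-token heuristic rather than a real tokenizer. Second, 16,384 tokens are reserved for the reply, which was GPT-4o mini's stated output limit at launch; verify this for your deployment. `fitHistory` and `estimateTokens` are hypothetical names.

```typescript
const CONTEXT_WINDOW = 128_000; // tokens, as stated above
const RESERVED_FOR_OUTPUT = 16_384; // assumption: leave room for the reply

type Turn = { role: "user" | "assistant"; text: string };

// Rough heuristic (assumption): about 4 characters per token for English text.
// Use a real tokenizer where limits or billing must be exact.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walk backwards from the newest turn, keeping turns while they fit the budget.
function fitHistory(turns: Turn[], budget = CONTEXT_WINDOW - RESERVED_FOR_OUTPUT): Turn[] {
  const kept: Turn[] = [];
  let used = 0;
  for (let i = turns.length - 1; i >= 0; i--) {
    const cost = estimateTokens(turns[i].text);
    if (used + cost > budget) break; // older turns are dropped
    used += cost;
    kept.unshift(turns[i]); // preserve chronological order
  }
  return kept;
}
```

Dropping the oldest turns first is the simplest policy. Summarizing older turns with an extra GPT-4o mini call is a common alternative when early context matters.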
What To Consider When Choosing a Provider
- Configuration: For applications that chain multiple model calls (classify, then extract, then format), GPT-4o mini's per-call cost makes it practical to run several sequential inferences per user request without the economics becoming prohibitive.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model on direct gateway requests; requests made with your own provider keys (BYOK) are not covered. To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
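The classify, extract, and format chain from the Configuration point above can be sketched as three sequential calls, each feeding the next. `CallModel` is a hypothetical function type. In an AI SDK app it might wrap `generateText` with `model: 'openai/gpt-4o-mini'` and `system`/`prompt` fields. Injecting it keeps the pipeline testable with a stub.

```typescript
// Hypothetical signature for one model call (system instruction + input text).
type CallModel = (system: string, input: string) => Promise<string>;

// Three sequential calls per user request; each step consumes earlier output.
async function classifyExtractFormat(callModel: CallModel, ticket: string) {
  const category = (
    await callModel("Classify this ticket as billing, bug, or other. Reply with one word.", ticket)
  ).trim();
  const details = await callModel(`Extract the key facts from this ${category} ticket.`, ticket);
  const reply = await callModel("Format these facts as a short customer reply.", details);
  return { category, reply };
}
```

Steps that do not depend on each other, such as extracting from several documents, can run in parallel with `Promise.all` instead of sequential `await`s.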
When to Use GPT-4o mini
Best For
- Customer support chatbots: Live interaction features requiring fast, affordable multi-turn responses
- Multi-call pipelines: Sequential or parallel model calls per user action where per-call cost accumulates quickly
- Budget vision workflows: Image description, document OCR assistance, and visual classification at the small-model tier
- Function-calling agents: Reliable tool invocation at low cost per call
- Large conversation histories: Processing codebases and extended chats within the 128K-token context window at minimal cost
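In the function-calling use case, the application executes the tool calls the model emits and returns the results for the next turn. Here is a sketch of that dispatch step. It assumes a simplified tool-call shape (`toolName` plus an `input` object; field names differ across SDK versions) and a hypothetical `getOrderStatus` tool. The AI SDK can also run tools automatically when each tool definition includes an `execute` function.

```typescript
// Simplified tool-call shape emitted by the model (assumption).
type ToolCall = { toolName: string; input: Record<string, unknown> };
type ToolHandler = (input: Record<string, unknown>) => Promise<unknown>;

// Hypothetical tool registry for a support agent; the handler is a stub.
const tools: Record<string, ToolHandler> = {
  getOrderStatus: async (input) => ({ orderId: input.orderId, status: "shipped" }),
};

// Execute all requested tools concurrently. Unknown tools become error
// results that can be sent back to the model instead of crashing the loop.
async function runToolCalls(calls: ToolCall[]): Promise<unknown[]> {
  return Promise.all(
    calls.map(async (call) => {
      const handler = tools[call.toolName];
      if (!handler) return { error: `unknown tool: ${call.toolName}` };
      return handler(call.input);
    }),
  );
}
```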
Consider Alternatives When
- Higher quality ceiling: GPT-4o or GPT-4.1 handle complex reasoning, nuanced writing, or difficult coding tasks better
- Advanced multimodal processing: More capable vision or audio workloads require a larger model
- Deep chain-of-thought: o1-mini is purpose-built for extended reasoning
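When only some requests need one of these alternatives, a small routing table lets GPT-4o mini handle the default path. This is a sketch only. The task labels are hypothetical, and the gateway slugs other than `openai/gpt-4o-mini` are assumptions to check against the model list.

```typescript
type Task = "chat" | "extraction" | "complex-coding" | "deep-reasoning";

// Hypothetical routing table: cheap default, larger models only where needed.
const MODEL_FOR_TASK: Record<Task, string> = {
  chat: "openai/gpt-4o-mini",
  extraction: "openai/gpt-4o-mini",
  "complex-coding": "openai/gpt-4.1", // assumed slug
  "deep-reasoning": "openai/o1-mini", // assumed slug
};

function pickModel(task: Task): string {
  return MODEL_FOR_TASK[task];
}
```

Typing the table as `Record<Task, string>` makes the compiler flag any task label that lacks a model.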
Conclusion
GPT-4o mini made it economically viable to embed language model capability into every layer of an application. That includes the final user-facing response and also the classification, routing, extraction, and tool-use steps throughout a pipeline. Its combination of low price, multimodal input, function calling, and a 128K-token context window covers the majority of high-volume production use cases through AI Gateway.