Ministral 8B

Ministral 8B brings an interleaved sliding-window attention architecture to edge inference, delivering faster, more memory-efficient processing than full attention across its 128K-token context window, at $0.15 per million tokens.

index.ts

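import { streamText } from 'ai'

const result = streamText({
  model: 'mistral/ministral-8b',
  prompt: 'Why is the sky blue?'
})

// Print the response as it streams in.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}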

Playground

Try out Ministral 8B by Mistral AI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider	Context	Latency	Throughput	Price	Release Date
Mistral AI	128K	0.3s	82tps	$0.15/M	10/16/2024
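As a rough illustration of what the listed rate means in practice, the sketch below turns a token count into an estimated charge. It assumes the $0.15 per million tokens figure applies uniformly to every billed token and treats 128K as 128,000 tokens; both are simplifying assumptions, and `estimateCostUsd` is a hypothetical helper, not part of any SDK.

```typescript
// Listed rate for Ministral 8B on this page, in US dollars per million tokens.
// Assumption: the same rate applies to every billed token.
const PRICE_PER_MILLION_TOKENS_USD = 0.15;

// Hypothetical helper: estimated charge in USD for a given number of tokens.
function estimateCostUsd(
  tokens: number,
  pricePerMillionUsd: number = PRICE_PER_MILLION_TOKENS_USD,
): number {
  if (!Number.isFinite(tokens) || tokens < 0) {
    throw new RangeError('tokens must be a non-negative finite number');
  }
  return (tokens / 1000000) * pricePerMillionUsd;
}

// One request that fills the whole context window (assumed to be 128,000 tokens).
const fullContextCost = estimateCostUsd(128000);
console.log(`Full-context request: about $${fullContextCost.toFixed(4)}`);
```

Under these assumptions a request that fills the entire context window costs roughly two cents.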

More models by Mistral AI

Model	Context	Latency	Throughput	Price	Release Date
mistral/mistral-medium-3.5	256K	0.3s	not listed	$1.50/M input, $7.50/M output	04/29/2026
mistral/mistral-large-3	256K	0.4s	61tps	$0.50/M input, $1.50/M output	12/02/2025
mistral/codestral-embed	not listed	not listed	not listed	$0.15/M	05/28/2025
mistral/ministral-3b	128K	0.3s	172tps	$0.10/M	10/16/2024
mistral/mistral-small	32K	0.3s	132tps	$0.10/M input, $0.30/M output	09/17/2024
mistral/mistral-embed	not listed	not listed	not listed	$0.10/M	12/11/2023

About Ministral 8B

Released October 16, 2024, Ministral 8B sits between the 3B and 14B models in Mistral AI's edge lineup. What sets Ministral 8B apart is its architecture: an interleaved sliding-window attention mechanism engineered for inference speed and memory efficiency.

Standard full-attention transformers require every token to attend to every other token, so the cost of attention scales quadratically with sequence length. Sliding-window attention limits each token to a fixed window of recent tokens, which cuts memory usage. The interleaved design alternates between full-attention and windowed layers, preserving the ability to reason over long-range dependencies while keeping the memory footprint practical.
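To make the scaling difference concrete, the sketch below counts how many query-key pairs a single causal attention layer scores under each scheme, then sums them over a stack of layers. The window size and the one-full-then-three-windowed layer pattern are illustrative assumptions only; the page does not state Ministral 8B's actual window or layer layout, and all function names are hypothetical.

```typescript
type LayerKind = 'full' | 'window';

// Full causal attention: token i attends to tokens 0..i,
// giving seqLen * (seqLen + 1) / 2 pairs in total (quadratic growth).
function fullAttentionPairs(seqLen: number): number {
  return (seqLen * (seqLen + 1)) / 2;
}

// Sliding-window causal attention: token i attends to at most `window`
// of the most recent tokens, itself included (linear growth once
// seqLen exceeds the window).
function slidingWindowPairs(seqLen: number, window: number): number {
  let pairs = 0;
  for (let i = 0; i < seqLen; i++) {
    pairs += Math.min(i + 1, window);
  }
  return pairs;
}

// Total pairs across a stack of layers that mixes both kinds.
function interleavedPairs(seqLen: number, window: number, pattern: LayerKind[]): number {
  let total = 0;
  for (const kind of pattern) {
    total += kind === 'full'
      ? fullAttentionPairs(seqLen)
      : slidingWindowPairs(seqLen, window);
  }
  return total;
}

// Toy sizes chosen so the numbers are easy to check by hand.
const seqLen = 8;
const windowSize = 4;
const pattern: LayerKind[] = ['full', 'window', 'window', 'window']; // assumed pattern

console.log(fullAttentionPairs(seqLen));                    // 36
console.log(slidingWindowPairs(seqLen, windowSize));        // 26
console.log(interleavedPairs(seqLen, windowSize, pattern)); // 114
console.log(4 * fullAttentionPairs(seqLen));                // 144 for four full layers
```

The gap is small at this toy size but widens quickly: the full-attention count grows with the square of the sequence length, while the windowed count grows linearly once the sequence is longer than the window.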

Ministral 8B supports a context window of 128K tokens, along with function calling, knowledge retrieval, and commonsense reasoning.

Ministral 8B carries dual licensing: the Mistral AI Commercial License for production and the Mistral AI Research License for non-commercial work. This offers more flexibility than the 3B variant.

What To Consider When Choosing a Provider

Configuration: For workloads processing long documents or extended conversation histories, Ministral 8B's sliding-window architecture reduces the memory pressure typical of long-context inference.
Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
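The memory-pressure point under Configuration can be sketched as a key-value cache estimate. The example assumes an inference server that keeps only the most recent `window` tokens for windowed layers (a rolling cache) and the whole sequence for full-attention layers. Every dimension below is a made-up toy value, not Ministral 8B's real configuration, and `kvCacheBytes` is a hypothetical helper.

```typescript
interface CacheConfig {
  layers: number;         // total transformer layers
  windowedLayers: number; // how many of those layers use sliding-window attention
  kvHeads: number;        // key/value heads per layer
  headDim: number;        // dimension of each head
  bytesPerValue: number;  // e.g. 2 for 16-bit floats
  window: number;         // sliding-window size in tokens
}

// Estimated key-value cache size in bytes for one sequence.
// Assumption: windowed layers evict entries that fall outside the window.
function kvCacheBytes(seqLen: number, cfg: CacheConfig): number {
  // Keys and values are both stored, hence the factor of 2.
  const bytesPerTokenPerLayer = 2 * cfg.kvHeads * cfg.headDim * cfg.bytesPerValue;
  const fullLayers = cfg.layers - cfg.windowedLayers;
  const windowTokens = Math.min(seqLen, cfg.window);
  return bytesPerTokenPerLayer * (fullLayers * seqLen + cfg.windowedLayers * windowTokens);
}

// Toy configuration, chosen only so the arithmetic is easy to follow.
const toy: CacheConfig = {
  layers: 4,
  windowedLayers: 3,
  kvHeads: 2,
  headDim: 4,
  bytesPerValue: 2,
  window: 8,
};

const interleaved = kvCacheBytes(32, toy);                            // 1792 bytes
const allFull = kvCacheBytes(32, { ...toy, windowedLayers: 0 });      // 4096 bytes
console.log({ interleaved, allFull });
```

In this toy setup the interleaved stack needs less than half the cache of an all-full-attention stack at 32 tokens, and the saving grows as the sequence gets longer because the windowed layers stop growing once the window is full.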

When to Use Ministral 8B

Best For

Long-context processing: Sliding-window attention keeps memory footprint manageable when processing long inputs
Deeper reasoning than 3B: Tasks requiring more depth than Ministral 3B can provide
Function calling and tool use: Tool calls that need better accuracy than the 3B variant offers
Dual-licensed use cases: Production work is covered by the Commercial License and non-commercial research by the Research License

Consider Alternatives When

Smallest footprint and lowest cost: You need the absolute minimum (consider Ministral 3B)
Image understanding: Vision is required (consider Ministral 14B)

Conclusion

Ministral 8B earns its place through architectural innovation rather than just parameter scaling. The sliding-window attention design makes long-context inference more memory-efficient than standard full-attention transformers at this size.
