Ministral 8B
Ministral 8B brings an interleaved sliding-window attention architecture to edge inference. The design aims at faster, more memory-efficient processing across its 128K-token context window. It is priced at $0.15 per million tokens.
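```typescript
import { streamText } from 'ai'
const result = streamText({ model: 'mistral/ministral-8b', prompt: 'Why is the sky blue?' })
// Print each text chunk as it arrives (top-level await requires an ES module)
for await (const textPart of result.textStream) process.stdout.write(textPart)
```

Playground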
Try out Ministral 8B by Mistral AI. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
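As a rough guide to the $0.15 per million tokens rate quoted above, the sketch below converts token counts to dollars and back. It assumes a single flat rate for both input and output tokens, so check current pricing before relying on it. The function names are illustrative and are not part of any SDK.

```typescript
// Assumption: one flat rate for input and output tokens, as quoted on this page.
const USD_PER_MILLION_TOKENS = 0.15;

// Estimated charge in US dollars for a number of billed tokens.
function estimateCostUSD(tokens: number): number {
  return (tokens * USD_PER_MILLION_TOKENS) / 1_000_000;
}

// Approximate number of tokens a dollar budget covers at the same rate.
function tokensForBudget(budgetUSD: number): number {
  return Math.floor((budgetUSD / USD_PER_MILLION_TOKENS) * 1_000_000);
}

console.log(estimateCostUSD(100_000)); // a 100,000-token request: about $0.015
console.log(tokensForBudget(5));       // the $5 free credit: roughly 33 million tokens
```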
Providers
AI Gateway can route requests across multiple providers. To set a preferred provider, use its provider slug (see the docs for details). Using a provider means you agree to that provider's terms.
Each provider is compared on three metrics measured on live AI Gateway traffic (see the docs for definitions):
- Throughput: P50, in tokens per second (TPS)
- Time to first token (TTFT): P50, in milliseconds
- Success rate: direct request success rate, on AI Gateway overall and per provider
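P50 is the median: half of the sampled requests fall below the reported value and half above it. The sketch below shows how figures like these could be derived from per-request timings. The record shape is hypothetical, and measuring throughput from the first to the last token is an assumption; AI Gateway's exact definitions are in its docs.

```typescript
// Hypothetical per-request record; field names are illustrative.
interface RequestSample {
  startMs: number;      // when the request was sent
  firstTokenMs: number; // when the first token arrived
  endMs: number;        // when the last token arrived
  outputTokens: number; // tokens generated
  ok: boolean;          // whether the request succeeded
}

// P50 (median): the middle value after sorting, or the mean of the two middle values.
function p50(values: number[]): number {
  if (values.length === 0) throw new Error("p50 of an empty list");
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function summarize(samples: RequestSample[]) {
  const succeeded = samples.filter((s) => s.ok);
  // TTFT: time from sending the request to receiving the first token.
  const ttftMs = succeeded.map((s) => s.firstTokenMs - s.startMs);
  // Assumption: throughput = output tokens per second while streaming (first to last token).
  const tps = succeeded
    .filter((s) => s.endMs > s.firstTokenMs)
    .map((s) => s.outputTokens / ((s.endMs - s.firstTokenMs) / 1000));
  return {
    p50TtftMs: p50(ttftMs),
    p50Tps: p50(tps),
    successRate: succeeded.length / samples.length,
  };
}
```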
About Ministral 8B
Released October 16, 2024, Ministral 8B sits between the 3B and 14B models in Mistral AI's edge lineup. What sets Ministral 8B apart is its architecture: an interleaved sliding-window attention mechanism engineered for inference speed and memory efficiency.
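In a standard full-attention transformer, every token attends to every other token (every earlier token, in a causal decoder), so attention cost grows quadratically with sequence length and the key-value (KV) cache grows with every token processed. Sliding-window attention restricts each token to a fixed window of recent tokens, which caps how much KV cache a windowed layer must keep. The interleaved design alternates between full-attention and windowed layers: the full-attention layers preserve the ability to reason over long-range dependencies, while the windowed layers keep the memory footprint practical.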
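To make the difference concrete, here is a minimal TypeScript sketch of causal attention masks. It compares which query-key pairs a full-attention layer and a sliding-window layer score. The sequence length and window size are toy values chosen for illustration, not Ministral 8B's actual settings.

```typescript
// Full causal attention: query i may attend to every key j with j <= i.
function fullCausal(i: number, j: number): boolean {
  return j <= i;
}

// Sliding-window causal attention: query i sees only the `windowSize` most
// recent keys, i.e. i - windowSize < j <= i.
function slidingWindow(windowSize: number): (i: number, j: number) => boolean {
  return (i, j) => j <= i && j > i - windowSize;
}

// Number of query-key pairs a layer scores for a sequence of length n.
function countPairs(n: number, allowed: (i: number, j: number) => boolean): number {
  let pairs = 0;
  for (let i = 0; i < n; i++) {
    for (let j = 0; j < n; j++) {
      if (allowed(i, j)) pairs++;
    }
  }
  return pairs;
}

// Render the mask one query per row: "x" = attended, "." = masked out.
function renderMask(n: number, allowed: (i: number, j: number) => boolean): string {
  const rows: string[] = [];
  for (let i = 0; i < n; i++) {
    let row = "";
    for (let j = 0; j < n; j++) row += allowed(i, j) ? "x" : ".";
    rows.push(row);
  }
  return rows.join("\n");
}

// Toy sizes for illustration only.
const toyLength = 8;
const toyWindow = 3;
console.log(countPairs(toyLength, fullCausal));               // 36 = n(n+1)/2, quadratic in n
console.log(countPairs(toyLength, slidingWindow(toyWindow))); // 21, grows roughly as n * window
console.log(renderMask(toyLength, slidingWindow(toyWindow)));
```

Each query in the windowed layer sees at most `windowSize` keys, which is why such a layer only needs to keep that many recent tokens in its cache.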
Ministral 8B offers a 128K-token context window, supports function calling, and targets knowledge retrieval and commonsense reasoning tasks.
Ministral 8B carries dual licensing: the Mistral AI Commercial License for production and the Mistral AI Research License for non-commercial work. This offers more flexibility than the 3B variant.
What To Consider When Choosing a Provider
- Configuration: For workloads processing long documents or extended conversation histories, Ministral 8B's sliding-window architecture reduces the memory pressure typical of long-context inference.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model on direct gateway requests; bring-your-own-key (BYOK) requests are not covered. See the documentation for configuration details.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
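The memory argument behind the Configuration point can be estimated with the usual key-value (KV) cache formula: 2 (keys and values) × cached tokens × KV heads × head dimension × bytes per element, summed over layers. In the sketch below, every model dimension, the window size, and the 1:1 full/windowed layer pattern are hypothetical placeholders rather than Ministral 8B's published configuration, so only the relative comparison is meaningful.

```typescript
// Hypothetical model dimensions; NOT Ministral 8B's published configuration.
interface KvConfig {
  numLayers: number;
  numKvHeads: number;
  headDim: number;
  bytesPerElement: number; // 2 for fp16/bf16
  windowSize: number;      // tokens a sliding-window layer keeps in cache
}

// KV cache bytes for one layer: keys and values (factor 2) for every cached
// token, KV head, and head dimension.
function layerKvBytes(cfg: KvConfig, cachedTokens: number): number {
  return 2 * cachedTokens * cfg.numKvHeads * cfg.headDim * cfg.bytesPerElement;
}

// Baseline: every layer uses full attention and caches the whole sequence.
function allFullKvBytes(cfg: KvConfig, seqLen: number): number {
  return cfg.numLayers * layerKvBytes(cfg, seqLen);
}

// Assumed 1:1 interleave: even layers are full attention, odd layers are windowed
// and never cache more than `windowSize` tokens.
function interleavedKvBytes(cfg: KvConfig, seqLen: number): number {
  let total = 0;
  for (let layer = 0; layer < cfg.numLayers; layer++) {
    const cached = layer % 2 === 0 ? seqLen : Math.min(seqLen, cfg.windowSize);
    total += layerKvBytes(cfg, cached);
  }
  return total;
}

const placeholderCfg: KvConfig = { numLayers: 32, numKvHeads: 8, headDim: 128, bytesPerElement: 2, windowSize: 4096 };
const contextTokens = 128_000;
const toGiB = (bytes: number) => (bytes / 2 ** 30).toFixed(2);
console.log(`all full attention: ${toGiB(allFullKvBytes(placeholderCfg, contextTokens))} GiB`);
console.log(`interleaved:        ${toGiB(interleavedKvBytes(placeholderCfg, contextTokens))} GiB`);
```

With these placeholder numbers, the all-full cache comes to 15.63 GiB and the interleaved one to 8.06 GiB at 128,000 tokens. Because the full-attention layers still cache every token, the saving from a 1:1 pattern approaches half but never exceeds it.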
When to Use Ministral 8B
Best For
- Long-context processing: Sliding-window attention keeps memory footprint manageable when processing long inputs
- Deeper reasoning than 3B: Tasks requiring more depth than Ministral 3B can provide
- Function calling and tool use: Better accuracy than the 3B variant
- Research as well as commercial use: Covered by the Mistral AI Research and Commercial licenses
Consider Alternatives When
- Smallest footprint and lowest cost: You need the absolute minimum (consider Ministral 3B)
- Image understanding: Vision is required (consider Ministral 14B)
Conclusion
Ministral 8B earns its place through architectural innovation rather than parameter scaling alone. Its interleaved sliding-window attention makes long-context inference more memory-efficient than in a standard full-attention transformer of the same size.