Qwen3 Embedding 4B

Qwen3 Embedding 4B is a mid-tier, 4-billion-parameter text embedding model that produces 2560-dimensional vectors over a 32.8K-token context. It is designed for multilingual semantic search and code retrieval, balancing quality against operational cost.

index.ts
import { embed } from 'ai';

// Embed a single string with Qwen3 Embedding 4B.
const result = await embed({
  model: 'alibaba/qwen3-embedding-4b',
  value: 'Sunny day at the beach',
});

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

| Provider | Context | Input | Release Date |
| --- | --- | --- | --- |
| DeepInfra | 33K | $0.02/M | 06/05/2025 |

About Qwen3 Embedding 4B

Qwen3 Embedding 4B represents the middle tier of the Qwen3 Embedding family, balancing retrieval quality against operational cost. Its 2560-dimensional output space captures richer semantic structure than the 0.6B variant, which translates to measurably better performance on dense retrieval benchmarks and multilingual similarity tasks without reaching the full resource requirements of the 8B model.

The embeddings handle asymmetric retrieval tasks where a short user query must match longer documents, and support user-defined instruction prefixes to adapt the embedding space to a specific domain or retrieval intent. Cross-lingual transfer is stable across the 100+ natural languages and multiple programming languages the Qwen3 Embedding family covers.
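As a rough illustration of the instruction-prefix idea above, the sketch below formats the query side of an asymmetric retrieval pair. The `Instruct: ...` / `Query:` layout follows the convention shown in the Qwen3 Embedding model card and is an assumption here; the helper names `formatQuery` and `formatDocument` are hypothetical, and you should confirm the expected format for your provider before relying on it.

```typescript
// Hypothetical helper (not part of any SDK): prepend a task instruction to a query.
// Assumption: the "Instruct: <task>\nQuery:<query>" layout from the Qwen3 Embedding model card.
function formatQuery(task: string, query: string): string {
  return `Instruct: ${task.trim()}\nQuery:${query.trim()}`;
}

// Documents are embedded without an instruction; only the query side carries one.
function formatDocument(text: string): string {
  return text.trim();
}

const task =
  'Given a web search query, retrieve relevant passages that answer the query';

// Strings that would be passed as the `value` to embed for each side.
const queryInput = formatQuery(task, 'How do I rotate an API key?');
const documentInput = formatDocument(
  '  To rotate a key, open Settings and create a replacement.  ',
);
```

Changing the `task` string is the lever for adapting to a domain or retrieval intent; the document side stays unchanged, so an existing index does not need to be re-embedded when only the query instruction changes.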

The context window of 32.8K tokens allows Qwen3 Embedding 4B to embed substantial passages in one shot, reducing the need for aggressive chunking in document-heavy workflows. Combined with Matryoshka Representation Learning (MRL), dimension counts can be adjusted at query time to trade off storage against precision, giving teams flexibility when scaling a vector index.
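A minimal sketch of how MRL-style dimension reduction is commonly applied on the client side: keep the leading components of the vector and re-normalize to unit length so cosine or dot-product similarity stays meaningful. The function names `l2Normalize` and `truncateEmbedding` are hypothetical, the toy 4-dimensional vector stands in for a real 2560-dimensional embedding, and whether a provider can return reduced dimensions directly is not covered here.

```typescript
// Scale a vector to unit length; a zero vector is returned unchanged (as a copy).
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((sum, x) => sum + x * x, 0));
  if (norm === 0) return v.slice();
  return v.map((x) => x / norm);
}

// Keep the first `dims` components, then re-normalize.
// Assumption: the model was trained with MRL, so leading components carry most of the signal.
function truncateEmbedding(v: number[], dims: number): number[] {
  if (!Number.isInteger(dims) || dims < 1 || dims > v.length) {
    throw new RangeError(`dims must be an integer between 1 and ${v.length}`);
  }
  return l2Normalize(v.slice(0, dims));
}

// Toy example: a 4-dimensional stand-in for a full embedding, reduced to 2 dimensions.
const full = [3, 4, 12, 0];
const reduced = truncateEmbedding(full, 2);
```

Query vectors and indexed vectors need to be truncated to the same dimension count before they are compared.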

What To Consider When Choosing a Provider

  • Configuration: Provider selection is most consequential when your use case combines high query volume with data-sovereignty requirements. Verify each provider's regional availability before finalizing your architecture.
  • Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
  • Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.

When to Use Qwen3 Embedding 4B

Best For

  • Enterprise multilingual search: Applications that require high multilingual precision but can't justify the full cost of an 8B model
  • Semantic similarity and clustering: Datasets that span many languages or mix natural language with code
  • Quality-sensitive RAG: Pipelines serving diverse user populations where retrieval quality visibly affects answer accuracy
  • Cross-lingual alignment: Document alignment tasks where 2560-dimensional vectors provide better discriminability than smaller alternatives

Consider Alternatives When

  • Throughput and cost first: When throughput and cost are the dominant constraints and slightly lower recall is acceptable, the 0.6B variant is sufficient
  • Maximum retrieval precision: Specialized domains with dense, technical vocabulary may be better served by the 8B model
  • Generative output required: This model produces embeddings only; use a generative model when you need text output

Conclusion

Qwen3 Embedding 4B is a well-positioned middle-ground choice for teams building multilingual retrieval systems that need better-than-baseline precision without committing to the full resource footprint of larger models. Its 2560-dimensional output and flexible dimension truncation via MRL give engineers a range of tradeoff points across the storage-vs-quality curve.
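To make the storage side of that tradeoff concrete, the following back-of-the-envelope sketch estimates raw vector storage at a few dimension counts. It assumes 32-bit floats (4 bytes per value), one million vectors, and no index overhead or compression; the function name `indexBytes` and the chosen dimension counts other than 2560 are illustrative, not recommendations from the model's documentation.

```typescript
// Raw storage for a flat vector index: vectors x dimensions x bytes per value.
// Assumption: float32 values (4 bytes) and no index or metadata overhead.
function indexBytes(vectorCount: number, dims: number, bytesPerValue = 4): number {
  return vectorCount * dims * bytesPerValue;
}

const vectorCount = 1e6; // illustrative corpus size
const dimsOptions = [2560, 1024, 256]; // full size plus two example truncations

const footprints = dimsOptions.map((dims) => ({
  dims,
  gigabytes: indexBytes(vectorCount, dims) / 1e9,
}));

for (const { dims, gigabytes } of footprints) {
  console.log(`${dims} dims: ${gigabytes.toFixed(3)} GB`);
}
```

Under these assumptions the full 2560-dimensional index is ten times the size of a 256-dimensional one; how much retrieval quality is lost at each step has to be measured on your own data.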