Qwen3 235B A22B Thinking 2507

Qwen3 235B A22B Thinking 2507 is Alibaba's 235B mixture-of-experts (MoE) model configured for extended chain-of-thought reasoning. It pairs 235 billion total parameters, 22 billion of them active per token, with always-on deliberative reasoning for demanding inference tasks.

Vision (Image), Tool Use, File Input

index.ts

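import { streamText } from 'ai'

const result = streamText({
  model: 'alibaba/qwen3-235b-a22b-thinking',
  prompt: 'Why is the sky blue?'
})

// Consume the stream and print the answer text as it arrives.
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}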

Playground

Try out Qwen3 235B A22B Thinking 2507 by Alibaba. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.

Providers

Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.

Provider	Context	Latency	Throughput	Input	Output	Cache Read	Cache Write	Release Date
Novita AI	131K	1.1s	71tps	$0.98/M	$3.95/M			04/01/2025
DeepInfra	262K	0.2s	44tps	$0.23/M	$2.30/M	$0.2/M	—	04/01/2025
Alibaba	131K	0.6s	129tps	$0.40/M	$4.00/M			04/01/2025
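The provider preference described above might be expressed as an ordered list of slugs attached to a request. The sketch below only builds that options object. The `gateway.order` field name is an assumption about the gateway's provider options and should be confirmed in the docs; the slugs are the ones named on this page, and `preferProviders` is a hypothetical helper.

```typescript
// Provider slugs as they appear on this page.
type ProviderSlug = 'novita' | 'deepinfra' | 'alibaba';

// Assumed shape of the gateway's provider options (verify against the docs).
interface GatewayProviderOptions {
  gateway: { order: ProviderSlug[] };
}

// Hypothetical helper: build options that try providers in the given order.
// Repeated slugs are dropped so each provider is attempted at most once.
function preferProviders(...order: ProviderSlug[]): GatewayProviderOptions {
  if (order.length === 0) {
    throw new Error('At least one provider slug is required');
  }
  const unique = order.filter((slug, i) => order.indexOf(slug) === i);
  return { gateway: { order: unique } };
}

const providerOptions = preferProviders('deepinfra', 'alibaba', 'deepinfra');
console.log(JSON.stringify(providerOptions));
// prints {"gateway":{"order":["deepinfra","alibaba"]}}
```

Under that assumption, the resulting object would be passed as `providerOptions` next to `model` and `prompt` in the `streamText` call.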

More models by Alibaba

Model	Context	Latency	Throughput	Input	Output	Cache Read	Cache Write	Release Date
alibaba/qwen3.7-plus		1.0s	326tps	$0.32/M	$1.28/M	$0.08/M	$0.5/M	06/01/2026
alibaba/qwen3.7-max	991K	2.8s	55tps	$1.25/M	$3.75/M	$0.25/M	$1.56/M	05/21/2026
alibaba/qwen3.6-plus		1.8s	108tps	$0.50/M	$3.00/M	$0.1/M	$0.63/M	04/02/2026
alibaba/qwen3.5-flash		2.4s	172tps	$0.10/M	$0.40/M	$0.0/M	$0.13/M	02/24/2026
alibaba/qwen3-embedding-0.6b	33K			$0.01/M				11/14/2025
alibaba/qwen3-embedding-8b	33K			$0.05/M				06/05/2025

About Qwen3 235B A22B Thinking 2507

Qwen3 235B A22B Thinking 2507 is the Qwen3-235B-A22B configured with thinking mode as the default. The base model can switch between extended reasoning and direct response per request. This variant targets applications that need deliberate, chain-of-thought processing on every query.

The underlying architecture is the same 235B MoE: 235 billion total parameters with 22 billion activated per inference step. That MoE structure makes thinking mode tractable at this scale. Because only 22 billion parameters activate per token, Qwen3 235B A22B Thinking 2507 sustains long reasoning traces without the serving costs of a fully dense 235B model generating the same sequence length.
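To make the 22B-of-235B arithmetic concrete, here is a small sketch. The parameter counts come from the paragraph above. The "2 FLOPs per active parameter per token" figure is a common rule of thumb for a forward pass and is an assumption here, not a number from Alibaba.

```typescript
// Parameter counts stated in the text above.
const TOTAL_PARAMS = 235e9;
const ACTIVE_PARAMS = 22e9;

// Share of the model's parameters that participate in each token.
function activeFraction(active: number, total: number): number {
  return active / total;
}

// Rule-of-thumb assumption: about 2 FLOPs per active parameter per token.
function approxFlopsPerToken(activeParams: number): number {
  return 2 * activeParams;
}

const fraction = activeFraction(ACTIVE_PARAMS, TOTAL_PARAMS);
const denseToMoeRatio =
  approxFlopsPerToken(TOTAL_PARAMS) / approxFlopsPerToken(ACTIVE_PARAMS);

console.log(`${(fraction * 100).toFixed(1)}% of parameters active per token`);
// prints "9.4% of parameters active per token"
console.log(`~${denseToMoeRatio.toFixed(1)}x less compute per token than a dense 235B model`);
// prints "~10.7x less compute per token than a dense 235B model"
```

This estimate covers per-token compute only. All 235 billion parameters still have to be held in memory by the serving provider, so the saving applies to generation cost rather than to hardware footprint.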

Chain-of-thought behavior is a first-class capability rather than something coaxed out by prompting. Alibaba reports that response quality scales smoothly with the reasoning budget allocated to a request, so longer thinking helps on hard problems.

For the hardest categories of tasks (competitive mathematics, multi-hop logical reasoning, complex code debugging, and structured scientific analysis), this thinking-configured variant makes fuller use of the 235B parameter capacity. Benchmark results for the underlying model are competitive with other strong reasoning models on reasoning-heavy evaluations.

What To Consider When Choosing a Provider

Configuration: Provider selection may affect time-to-first-token for reasoning models, since longer thinking traces amplify any latency differences between providers.
Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
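As a rough illustration of how provider choice interacts with long thinking traces, the sketch below estimates end-to-end response time as time-to-first-token plus generation time. It assumes the table's Latency column is time-to-first-token and that throughput stays constant during generation; the 4,000-token response length is a hypothetical value, and `estimateSeconds` is an illustrative helper.

```typescript
interface ProviderSpeed {
  name: string;
  ttftSeconds: number;     // assumed: the "Latency" column on this page
  tokensPerSecond: number; // the "Throughput" column on this page
}

// Figures copied from the provider table on this page.
const providers: ProviderSpeed[] = [
  { name: 'Novita AI', ttftSeconds: 1.1, tokensPerSecond: 71 },
  { name: 'DeepInfra', ttftSeconds: 0.2, tokensPerSecond: 44 },
  { name: 'Alibaba', ttftSeconds: 0.6, tokensPerSecond: 129 },
];

// Estimated wall-clock time: wait for the first token, then stream the rest.
function estimateSeconds(p: ProviderSpeed, generatedTokens: number): number {
  return p.ttftSeconds + generatedTokens / p.tokensPerSecond;
}

// Hypothetical response: thinking trace plus final answer.
const generatedTokens = 4000;

for (const p of providers) {
  console.log(`${p.name}: ~${estimateSeconds(p, generatedTokens).toFixed(1)}s`);
}
// prints:
// Novita AI: ~57.4s
// DeepInfra: ~91.1s
// Alibaba: ~31.6s
```

Under these assumptions, throughput matters far more than time-to-first-token once a response runs to thousands of tokens: the provider with the lowest listed latency is the slowest to finish here.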

When to Use Qwen3 235B A22B Thinking 2507

Best For

Mathematical problem solving requiring detailed derivation: When answers need to show work, such as proofs, step-by-step calculations, or theorem verification, the always-on thinking mode ensures the model reasons carefully before committing to an answer.
Complex debugging and code analysis: Tracing through multi-file codebases, identifying subtle bugs, or reasoning about race conditions and edge cases benefits from extended deliberation rather than pattern-matched output.
Structured decision-support tasks: Applications in legal analysis, medical information synthesis, or financial modeling often require the model to weigh multiple factors and surface its reasoning explicitly.
Difficult multi-hop question answering: Thinking models show their largest quality gains over non-thinking alternatives on tasks whose final answer depends on correctly executing a chain of dependent reasoning steps.
Research assistance requiring transparent reasoning: When users need to audit or follow the model's reasoning process, the thinking trace provides visibility into how conclusions were reached.
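For applications that surface the thinking trace to users, the trace has to be separated from the final answer. The sketch below assumes the raw output arrives as one string in which the reasoning ends with a `</think>` tag, with the opening `<think>` tag possibly absent. That layout is an assumption; a gateway or SDK may instead deliver reasoning as a separate field, in which case no parsing is needed. `splitThinking` is a hypothetical helper.

```typescript
interface SplitOutput {
  reasoning: string; // the thinking trace, empty if none was found
  answer: string;    // the text following the trace
}

const THINK_OPEN = '<think>';
const THINK_CLOSE = '</think>';

// Hypothetical helper: split raw model text into reasoning and final answer.
function splitThinking(raw: string): SplitOutput {
  const closeIdx = raw.indexOf(THINK_CLOSE);
  if (closeIdx === -1) {
    // No closing tag: treat the whole text as the answer.
    return { reasoning: '', answer: raw.trim() };
  }
  let reasoning = raw.slice(0, closeIdx);
  // The opening tag may be missing; strip it only when present.
  const openIdx = reasoning.indexOf(THINK_OPEN);
  if (openIdx !== -1) {
    reasoning = reasoning.slice(openIdx + THINK_OPEN.length);
  }
  return {
    reasoning: reasoning.trim(),
    answer: raw.slice(closeIdx + THINK_CLOSE.length).trim(),
  };
}

const sample = splitThinking('<think>2 + 2 is 4.</think>\nThe answer is 4.');
console.log(sample.reasoning); // prints "2 + 2 is 4."
console.log(sample.answer);    // prints "The answer is 4."
```

Keeping the two parts separate lets an interface show the answer by default and expose the trace only when a user wants to audit it.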

Consider Alternatives When

Response latency is critical: Thinking mode generates substantial internal tokens before producing the final answer. For real-time conversational interfaces or latency-sensitive pipelines, the non-thinking variant or a smaller model will respond much faster.
Most queries are simple and don't require deliberation: Using a thinking model for routine tasks (formatting, translation, simple extraction) pays the latency and token cost of reasoning without meaningful quality benefit. The base Qwen3-235B-A22B model with thinking disabled is more appropriate for mixed workloads.
Budget constraints are strict: Thinking traces add tokens to every response. If your application is cost-constrained, evaluate whether the quality improvement on your specific task distribution justifies the additional token usage.
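The budget trade-off above can be estimated with simple arithmetic. The sketch below uses the DeepInfra prices listed on this page and assumes thinking tokens are billed at the output rate, which should be verified with the provider. The token counts are hypothetical and `estimateCostUsd` is an illustrative helper.

```typescript
interface Pricing {
  inputPerMillion: number;  // USD per million input tokens
  outputPerMillion: number; // USD per million output tokens
}

// DeepInfra prices as listed on this page.
const deepinfra: Pricing = { inputPerMillion: 0.23, outputPerMillion: 2.3 };

// Assumption: reasoning tokens are billed like ordinary output tokens.
function estimateCostUsd(
  pricing: Pricing,
  inputTokens: number,
  reasoningTokens: number,
  answerTokens: number,
): number {
  const outputTokens = reasoningTokens + answerTokens;
  return (
    (inputTokens * pricing.inputPerMillion +
      outputTokens * pricing.outputPerMillion) / 1e6
  );
}

// Hypothetical request: 1,000 prompt tokens and a 500-token answer,
// with and without a 6,000-token thinking trace.
const withThinking = estimateCostUsd(deepinfra, 1000, 6000, 500);
const withoutThinking = estimateCostUsd(deepinfra, 1000, 0, 500);

console.log(`with thinking:    $${withThinking.toFixed(5)}`);
console.log(`without thinking: $${withoutThinking.toFixed(5)}`);
console.log(`ratio: ${(withThinking / withoutThinking).toFixed(1)}x`);
```

Running the same calculation over a sample of real traffic gives a per-request cost that can be weighed against the measured quality gain on that workload.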

Conclusion

Qwen3 235B A22B Thinking 2507 is built for the class of tasks where getting the right answer justifies spending more tokens on reasoning. The MoE architecture makes it more economical to sustain long thinking traces than a dense model of comparable total scale, and the reasoning capability is built into the model rather than being a prompting trick. AI Gateway wraps the model in a unified API surface with automated failover across Novita AI, DeepInfra, and Alibaba.
