Qwen3 235B A22B Thinking 2507
Qwen3 235B A22B Thinking 2507 is Alibaba's 235B MoE model configured for extended chain-of-thought reasoning, combining 235 billion total parameters with always-on deliberative reasoning for demanding inference tasks.
import { streamText } from 'ai'
const result = streamText({ model: 'alibaba/qwen3-235b-a22b-thinking', prompt: 'Why is the sky blue?' })
Playground
Try out Qwen3 235B A22B Thinking 2507 by Alibaba. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
P50 throughput on live AI Gateway traffic, in tokens per second (TPS). Visit the docs for more info.
P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds. View the docs for more info.
Direct request success rate on AI Gateway and per-provider. Visit the docs for more info.
About Qwen3 235B A22B Thinking 2507
Qwen3 235B A22B Thinking 2507 is Qwen3-235B-A22B configured with thinking mode on by default. The base model can switch between extended reasoning and direct response per request, whereas this variant targets applications that need deliberate, chain-of-thought processing on every query.
The underlying architecture is the same 235B MoE: 235 billion total parameters with 22 billion activated per inference step. That MoE structure makes thinking mode tractable at this scale. Because only 22 billion parameters activate per token, Qwen3 235B A22B Thinking 2507 sustains long reasoning traces without the serving costs of a fully dense 235B model generating the same sequence length.
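The arithmetic behind that claim can be sketched in a few lines. The parameter counts come from the text above; the figure of roughly 2 FLOPs per active parameter per token is a common rule of thumb used here as an assumption, not a measured property of this model, and it ignores attention cost over long contexts.

```typescript
// Parameter counts stated in the text above.
const TOTAL_PARAMS = 235e9;
const ACTIVE_PARAMS = 22e9;

// Assumption: a forward pass costs about 2 FLOPs per active parameter per token.
const FLOPS_PER_PARAM = 2;

function flopsPerToken(activeParams: number): number {
  return FLOPS_PER_PARAM * activeParams;
}

// Share of the model's weights that participate in generating one token.
const activeFraction = ACTIVE_PARAMS / TOTAL_PARAMS;

// How much more compute a hypothetical fully dense 235B model would spend per token.
const denseToMoeRatio = flopsPerToken(TOTAL_PARAMS) / flopsPerToken(ACTIVE_PARAMS);

console.log(`active fraction: ${(activeFraction * 100).toFixed(1)}%`); // 9.4%
console.log(`dense / MoE compute per token: ${denseToMoeRatio.toFixed(1)}x`); // 10.7x
```

The saving applies to compute per generated token. All 235 billion parameters still have to be held in memory by whoever serves the model.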
Chain-of-thought behavior is a first-class capability rather than something coaxed out by prompting. Alibaba reports that response quality scales smoothly with the reasoning budget allocated, so longer thinking tends to help on hard problems.
For the hardest categories of tasks (competitive mathematics, multi-hop logical reasoning, complex code debugging, and structured scientific analysis), this thinking-configured variant makes fuller use of the 235B parameter capacity. Benchmark results for the underlying model are competitive with other strong reasoning models on reasoning-heavy evaluations.
What To Consider When Choosing a Provider
- Configuration: Provider selection may affect time-to-first-token for reasoning models, since longer thinking traces amplify any latency differences between providers.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model via direct gateway requests (BYOK is not included). To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
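To illustrate the latency point in the Configuration item, here is a rough sketch of how a long thinking trace widens the gap between providers. The provider names and all numbers are hypothetical, and the model assumes the entire thinking trace is generated before the first answer token appears.

```typescript
// Hypothetical provider measurements; none of these numbers come from AI Gateway.
interface ProviderStats {
  name: string;
  ttftMs: number; // time to first token, in milliseconds
  tokensPerSecond: number; // sustained output throughput
}

// Estimated time until the first token of the final answer, assuming the
// thinking trace is fully generated before the answer starts.
function msToFirstAnswerToken(p: ProviderStats, thinkingTokens: number): number {
  return p.ttftMs + (thinkingTokens / p.tokensPerSecond) * 1000;
}

const providerA: ProviderStats = { name: 'provider-a', ttftMs: 500, tokensPerSecond: 50 };
const providerB: ProviderStats = { name: 'provider-b', ttftMs: 800, tokensPerSecond: 100 };

for (const thinkingTokens of [0, 2000]) {
  const a = msToFirstAnswerToken(providerA, thinkingTokens);
  const b = msToFirstAnswerToken(providerB, thinkingTokens);
  console.log(`${thinkingTokens} thinking tokens: A=${a} ms, B=${b} ms`);
}
```

With no thinking tokens, the provider with the lower TTFT wins by 300 ms. With a 2,000-token trace, the higher-throughput provider is ahead by almost 20 seconds, which is why throughput can matter more than raw TTFT for this model.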
When to Use Qwen3 235B A22B Thinking 2507
Best For
- Mathematical problem solving requiring detailed derivation: When answers need to show work, such as proofs, step-by-step calculations, or theorem verification, the always-on thinking mode ensures the model reasons carefully before committing to an answer
- Complex debugging and code analysis: Tracing through multi-file codebases, identifying subtle bugs, or reasoning about race conditions and edge cases benefits from extended deliberation rather than pattern-matched output
- Structured decision-support tasks: Applications in legal analysis, medical information synthesis, or financial modeling that require the model to consider multiple factors and surface its reasoning process explicitly
- Difficult multi-hop question answering: Tasks where the final answer requires correctly executing a chain of dependent reasoning steps are where thinking models show the largest quality gains over non-thinking alternatives
- Research assistance requiring transparent reasoning: When users need to audit or follow the model's reasoning process, the thinking trace provides visibility into how conclusions were reached
Consider Alternatives When
- Response latency is critical: Thinking mode generates substantial internal tokens before producing the final answer. For real-time conversational interfaces or latency-sensitive pipelines, the non-thinking variant or a smaller model will respond much faster.
- Most queries are simple and don't require deliberation: Using a thinking model for routine tasks (formatting, translation, simple extraction) pays the latency and token cost of reasoning without meaningful quality benefit. The base Qwen3-235B-A22B model with thinking disabled is more appropriate for mixed workloads.
- Budget constraints are strict: Thinking traces add tokens to every response. If your application is cost-constrained, evaluate whether the quality improvement on your specific task distribution justifies the additional token usage
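One way to evaluate the budget trade-off is to estimate per-request cost with and without a thinking trace. The sketch below uses hypothetical prices and token counts, and assumes thinking tokens are billed at the output rate; check your provider's actual pricing before relying on it.

```typescript
// Hypothetical prices in USD per million tokens; not actual rates for this model.
interface Pricing {
  inputPerMillion: number;
  outputPerMillion: number;
}

interface Usage {
  inputTokens: number;
  reasoningTokens: number; // thinking-trace tokens, assumed billed as output
  answerTokens: number;
}

function requestCostUsd(usage: Usage, pricing: Pricing): number {
  const outputTokens = usage.reasoningTokens + usage.answerTokens;
  const micros =
    usage.inputTokens * pricing.inputPerMillion +
    outputTokens * pricing.outputPerMillion;
  return micros / 1e6;
}

const pricing: Pricing = { inputPerMillion: 1, outputPerMillion: 4 };

const withThinking = requestCostUsd(
  { inputTokens: 1000, reasoningTokens: 4000, answerTokens: 500 },
  pricing,
);
const withoutThinking = requestCostUsd(
  { inputTokens: 1000, reasoningTokens: 0, answerTokens: 500 },
  pricing,
);

console.log(`with thinking: $${withThinking.toFixed(4)}`);
console.log(`without thinking: $${withoutThinking.toFixed(4)}`);
```

Under these made-up numbers the thinking trace accounts for most of the request cost, so the comparison that matters is whether the quality gain on your own task distribution justifies that multiple.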
Conclusion
Qwen3 235B A22B Thinking 2507 is built for the class of tasks where getting the right answer justifies spending more tokens on reasoning. The MoE architecture makes it more economical to sustain long thinking traces than a dense model of comparable total scale, and the reasoning capability is built into the model rather than being a prompting trick. AI Gateway wraps the model with a unified API surface and automated failover across novita, deepinfra, and alibaba.