GLM 4.7 Flash
GLM 4.7 Flash, released January 19, 2026, is the speed-optimized variant in Z.ai's GLM-4.7 generation. It delivers faster inference for high-throughput workloads while retaining the coding, tool usage, and conversational improvements introduced in GLM-4.7.
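```typescript
import { streamText } from 'ai'
const result = streamText({ model: 'zai/glm-4.7-flash', prompt: 'Why is the sky blue?' })
// Print the response as it streams in (top-level await requires an ES module).
for await (const textPart of result.textStream) process.stdout.write(textPart)
```

Playground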
Try out GLM 4.7 Flash by Z.ai. Usage is billed to your team at API rates. Free users (those who haven't made a payment) get $5 of credits every 30 days.
Providers
Route requests across multiple providers. Copy a provider slug to set your preference. Visit the docs for more info. Using a provider means you agree to their terms, listed under Legal.
| Provider | Throughput (P50, TPS) | Time to first token (P50, ms) | Success rate |
|---|---|---|---|

- Throughput: P50 throughput on live AI Gateway traffic, in tokens per second (TPS).
- Time to first token: P50 time to first token (TTFT) on live AI Gateway traffic, in milliseconds.
- Success rate: direct request success rate on AI Gateway and per provider.
About GLM 4.7 Flash
GLM 4.7 Flash was released January 19, 2026, as the middle tier in Z.ai's GLM-4.7 generation, sitting between the full GLM-4.7 and the ultra-fast GLM-4.7-FlashX. It inherits the 4.7 generation's gains in coding assistance, tool usage, multi-step reasoning, and natural conversational tone, trading peak capability for faster inference.
The GLM-4.7 generation focused on closing coding and tool-use gaps with competing models. GLM 4.7 Flash carries those gains forward at a cost-and-latency profile that fits high-volume coding assistance, real-time chat, and production pipelines with strict response-time budgets. If the full GLM-4.7 is too slow for your workload and GLM-4.7-FlashX gives up too much capability, GLM 4.7 Flash is the middle ground.
Through AI Gateway, switching between GLM-4.7 tiers requires only changing the model identifier. The API surface and request format stay the same.
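A minimal sketch of tier switching, assuming the slugs `zai/glm-4.7` and `zai/glm-4.7-flashx` follow the same pattern as the `zai/glm-4.7-flash` identifier in the example at the top of this page (only the Flash slug appears here; confirm the others in the AI Gateway model list). The `Tier` type, `MODEL_IDS` map, and `buildRequest` helper are hypothetical names for illustration. The request body is identical across tiers; only `model` changes.

```typescript
// Hypothetical tier names; only 'zai/glm-4.7-flash' is confirmed on this page.
type Tier = 'full' | 'flash' | 'flashx'

// Assumed AI Gateway slugs for each GLM-4.7 tier.
const MODEL_IDS: Record<Tier, string> = {
  full: 'zai/glm-4.7', // assumption
  flash: 'zai/glm-4.7-flash',
  flashx: 'zai/glm-4.7-flashx', // assumption
}

// Builds the options object you would pass to streamText() from 'ai'.
// Everything except `model` stays the same across tiers.
function buildRequest(tier: Tier, prompt: string): { model: string; prompt: string } {
  return { model: MODEL_IDS[tier], prompt }
}

const flashRequest = buildRequest('flash', 'Why is the sky blue?')
const fullRequest = buildRequest('full', 'Why is the sky blue?')
console.log(flashRequest.model) // zai/glm-4.7-flash
console.log(fullRequest.prompt === flashRequest.prompt) // true
```

In an application, `streamText(buildRequest('flash', prompt))` would take the place of the inline options object, so moving to another tier is a one-argument change.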
What To Consider When Choosing a Provider
- Tier selection: GLM 4.7 Flash sits in the middle of the 4.7 generation. Test it against both GLM-4.7 (higher capability) and GLM-4.7-FlashX (higher speed) on your specific tasks to find the right tradeoff.
- Shared API: All GLM-4.7 variants share the same API. You can A/B test across tiers without changing your integration.
- Cost: GLM 4.7 Flash's lower per-token cost makes it practical for high-volume deployments where GLM-4.7's per-request cost would be prohibitive.
- Zero Data Retention: AI Gateway supports Zero Data Retention for this model on direct gateway requests; requests that use BYOK are not covered. To configure this, check the documentation.
- Authentication: AI Gateway authenticates requests using an API key or OIDC token. You do not need to manage provider credentials directly.
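Because the tiers share one API, an A/B test across them can be as simple as assigning each user a model identifier. The sketch below assumes you have a stable user ID, uses a 32-bit FNV-1a hash (our choice for illustration, not an AI Gateway feature) so a given user always lands in the same arm, and treats `zai/glm-4.7` as an assumed slug for the full model. `ARMS`, `fnv1a`, and `assignModel` are hypothetical names.

```typescript
// Arms under comparison. Only 'zai/glm-4.7-flash' is confirmed on this page;
// 'zai/glm-4.7' is an assumed slug for the full model.
const ARMS: readonly string[] = ['zai/glm-4.7-flash', 'zai/glm-4.7']

// 32-bit FNV-1a over UTF-16 code units: a cheap, stable hash for bucketing.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5 // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i)
    hash = Math.imul(hash, 0x01000193) >>> 0 // multiply by FNV prime, keep 32 bits unsigned
  }
  return hash
}

// Hypothetical helper: the same user ID always maps to the same model.
function assignModel(userId: string, arms: readonly string[] = ARMS): string {
  if (arms.length === 0) throw new Error('assignModel needs at least one arm')
  return arms[fnv1a(userId) % arms.length]
}

const model = assignModel('user-123')
console.log(model) // one of ARMS, and the same value on every call for this user
```

Recording the assigned model alongside each response lets you compare quality, latency, and cost per arm afterward; adding GLM-4.7-FlashX as a third arm only means extending the array.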
When to Use GLM 4.7 Flash
Best For
- High-volume coding assistance: Fast response times improve developer productivity across many concurrent sessions
- Real-time conversational applications: Delivers the 4.7 generation's natural conversational tone within strict latency budgets
- Production API backends: High request volumes where cost per token directly impacts margins
- Agentic pipelines: Most steps need good capability at speed, with the option to route complex steps to the full GLM-4.7
- Interactive prototyping and development: Fast iteration cycles depend on quick model responses
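For agentic pipelines, the split between fast steps and escalated steps can be expressed as a per-step router. This is a minimal sketch assuming your own orchestration code labels each step with a hypothetical `complexity` field, and that the full model's slug is `zai/glm-4.7` (an assumption; only `zai/glm-4.7-flash` appears on this page). `AgentStep` and `modelForStep` are illustrative names, not AI Gateway APIs.

```typescript
// Hypothetical step shape produced by your own agent orchestration.
interface AgentStep {
  name: string
  complexity: 'routine' | 'complex'
}

const FLASH_MODEL = 'zai/glm-4.7-flash'
const FULL_MODEL = 'zai/glm-4.7' // assumed slug for the full GLM-4.7

// Routine steps go to Flash for latency and cost; complex steps escalate.
function modelForStep(step: AgentStep): string {
  return step.complexity === 'complex' ? FULL_MODEL : FLASH_MODEL
}

const plan: AgentStep[] = [
  { name: 'parse request', complexity: 'routine' },
  { name: 'design refactor', complexity: 'complex' },
  { name: 'format reply', complexity: 'routine' },
]

for (const step of plan) {
  console.log(`${step.name} -> ${modelForStep(step)}`)
}
// parse request -> zai/glm-4.7-flash
// design refactor -> zai/glm-4.7
// format reply -> zai/glm-4.7-flash
```

How steps get labeled is the real design decision; a fixed label per step type is the simplest starting point, and only the steps that need deeper reasoning pay the full model's latency and cost.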
Consider Alternatives When
- Maximum complex-task capability: The full GLM-4.7 provides the deepest reasoning and coding quality
- Absolute fastest inference: GLM-4.7-FlashX offers the lowest latency in the 4.7 generation
- Vision capabilities needed: Evaluate GLM-4.6V or GLM-4.5V for multimodal input
- Advanced reasoning modes: GLM-5 provides multiple thinking modes and an expanded reasoning architecture
Conclusion
GLM 4.7 Flash sits between the full GLM-4.7 and GLM-4.7-FlashX: fast enough for many production latency budgets, and more capable than FlashX on heavier coding and tool-use tasks. Switch between 4.7 tiers through AI Gateway by changing the model identifier.