
Nemotron 3 Nano 30B A3B

nvidia/nemotron-3-nano-30b-a3b

Nemotron 3 Nano 30B A3B is a sparse hybrid Mamba-Transformer mixture-of-experts (MoE) model with 30B total parameters but only 3B active per token. It supports a context window of 262.1K tokens with throughput closer to a 3B dense model than a 30B one.
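
A rough way to see why the sparse MoE behaves like a small model at inference time: compute per token tracks the active parameters, not the total. A sketch of that arithmetic (illustrative only, not a benchmark):

```typescript
// Illustrative arithmetic only: per-token compute scales with the *active*
// parameter count, not the total. Numbers are from the model card above.
const totalParams = 30e9  // 30B parameters stored across all experts
const activeParams = 3e9  // ~3B parameters routed to any single token

// Fraction of the network actually exercised per token by MoE routing.
const activeFraction = activeParams / totalParams

console.log(`Active fraction per token: ${(activeFraction * 100).toFixed(0)}%`)
```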

Reasoning
index.ts
import { streamText } from 'ai'

const result = streamText({
  model: 'nvidia/nemotron-3-nano-30b-a3b',
  prompt: 'Why is the sky blue?',
})

// Print tokens as they arrive
for await (const chunk of result.textStream) {
  process.stdout.write(chunk)
}

What To Consider When Choosing a Provider

  • Zero Data Retention

    AI Gateway supports Zero Data Retention for this model on requests routed directly through the gateway (BYOK requests are not covered). See the documentation to configure it.

  • Authentication

    AI Gateway authenticates requests with an API key or OIDC token, so you never manage provider credentials directly.
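
For intuition, a direct gateway request reduces to a single bearer-authenticated HTTP call. The endpoint URL and environment-variable name below are placeholders, not the gateway's actual values; check the documentation for the real ones:

```typescript
// Sketch of a direct gateway request. The URL and env var name are
// placeholders; consult the AI Gateway documentation for the real values.
const apiKey = process.env.AI_GATEWAY_API_KEY ?? '<your-key>'

const request = {
  url: 'https://gateway.example.com/v1/chat/completions', // placeholder URL
  method: 'POST',
  headers: {
    Authorization: `Bearer ${apiKey}`, // API key or OIDC token
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'nvidia/nemotron-3-nano-30b-a3b',
    messages: [{ role: 'user', content: 'Why is the sky blue?' }],
  }),
}
```

Note that no per-provider credentials appear anywhere in the request; the gateway resolves the provider behind the model slug.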

With a context window of 262.1K tokens, entire codebases or multi-document evidence sets fit in a single call. Plan context usage carefully: filling the window is possible, but model the cost and latency implications ahead of time, weighing the listed rates ($0.05 and $0.24) against your expected token volumes.
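
A back-of-envelope cost check helps before committing to full-window calls. Assuming, purely for illustration, that the two listed rates are USD per million input and output tokens (verify the actual units on the pricing page):

```typescript
// Back-of-envelope cost estimate before filling the 262.1K-token window.
// Assumption (illustrative): the listed rates are USD per 1M input and
// output tokens respectively; substitute the real rates from this page.
const INPUT_RATE = 0.05  // assumed $/1M input tokens
const OUTPUT_RATE = 0.24 // assumed $/1M output tokens

function estimateCostUSD(inputTokens: number, outputTokens: number): number {
  return (inputTokens / 1e6) * INPUT_RATE + (outputTokens / 1e6) * OUTPUT_RATE
}

// A full-window call with a modest 4K-token reply:
const fullWindow = estimateCostUSD(262_100, 4_000)
console.log(`~$${fullWindow.toFixed(4)} per full-context call`)
```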

When to Use Nemotron 3 Nano 30B A3B

Best For

  • Concurrent multi-agent systems:

    Running many lightweight agents where per-agent throughput matters

  • Long-context tasks:

    Holding entire codebases, extended session histories, or multi-document sets in one call

  • Agentic tool-calling workflows:

    Multi-step pipelines with chained actions
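
The tool-calling pattern above can be sketched as a loop: on each step the model either requests a tool or returns an answer, and tool results are fed back as context. The model below is a scripted stub standing in for a real gateway call; all names are illustrative:

```typescript
// Minimal sketch of a chained tool-calling loop. stubModel stands in for a
// real model call through the gateway; all names here are illustrative.
type ToolCall = { tool: string; args: Record<string, unknown> }
type ModelStep = { toolCall?: ToolCall; answer?: string }

const tools: Record<string, (args: Record<string, unknown>) => string> = {
  lookup: (args) => `capital of ${String(args.country)} is Paris`,
}

// Scripted stand-in for the model: first request a tool, then answer.
function stubModel(history: string[]): ModelStep {
  if (history.length === 0) {
    return { toolCall: { tool: 'lookup', args: { country: 'France' } } }
  }
  return { answer: history[history.length - 1] }
}

function runAgent(maxSteps = 5): string {
  const history: string[] = []
  for (let i = 0; i < maxSteps; i++) {
    const step = stubModel(history)
    if (step.answer !== undefined) return step.answer // terminal answer
    if (step.toolCall) {
      const fn = tools[step.toolCall.tool]
      if (fn) history.push(fn(step.toolCall.args)) // feed result back in
    }
  }
  return 'max steps reached'
}

console.log(runAgent()) // tool call, then answer from the tool result
```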

Consider Alternatives When

  • Maximum reasoning depth:

    Nemotron 3 Super (120B/12B active) handles complex multi-agent planning

  • Vision-language tasks:

    Nemotron Nano 12B v2 VL is the multimodal option

  • Smaller context needs:

    A 128K context window is sufficient and the 262.1K-token capacity would go unused

  • Compact dense reasoning:

    Nemotron Nano 9B v2 targets a dense model profile

Conclusion

Nemotron 3 Nano 30B A3B delivers the throughput of a small model with the knowledge breadth of a large one. Its hybrid Mamba-Transformer MoE architecture and 262.1K-token context suit tasks that require holding large amounts of information in a single pass. Use AI Gateway to route traffic with unified auth.

FAQ

Why does a 30B-parameter model run at near-3B speed and cost?

You pay for compute proportional to the active parameters, not the total. Nemotron 3 Nano 30B A3B runs at speeds and costs closer to a 3B dense model but draws on 30B parameters of learned knowledge. The MoE routing mechanism selects the relevant subset per token.

How does the hybrid architecture support such a long context?

Mamba layers process sequences with linear-time complexity rather than the quadratic scaling of standard attention. That makes it practical to hold 262.1K tokens in context without the memory explosion that would make pure-attention models infeasible at that length.
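
The scaling difference is easy to quantify: standard attention scores every token pair, so work per layer grows with n², while a linear-time scan does one state update per token. A toy comparison:

```typescript
// Toy comparison of sequence-mixing work per layer: attention scores every
// token pair (n^2), a Mamba-style scan does one update per token (n).
// Pure arithmetic for intuition, not a memory model of either architecture.
function attentionPairs(n: number): number {
  return n * n // one score per token pair
}
function linearScanSteps(n: number): number {
  return n // one recurrent state update per token
}

const n = 262_100
const ratio = attentionPairs(n) / linearScanSteps(n) // = n
console.log(`At ${n} tokens, quadratic attention does ${ratio.toLocaleString()}x more pairwise work per layer`)
```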

How does it differ from Nemotron Nano 9B v2?

They use different architectures. Nemotron 3 Nano 30B A3B is a sparse MoE with 30B total/3B active parameters and a context window of 262.1K tokens. Nemotron Nano 9B v2 is a dense 9B model with a 128K-token context window. Choose Nemotron 3 Nano 30B A3B for throughput across multi-agent systems and Nano 9B v2 as a compact reasoning model.

What does it cost?

Current pricing is shown on this page. AI Gateway routes across providers, and rates may vary by provider.