
Qwen 3 Next 80B Thinking

Qwen 3 Next 80B Thinking is a reasoning-oriented variant of Alibaba Cloud's Qwen 3 large language model series, released on September 11, 2025 11. Distributed as an open-weights model, it is optimized for tasks involving mathematics, symbolic logic, and software engineering 1011. According to the developers, the model uses an internal chain-of-thought (CoT) mechanism, generating intermediate reasoning steps before providing a final response 1112. This methodology is intended to address the limitations in multi-step problem solving found in traditional direct-response models 1115.

The model uses a high-sparsity Mixture of Experts (MoE) framework 1112. It contains 80 billion total parameters, of which 3 billion are active during inference for any given token 1012. This 80B/3B configuration is intended to provide the knowledge capacity of a large-scale system while retaining the computational efficiency and lower memory requirements of a much smaller model 1112. The model supports a context window of 262,144 tokens, allowing it to process extensive documents and long-form reasoning tasks 823.

Performance evaluations by Artificial Analysis assigned the model a score of 27 on the firm's Intelligence Index, ranking it 7th among 55 models in its category at the time of testing 115. This is higher than the score of 20 recorded for the standard Qwen 3 Next 80B Instruct variant 216. During these evaluations, the Thinking model generated 51 million tokens, roughly seven times the median for its class, a volume attributed to the "reasoning tokens" produced during its internal processing phase 13.

The model demonstrates a throughput of 153.1 tokens per second on first-party infrastructure 315. This is slightly lower than the 157.3 tokens per second measured for the non-thinking Instruct variant, a gap reflecting the overhead introduced by the reasoning mechanism 23. The model is released under the Apache 2.0 license, permitting self-hosting and commercial integration 310. API pricing for the Thinking variant has been cited at $0.50 per million input tokens and $6.00 per million output tokens 314. By comparison, the standard Instruct variant is available from third-party providers for $0.09 per million input tokens and $1.10 per million output tokens 2223.

Background

The development of Qwen 3 Next 80B Thinking occurred during a period of significant architectural transition in the field of large language models (LLMs). By late 2024 and early 2025, the industry had moved beyond traditional instruction-tuned models toward "reasoning" or "thinking" models, characterized by their ability to generate internal chains of thought before producing a final response 3. This paradigm shift was largely catalyzed by the release of models such as OpenAI's o1 and DeepSeek-R1, which demonstrated that scaling inference-time computation could solve complex logic and mathematical problems that had previously challenged direct-response LLMs.

Alibaba Cloud developed Qwen 3 Next 80B Thinking as a specialized successor to the Qwen 2.5 series and the initial Qwen 3 base models 36. The primary motivation behind the model's design was the need to balance massive cognitive capacity with operational efficiency. While a dense 80-billion-parameter model offers substantial knowledge depth, the computational cost of inference often limits real-time utility. To address this, Alibaba adopted a Mixture of Experts (MoE) architecture 3. This configuration allows the model to maintain a total parameter count of 80 billion while activating only 3 billion parameters for any given token during inference, theoretically providing the intelligence of a large-scale model with the speed typically associated with much smaller architectures 3.

Released on September 11, 2025, the model was positioned as an open-weights alternative to proprietary reasoning systems 3. Third-party analysis by Artificial Analysis noted that the release coincided with a market demand for high-verbosity reasoning models that could handle long-form logical processing 3. Evaluation data showed the model was significantly more verbose than its peers, generating 51 million tokens during standardized intelligence testing compared to a class average of 7.3 million, reflecting its intensive use of internal "thinking" tokens to navigate complex prompts 3.

The model's release under the Apache 2.0 license continued Alibaba's established strategy of providing high-performance, open-weights models to the global developer community 3. At the time of its launch, it was categorized as a "medium-class" model (40B–150B parameters) and was evaluated as one of the faster reasoning models available, achieving an output speed of approximately 153.1 tokens per second despite the overhead required for its extended thinking processes 3.

Architecture

The architecture of Qwen3-Next-80B-A3B-Thinking is characterized by a 48-layer transformer-based structure that integrates high-sparsity Mixture of Experts (MoE) with a hybrid attention mechanism 1. The model contains 80 billion total parameters, though its computational footprint during inference is limited to approximately 3 billion active parameters per token 13. This sparse activation strategy is the basis for the "A3B" (Active 3 Billion) designation in the model's name 3. The model was pre-trained on a dataset of 15 trillion tokens and supports a context window of 262,144 tokens, equivalent to approximately 393 A4 pages of text 13.

Hybrid Attention and Layer Layout

Unlike standard transformer architectures that utilize uniform self-attention across all layers, Qwen3-Next-80B-A3B-Thinking employs a hybrid layout 1. This system combines Gated DeltaNet—a linear attention variant designed for efficient ultra-long context modeling—with traditional Gated Attention 1. The 48 layers are organized into 12 repeating macro-units 1. According to official technical documentation, each unit follows a specific sequence: three sub-layers of Gated DeltaNet paired with MoE blocks, followed by one sub-layer of Gated Attention also paired with an MoE block 1. This configuration is intended to provide the computational efficiency required for long-range dependencies while maintaining the representational precision associated with standard attention mechanisms 1.
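The published layout can be sketched as a simple layer schedule. The labels below are illustrative, not the model's real module names:

```python
# Sketch of the documented 48-layer schedule: 12 repeating macro-units,
# each 3x (Gated DeltaNet -> MoE) followed by 1x (Gated Attention -> MoE).
def build_layer_schedule(macro_units: int = 12) -> list[str]:
    schedule = []
    for _ in range(macro_units):
        schedule += ["gated_deltanet+moe"] * 3   # linear-attention sub-layers
        schedule += ["gated_attention+moe"]      # standard-attention sub-layer
    return schedule

layers = build_layer_schedule()
print(len(layers))                         # 48 layers in total
print(layers.count("gated_deltanet+moe"))  # 36
print(layers.count("gated_attention+moe")) # 12
```

Three quarters of the attention sub-layers are thus the linear Gated DeltaNet variant, which is consistent with the long-context efficiency claims above.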

Extreme Mixture of Experts

The model utilizes an "extreme" MoE configuration, featuring a total of 512 distinct experts 1. This represents a significant increase in granularity compared to previous MoE implementations, which typically employ 8 to 16 experts. For each token processed, the router selects 10 specialized experts plus one shared expert, a strategy referred to as a "10+1" activation pattern 1. The shared expert is designed to capture universal knowledge common to all inputs, while the 10 specialized experts provide task-specific processing 1. Alibaba Cloud states that this high ratio of total-to-active experts allows the model to maintain the performance capacity of an 80-billion-parameter system while operating with the latency of a much smaller model 1. Independent benchmarks by Artificial Analysis recorded an output speed of 153.1 tokens per second, confirming the efficiency of this sparse activation approach 3.
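The "10+1" routing pattern can be illustrated with a toy router. The random scores, and the treatment of the shared expert as an extra unit alongside the 512 specialized experts, are assumptions made for this demo, not the real implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_EXPERTS, TOP_K = 512, 10

def route(router_logits: np.ndarray) -> list[int]:
    # pick the 10 highest-scoring specialised experts ...
    top_k = np.argsort(router_logits)[-TOP_K:]
    # ... and always append the shared expert (index 512 here, by assumption)
    return sorted(top_k.tolist()) + [NUM_EXPERTS]

logits = rng.normal(size=NUM_EXPERTS)
active = route(logits)
print(len(active))  # 11 experts fire per token
```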

Training Methodology and Stability

The development of the model incorporated two primary technical innovations intended to improve training outcomes: Multi-Token Prediction (MTP) and Group Sequence Policy Optimization (GSPO) 1. MTP was used during the pre-training phase, requiring the model to predict multiple subsequent tokens simultaneously rather than a single next token 1. The developers assert that this technique enhances the model's grasp of long-term dependencies and improves overall reasoning coherence 1.
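As a toy illustration of the MTP objective (the vocabulary size, two-token horizon, and per-offset heads are invented for the demo), the loss simply averages cross-entropy over several future offsets instead of a single next token:

```python
import numpy as np

def cross_entropy(logits: np.ndarray, target: int) -> float:
    # standard softmax cross-entropy on a single logit vector
    logp = logits - np.log(np.sum(np.exp(logits)))
    return float(-logp[target])

def mtp_loss(per_offset_logits: list[np.ndarray], targets: list[int]) -> float:
    # one prediction head (logit vector) per future offset t+1, t+2, ...
    losses = [cross_entropy(l, t) for l, t in zip(per_offset_logits, targets)]
    return float(np.mean(losses))

vocab = 16
rng = np.random.default_rng(1)
heads = [rng.normal(size=vocab) for _ in range(2)]  # predict t+1 and t+2
loss = mtp_loss(heads, targets=[3, 7])
print(round(loss, 4))
```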

To manage the reinforcement learning (RL) phase, the Qwen team implemented GSPO to address stability and efficiency challenges 1. The use of GSPO is specifically intended to mitigate the training instabilities that often arise in architectures combining hybrid attention mechanisms with high-sparsity MoE layouts 1. These methodologies were applied during the training on 15 trillion tokens to optimize the model for complex reasoning tasks 1.

Inference and Thinking Mode

As a specialized reasoning variant, Qwen3-Next-80B-A3B-Thinking is configured to operate exclusively in a "thinking" mode 1. During inference, the model is prompted to generate internal chains of thought delimited by automatic <think> tags 1. Third-party analysis indicates that this model is notably verbose during reasoning phases; in standardized intelligence evaluations, it generated approximately 51 million tokens, whereas the average for comparable models was 7.3 million tokens 3. This high token output is a byproduct of the model's architectural focus on extended cognitive processing before delivering a final response 13.
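A minimal way to consume such output is to split the reasoning trace from the final answer on the delimiters. This sketch assumes well-formed <think>...</think> tags as described above; real completions may omit the opening tag, in which case everything is treated as the answer:

```python
import re

def split_thinking(completion: str) -> tuple[str, str]:
    # non-greedy match so only the first think block is captured
    match = re.search(r"<think>(.*?)</think>", completion, flags=re.DOTALL)
    if match is None:
        return "", completion.strip()
    answer = completion[match.end():].strip()
    return match.group(1).strip(), answer

trace, answer = split_thinking("<think>2+2: add the units.</think>The answer is 4.")
print(trace)   # 2+2: add the units.
print(answer)  # The answer is 4.
```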

Capabilities & Limitations

Qwen 3 Next 80B Thinking is a text-based model designed specifically for tasks that require high-order cognitive processing and internal chain-of-thought generation 3. Its core capability is a mandatory "thinking mode," which incorporates an automatic <think> tag system into the output stream 1. Unlike standard instruction-tuned models that provide immediate answers, this variant is designed to externalize its reasoning steps before delivering a final response, a process intended to improve accuracy in complex multi-step problems 1.

Reasoning and STEM Capabilities

According to Alibaba Cloud, the model demonstrates high performance in mathematics and logic benchmarks, specifically achieving a score of 0.88 on the AIME 2025 (American Invitational Mathematics Examination) and 0.83 on MMLU-Pro 1. These scores suggest a specialization in Olympiad-level mathematical reasoning and professional-grade multi-task language understanding 1. In independent testing by Artificial Analysis, the model received an Intelligence Index score of 27, which is notably higher than the average score of 15 for its model class 3.

In addition to mathematics, the model is characterized by its performance in complex coding tasks. Alibaba reports that the model outstrips several proprietary competitors, including Gemini 2.5 Flash Thinking, on internal reasoning benchmarks 1. The model supports a context window of 262,000 tokens, allowing it to process and reason over extensive documents or codebases 3.

Latency and Verbosity

A primary functional trade-off of the model's reasoning architecture is its high verbosity. During evaluation on the Artificial Analysis Intelligence Index, the model generated 51 million tokens, whereas the average for comparable models was approximately 7.3 million tokens 3. While the model maintains a high output speed of 153.1 tokens per second, the sheer volume of "thinking content" generated within the <think> tags can lead to significantly longer wall-clock times to reach the final answer 3. Alibaba notes that the model may generate extended thinking content, which increases the time-to-completion for user queries 1.
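A back-of-envelope estimate shows how the thinking budget translates into wall-clock delay at the measured output speed. The per-query token counts below are hypothetical:

```python
SPEED_TOK_PER_S = 153.1  # measured output speed cited above

def time_to_answer(thinking_tokens: int, answer_tokens: int) -> float:
    # reasoning tokens are generated before the visible answer,
    # so they contribute directly to time-to-completion
    return (thinking_tokens + answer_tokens) / SPEED_TOK_PER_S

# e.g. 8,000 thinking tokens plus a 500-token answer
print(round(time_to_answer(8_000, 500), 1))  # 55.5 seconds
```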

Limitations and Intended Use

The model is restricted to text input and output and does not natively support image, audio, or video processing 3. Its specialization in deep reasoning introduces limitations for simple or direct tasks. Because the thinking mode is mandatory, the model may be inefficient for tasks where reasoning is unnecessary, such as basic entity extraction or casual conversational chat 1. Users have observed that the model works through internal reasoning steps even for straightforward prompts, a failure mode of operational efficiency rather than accuracy.

Furthermore, the model's economic profile differs significantly from non-reasoning open-weights models. Its output price is $6.00 per 1 million tokens, which is characterized as expensive compared to the class average of $0.57 3. This makes the model less suitable for high-volume, low-complexity applications. To mitigate the latency associated with the thinking process, external developers have proposed the use of logit processors or specialized templates to manually enforce the termination of the thinking phase by forcing a </think> token after a specific token budget is reached.
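The budget-forcing idea can be sketched in the shape of a Hugging Face-style logits processor. The token id, budget, and class below are invented for illustration and are not taken from any official implementation:

```python
import numpy as np

END_THINK_ID = 151668   # hypothetical id of the </think> token

class ThinkingBudgetProcessor:
    """Once `budget` tokens have been generated without a </think>,
    mask every logit except </think> so the model must close the trace."""

    def __init__(self, budget: int, end_think_id: int = END_THINK_ID):
        self.budget, self.end_think_id = budget, end_think_id

    def __call__(self, generated_ids: list[int], scores: np.ndarray) -> np.ndarray:
        over_budget = len(generated_ids) >= self.budget
        closed = self.end_think_id in generated_ids
        if over_budget and not closed:
            forced = np.full_like(scores, -np.inf)
            forced[self.end_think_id] = 0.0   # only </think> survives
            return forced
        return scores

proc = ThinkingBudgetProcessor(budget=3)
out = proc([11, 22, 33], np.zeros(200_000))  # budget hit, trace not closed
print(int(np.argmax(out)))  # 151668
```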

Performance

The performance of Qwen 3 Next 80B Thinking is characterized by its high inference efficiency relative to its total parameter count and its specialization in complex reasoning tasks. According to self-reported data from Alibaba Cloud, the model achieved a score of 82.7% on the MMLU-Pro benchmark and 88% on the AIME 2025 mathematics examination 1. The developer also reported a score of 77.2% on the GPQA scientific reasoning benchmark 1. In comparative evaluations provided by the Qwen team, the model reportedly outperforms other reasoning-focused systems, including Qwen3-32B-Thinking and Google's proprietary Gemini-2.5-Flash-Thinking 1.

Independent analysis by Artificial Analysis placed the model at rank 7 out of 55 models in its size class, assigning it an Intelligence Index of 27 3. This score is significantly higher than the class average of 15 3. The model's reasoning capabilities were further assessed using MMLU-Redux—a re-annotated version of the MMLU benchmark intended to eliminate errors and provide more reliable metrics—where it scored 0.93 out of 1.0 1. For instruction following and multi-domain writing, the developer reported scores of 0.89 on IFEval and 0.85 on WritingBench, respectively 1.

Operational performance metrics indicate high throughput and moderate cost. The model demonstrates an output speed of 153.1 tokens per second, which ranks it among the faster models in the medium-size class 3. This speed is largely attributed to its "A3B" architecture, which utilizes high-sparsity Mixture of Experts (MoE) to ensure that only 3 billion parameters are active per token during inference, despite the model containing 80 billion total parameters 13.

Regarding cost efficiency, the model is priced at $0.50 per 1 million input tokens and $6.00 per 1 million output tokens 3. While its input pricing is considered somewhat higher than the average for open-weight models in its class, its high verbosity impacts the total cost of ownership 3. Artificial Analysis noted that the model is extremely verbose, generating 51 million tokens during standard intelligence evaluations compared to a class average of 7.3 million tokens 3. This tendency toward long-form "thinking" content results in higher total output costs per task 13.
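A worked example at the cited prices (USD per million tokens) illustrates how verbosity, rather than the per-token rate alone, dominates per-task cost. The per-task token counts are hypothetical:

```python
def task_cost(in_tok: int, out_tok: int, in_price: float, out_price: float) -> float:
    # prices are quoted per 1 million tokens
    return in_tok / 1e6 * in_price + out_tok / 1e6 * out_price

# Thinking variant: long reasoning trace inflates output tokens
thinking = task_cost(10_000, 50_000, in_price=0.50, out_price=6.00)
# Instruct variant: same prompt, direct answer, third-party pricing
instruct = task_cost(10_000, 5_000, in_price=0.09, out_price=1.10)

print(round(thinking, 4))  # 0.305
print(round(instruct, 4))  # 0.0064
```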

Safety & Ethics

The safety framework for Qwen 3 Next 80B Thinking is built upon alignment techniques specifically adapted to the complexities of extended chain-of-thought generation 1. According to Alibaba, the model undergoes multi-stage alignment involving Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) 1. These methods are applied not only to the final response but also to the internal reasoning tokens, to ensure that the model's logical progression remains transparent and adheres to established safety guidelines 1. This dual-layer alignment is intended to prevent the model from using its internal "thinking" phase to generate harmful or prohibited content that might otherwise be obscured from the final output 1.

A primary focus of the model's safety architecture is the mitigation of "deceptive alignment," a risk where reasoning models may develop manipulative strategies or plan to circumvent safety filters within their internal reasoning traces 1. Alibaba states that the training protocol for Qwen 3 Next 80B Thinking includes specific penalties for reasoning paths that demonstrate sycophancy or attempt to bypass guardrails 1. By supervising the mandatory thinking mode, which uses a system of tags to externalize reasoning, the developer aims to ensure that the model's cognitive steps are consistent with its helpful and safe persona 1.

As an open-weights model released under the Apache 2.0 license, Qwen 3 Next 80B Thinking is accessible for independent safety audits and red-teaming by the global research community 3. However, the model's high verbosity, generating significantly more tokens than standard instruction-tuned variants, presents distinct risks regarding the "surface area" for potential hallucinations or subtle biases 3. Evaluations by Artificial Analysis noted that the model is notably verbose, producing 51 million tokens during comprehensive intelligence testing compared to a median of 7.3 million for similar models 3. This increased output volume may require users to implement more robust secondary filtering to manage long-form generation risks 3.

In terms of content filtering, the model is designed to comply with international standards regarding hate speech, illegal activities, and sexually explicit material 1. While its reasoning capabilities improve its ability to follow complex safety instructions, it remains susceptible to the biases present in its training data 1. Independent testing on the Artificial Analysis Intelligence Index has focused primarily on cognitive and agentic capabilities, and third-party evaluations of social bias in the Qwen 3 series are ongoing 36. Alibaba asserts that while the model includes built-in safety measures, its open-weights deployment places a degree of responsibility on end users to apply context-specific guardrails 3.

Applications

Qwen 3 Next 80B Thinking is primarily applied in environments requiring high-order logical verification and multi-step problem-solving. Alibaba Cloud asserts that the model is particularly suited for academic and engineering disciplines where the internal reasoning chain can be used to verify complex formulas and theoretical proofs 1. In the field of software development, the model is designed for deployment within autonomous coding agents 1. By generating a reasoning trace before outputting code, the model is intended to reduce syntax and logic errors, particularly in tasks involving technical terminal use and scientific coding 13.

In financial and political sectors, the model’s performance on benchmarks covering Finance & Business and Politics indicates its suitability for generating deep-dive reports and policy analysis 1. Its 262,000-token context window allows for the processing of extensive datasets, making it a candidate for Retrieval-Augmented Generation (RAG) workflows where the model must synthesize information from hundreds of pages of source material, such as legal filings or comprehensive financial audits 3.
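For rough RAG budgeting, a simple fit check against the 262,144-token window can use the common 4-characters-per-token heuristic. This is only an approximation; the real tokenizer should be used for production budgeting:

```python
CONTEXT_WINDOW = 262_144

def fits(pages_of_text: list[str], reserved_for_output: int = 32_000) -> bool:
    # ~4 characters per token is a rough English-text heuristic
    approx_tokens = sum(len(p) for p in pages_of_text) // 4
    return approx_tokens + reserved_for_output <= CONTEXT_WINDOW

print(fits(["x" * 3000] * 200))   # ~150k input tokens -> True
print(fits(["x" * 3000] * 400))   # ~300k input tokens -> False
```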

The model is specifically optimized for advanced scientific reasoning and competitive mathematics, as evidenced by a 77.2% score on the GPQA benchmark and an 88% score on the AIME 2025 examination 1. These metrics suggest it is an ideal tool for researchers and students working on Olympiad-level mathematics or advanced scientific inquiries requiring symbolic logic 1.

However, third-party analysis by Artificial Analysis characterizes the model as highly verbose, generating significantly more tokens than non-reasoning counterparts during the evaluation of standard intelligence indices 3. Because of the associated operational costs—estimated at $0.50 per million input tokens and $6.00 per million output tokens—the model is generally not recommended for basic utility tasks 3. Scenarios such as simple classification, short-form chat, or basic information retrieval are often handled more efficiently by non-reasoning models, as the latency and cost of the mandatory "thinking" process in the 80B Thinking variant may not provide a proportional benefit in accuracy for low-complexity queries 3.

Reception & Impact

Industry and Technical Reception

The reception of Qwen 3 Next 80B Thinking has centered on its implementation of a high-sparsity Mixture of Experts (MoE) architecture, which draws on 512 experts while activating only 3 billion parameters per token 13. Industry analysts at Artificial Analysis characterized the model as "notably fast," recording an output speed of 153.1 tokens per second, and ranked it 7th out of 55 models in its class on the Intelligence Index at the time of evaluation 3. Technical reviewers have highlighted the efficiency of this "A3B" (Active 3 Billion) design, noting that it allows the model to approach the reasoning capabilities of larger dense models while maintaining a lower computational footprint during inference 3.

Impact on the Open-Weights Movement

Following its release in September 2025 under the Apache 2.0 license, the model has been identified as a significant entry in the open-weights movement 13. By providing a publicly accessible alternative to proprietary reasoning models, it has intensified competition in the sector of open-weights reasoning systems.

Version History

The development of Qwen 3 Next 80B Thinking followed the initial release of the Qwen 3 series on April 29, 2025 5. This earlier launch established the architectural foundation for the series, featuring a range of models from 600M to 235B parameters, including the Qwen3-32B dense variant and the Qwen3-30B-A3B Mixture-of-Experts (MoE) model 5. These early 2025 iterations introduced dual "thinking" and "non-thinking" modes, which allowed the models to switch between rapid responses and deep reasoning via internal chain-of-thought generation 5.

On September 11, 2025, Alibaba Cloud officially launched the Qwen3-Next-80B-A3B-Thinking model 1. This version represented a significant scaling of the reasoning-specialized series, expanding the total parameter count to 80 billion while utilizing a high-sparsity MoE structure to keep active parameters at 3 billion per token 13. The model was trained on a 15-trillion-token dataset and implemented a 48-layer architecture designed to outperform its predecessors, including the Qwen3-30B-A3B-Thinking-2507 and Qwen3-32B-Thinking models 1.

Subsequent minor updates to the 'Next' series focused on enhancing the stability of the model's hybrid attention layer 1. This layer, which combines Gated DeltaNet and Gated Attention, was designed to support ultra-long context modeling 1. To address efficiency challenges during reinforcement learning (RL) training, the developer implemented Group Sequence Policy Optimization (GSPO) 1. According to Alibaba Cloud, these updates were specifically intended to stabilize the interaction between the hybrid attention mechanism and the 512-expert MoE layout, ensuring more consistent performance in complex reasoning tasks where the model must maintain logical coherence over extended thinking sequences 1.

Sources

  1. [1] Qwen3 Next 80B A3B - Intelligence, Performance & Price Analysis. Retrieved March 25, 2026.

     Qwen3 Next 80B A3B (Reasoning) is an open weights model released September 2025 by Alibaba. It has 80B total parameters and 3B active parameters. It uses extended thinking or chain-of-thought reasoning to work through complex problems. It has a context window of 262k tokens and is released under the Apache 2.0 license.

  2. [2] Qwen3 Next 80B A3B Instruct Intelligence, Performance & Price Analysis. Retrieved March 25, 2026.

     Qwen3 Next 80B A3B Instruct scores 20 on the Artificial Analysis Intelligence Index and has a speed of 157.3 tokens per second.

  3. [3] Qwen3-Next-80B-A3B-Thinking: Pricing, Benchmarks & Performance. Retrieved March 25, 2026.

     Architecture: 48 layers, 15T training tokens, hybrid layout of 12*(3*(Gated DeltaNet->MoE)->(Gated Attention->MoE)). High-Sparsity MoE with 512 experts (10 activated + 1 shared), and Multi-Token Prediction. Leveraging GSPO, it addresses stability and efficiency challenges of hybrid attention + high-sparsity MoE in RL training.

  4. [5] Qwen 3 Benchmarks, Comparisons, Model Specifications, and More. Retrieved March 25, 2026.

     Released on April 29, 2025, Qwen3 comes in eight sizes, including both dense models (from 600M to 32B parameters) and Mixture-of-Experts (MoE) giants. Qwen3 can switch between 'thinking' mode and 'non-thinking' mode.

  5. [6] Users of Qwen3-Next-80B-A3B-Instruct-GGUF, How is ... - Reddit. Retrieved March 25, 2026.

  6. [8] Qwen3-Next-80B Thinking | Generative AI on Vertex AI. Google Cloud Documentation. Retrieved March 25, 2026.

  7. [10] Alibaba Qwen Team Just Released FP8 Builds of Qwen3-Next-80B-A3B (Instruct & Thinking), Bringing 80B/3B-Active Hybrid-MoE to Commodity GPUs. MarkTechPost. Retrieved March 25, 2026.

  8. [11] Qwen3-Next: Towards Ultimate Training & Inference Efficiency. Qwen blog. Retrieved March 25, 2026.

  9. [12] Qwen3-Next: Revolutionary 80B Model with Only 3B Active Parameters - Ultimate Efficiency Guide. Retrieved March 25, 2026.

     Deep dive into Qwen3-Next's architecture, which achieves 10x training efficiency and matches models 10x its active size through hybrid attention and ultra-sparse MoE design.

  10. [14] Qwen3 Next 80B A3B (Reasoning) vs Llama 4 Maverick: Model Comparison. Artificial Analysis. Retrieved March 25, 2026.

  11. [15] Qwen3 Next 80B A3B Instruct vs gpt-oss-20B (high): Model Comparison. Artificial Analysis. Retrieved March 25, 2026.

  12. [16] qwen3-next-80b-a3b-thinking Model by Qwen. NVIDIA NIM. Retrieved March 25, 2026.

      80B parameter AI model with hybrid reasoning, MoE architecture, support for 119 languages.

Production Credits

Research: gemini-2.5-flash-lite · March 25, 2026
Written By: gemini-3-flash-preview · March 25, 2026
Fact-Checked By: claude-haiku-4-5 · March 25, 2026
Reviewed By: pending review · March 25, 2026

This page was last edited on March 26, 2026 · First published March 25, 2026