
Qwen 3 Next 80B Instruct

Qwen 3 Next 80B Instruct is a large language model (LLM) developed by Alibaba's Qwen team as part of its Qwen 3 Next series of foundation models [1]. Released as a successor to earlier Qwen iterations, the model is characterized by a focus on training and inference efficiency through the use of a highly sparse Mixture-of-Experts (MoE) architecture [1][2]. Despite possessing a total of 80 billion parameters, the model activates only approximately 3 billion parameters during each inference step [1]. According to Alibaba, this design allows the 80B model to achieve performance comparable to its larger flagship models, such as Qwen3-235B-A22B-Instruct, while requiring significantly fewer computational resources for deployment [1].

The technical framework of Qwen 3 Next 80B Instruct utilizes a hybrid attention mechanism that combines Gated DeltaNet with standard Gated Attention 1. The developers state that this hybrid approach—implemented in a 3:1 ratio where 75% of layers use Gated DeltaNet—is intended to overcome the high inference costs of standard attention and the recall limitations of linear attention 1. The architecture also incorporates a native Multi-Token Prediction (MTP) mechanism, which is designed to improve the efficiency of speculative decoding and overall model throughput 1. For training stability, the model employs Zero-Centered RMSNorm and an attention output gating mechanism to mitigate common issues in large-scale MoE training, such as numerical instability and activation spikes 1.

In terms of capabilities, Qwen 3 Next 80B Instruct is optimized for long-context tasks, natively supporting a context window of up to 256,000 tokens [1]. Performance evaluations provided by the developers indicate that the model maintains higher throughput than dense models of similar quality; specifically, for context lengths exceeding 32,000 tokens, the model reportedly delivers more than 10x higher throughput compared to the Qwen 3 32B dense model [1]. Beyond its native limits, the model can be extended to handle contexts of up to 1 million tokens using Rotary Position Embedding (RoPE) scaling techniques such as YaRN [1][2].

The model was pre-trained on a 15-trillion token subset of the Qwen 3 corpus, utilizing approximately 9.3% of the compute cost required for the dense Qwen 3 32B variant 1. Alongside the standard Instruct version, Alibaba released a "Thinking" variant specifically tuned for complex reasoning and mathematical tasks 1. Qwen 3 Next 80B Instruct is distributed under the Apache 2.0 license and is accessible through open-source platforms like Hugging Face and ModelScope, as well as via the NVIDIA API Catalog and Alibaba Cloud Model Studio 1. It is compatible with several major inference frameworks, including vLLM and SGLang 1.

Background

The development of Qwen 3 Next 80B Instruct occurred during a period (2024–2025) in which large language model (LLM) research increasingly focused on two divergent trends: "Total Parameter Scaling," intended to enhance model intelligence, and "Context Length Scaling," aimed at improving the processing of extensive documents [1][3]. Alibaba's Qwen team identified that standard Transformer architectures, which were the foundation of earlier iterations like Qwen 2, faced significant scaling bottlenecks in both training and inference efficiency [1].

Motivation and Architectural Shift

A primary motivation for the Qwen 3 Next architecture was to overcome the "quadratic complexity" inherent in standard attention mechanisms 1. In traditional Transformers, the computational cost of processing information increases quadratically relative to the length of the input sequence, making ultra-long context windows computationally expensive to maintain and slow to serve 1. While linear attention mechanisms were available as alternatives to break this complexity, the developers noted that they often exhibited weaker recall performance compared to standard attention 1.

To resolve this trade-off, the developers transitioned from the standard MoE structure used in earlier Qwen 3 models to a hybrid architecture 1. This design utilizes Gated DeltaNet (a form of linear attention) and Gated Attention in a 3:1 ratio, where 75% of the layers use the linear mechanism to manage long-context efficiency while the remaining 25% utilize standard attention to maintain performance levels 1. According to Alibaba, this hybrid approach allowed the model to outperform monolithic architectures in both efficiency and accuracy 1.
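
The 3:1 interleaving described above can be sketched as follows. This is an illustrative layer-type assignment only: the source documents the 75/25 split, while the exact ordering of layers within the real model is an assumption here.

```python
def layer_types(num_layers: int) -> list[str]:
    """Hypothetical 3:1 interleaving: every fourth layer uses standard
    gated attention; the other three use Gated DeltaNet (linear attention)."""
    return [
        "gated_attention" if (i + 1) % 4 == 0 else "gated_deltanet"
        for i in range(num_layers)
    ]

pattern = layer_types(48)
# 75% of layers are linear-attention layers, 25% standard attention
```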

Parameter Scaling vs. Inference Efficiency

In the 2024–2025 AI landscape, a significant challenge was balancing the demand for larger parameter counts with the need for sustainable inference costs [1][3]. The Qwen 3 Next 80B variant was designed to address this via an "Ultra-Sparse" Mixture-of-Experts (MoE) configuration [1]. While previous Qwen 3 MoE models utilized 128 total experts with 8 routed per step, Qwen 3 Next expanded this to 512 total experts [1].

This architectural evolution allowed the 80B-parameter model to activate only approximately 3 billion parameters during any single inference step 1. The developers state that this configuration was intended to achieve performance comparable to the dense Qwen3-32B model while requiring less than 10% of its training cost in terms of GPU hours 1. The model was pre-trained on a 15-trillion-token subset of the larger Qwen 3 corpus to validate this efficiency-first approach 1.
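
The activation ratio implied by these figures is straightforward to check:

```python
total_params = 80e9   # 80 billion total parameters
active_params = 3e9   # ~3 billion activated per inference step
activation_ratio = active_params / total_params
# ≈ 0.0375, i.e. roughly 3.7% of parameters are active per token
```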

Architecture

The architecture of Qwen 3 Next 80B Instruct is characterized by a hybrid design that integrates linear attention and standard attention mechanisms within a highly sparse Mixture-of-Experts (MoE) framework [1][4]. The model contains 80 billion total parameters but is designed to activate only approximately 3 billion parameters (3.7%) during each inference step to optimize computational efficiency [1]. This configuration is intended to address the scaling limitations of standard Transformer architectures, particularly in terms of training costs and inference throughput for long-context tasks [1].

Hybrid Attention Mechanism

Qwen 3 Next 80B Instruct utilizes a 3:1 ratio of Gated DeltaNet layers to standard Gated Attention layers 1. According to Alibaba researchers, using only linear attention or only standard attention presents distinct limitations: linear attention is computationally efficient but often exhibits weaker recall, while standard attention offers high performance at the cost of quadratic complexity as sequence length increases 1.

In this hybrid arrangement, 75% of the layers utilize Gated DeltaNet, a form of linear attention that incorporates the Delta Rule to improve in-context learning compared to alternative architectures such as Mamba2 or Sliding Window Attention 1. The remaining 25% of layers utilize standard Gated Attention, which includes several modifications designed to enhance stability and performance:

  • Output Gating: An output gating mechanism is integrated to mitigate low-rank issues within the attention layers 1.
  • Head Dimension: The dimension per attention head was increased to 256 1.
  • RoPE Modification: Rotary Position Embedding (RoPE) is applied only to the first 25% of position dimensions, a technique used to improve the model's ability to extrapolate to longer sequences 1.
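
The partial-RoPE modification in the last bullet can be sketched with NumPy. This is a simplified illustration, not Qwen's implementation: the head dimension of 256 comes from the source, while the frequency base and the exact pairing of rotated dimensions are standard-RoPE assumptions.

```python
import numpy as np

def partial_rope(x: np.ndarray, positions: np.ndarray,
                 rope_fraction: float = 0.25, base: float = 10000.0) -> np.ndarray:
    """Apply rotary position embeddings to only the first `rope_fraction`
    of head dimensions, leaving the rest position-independent.

    x: (seq_len, head_dim) per-head query or key vectors.
    """
    head_dim = x.shape[-1]
    rot_dim = int(head_dim * rope_fraction)
    x_rot, x_pass = x[..., :rot_dim], x[..., rot_dim:]
    half = rot_dim // 2
    freqs = 1.0 / (base ** (np.arange(half) / half))
    angles = positions[:, None] * freqs[None, :]          # (seq, half)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x_rot[..., :half], x_rot[..., half:]
    rotated = np.concatenate([x1 * cos - x2 * sin,
                              x1 * sin + x2 * cos], axis=-1)
    return np.concatenate([rotated, x_pass], axis=-1)
```

With `head_dim=256` and `rope_fraction=0.25`, only the first 64 dimensions of each head carry positional rotation; the remaining 192 pass through unchanged, which is the property cited as aiding length extrapolation.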

Ultra-Sparse Mixture-of-Experts (MoE)

The model's MoE structure consists of 512 total experts, an expansion from the 128 experts used in the standard Qwen 3 series 1. The routing strategy employs 10 routed experts and one shared expert per token 1. This ultra-sparse design aims to reduce training loss by increasing the total parameter count while maintaining a fixed amount of activated parameters during inference 1. To ensure that experts are utilized effectively from the start of training, the model normalizes MoE router parameters during initialization and utilizes global load balancing 1.
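
A toy version of this routing scheme is shown below. Only the expert counts (512 total, 10 routed plus 1 shared) come from the source; the softmax weighting and function signatures are generic MoE conventions assumed for illustration.

```python
import numpy as np

NUM_EXPERTS = 512   # total routed experts
TOP_K = 10          # routed experts activated per token

def route(router_logits: np.ndarray, k: int = TOP_K):
    """Pick the top-k experts and softmax-normalize their weights."""
    idx = np.argsort(router_logits)[-k:]
    w = np.exp(router_logits[idx] - router_logits[idx].max())
    return idx, w / w.sum()

def moe_layer(x, experts, shared_expert, router_logits):
    """Combine k routed experts plus the always-on shared expert."""
    idx, w = route(router_logits)
    routed = sum(wi * experts[i](x) for i, wi in zip(idx, w))
    return routed + shared_expert(x)
```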

Training Stability and Normalization

To address numerical stability and prevent issues such as "Attention Sink" or massive activations, the architecture incorporates Zero-Centered RMSNorm 1. This design choice is intended to resolve issues where layer norm weights became abnormally large in previous iterations 1. Additionally, weight decay is applied to the normalization weights to prevent unbounded growth during large-scale training 1.
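
A minimal sketch of the zero-centered idea follows; the exact parameterization inside Qwen 3 Next is an assumption based on the description above.

```python
import numpy as np

def zero_centered_rmsnorm(x: np.ndarray, weight: np.ndarray,
                          eps: float = 1e-6) -> np.ndarray:
    """RMSNorm whose learnable gain is parameterized as (1 + weight).

    Because `weight` is centered at zero, applying weight decay to it pulls
    the effective gain toward 1 rather than toward 0, countering the
    unbounded growth of norm weights mentioned above.
    """
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    return (x / rms) * (1.0 + weight)
```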

Multi-Token Prediction (MTP)

Qwen 3 Next 80B Instruct includes a native Multi-Token Prediction mechanism 1. This architecture allows the model to predict multiple future tokens simultaneously during training and inference 1. Alibaba states that this mechanism improves overall model performance and increases the acceptance rate for speculative decoding, which contributes to faster inference speeds in production environments 1.
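
MTP matters for speculative decoding because throughput scales with how many drafted tokens the main model accepts per verification step. A simplified greedy acceptance loop (not Qwen's actual implementation) illustrates the mechanism:

```python
def accept_draft(draft_tokens: list[int], target_tokens: list[int]) -> list[int]:
    """Greedy speculative-decoding acceptance: keep the longest prefix of the
    draft that the target model agrees with, then take the target's own token
    at the first point of disagreement (or one bonus token if all match)."""
    n = 0
    for d, t in zip(draft_tokens, target_tokens):
        if d != t:
            break
        n += 1
    accepted = draft_tokens[:n]
    if n < len(target_tokens):
        accepted = accepted + [target_tokens[n]]
    return accepted
```

A higher acceptance rate means more tokens emitted per expensive forward pass of the large model, which is the speed-up Alibaba attributes to MTP.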

Context Window and Extrapolation

The model natively supports a context window of up to 262,144 tokens (256K) 1. For tasks requiring ultra-long context processing up to 1 million tokens, the developers recommend the use of the YaRN (Yet another RoPE extensioN) scaling method 1. Alibaba asserts that the combination of the hybrid attention design and these scaling techniques allows the 80B model to maintain performance on long-context benchmarks that is comparable to larger dense models 1.
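
For illustration, a Hugging Face-style `config.json` override for 1M-token YaRN scaling would take roughly the following shape. The field values here are assumptions inferred from the numbers above (factor 4.0 × 262,144 ≈ 1M); consult the model card before use.

```json
{
  "rope_scaling": {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": 262144
  }
}
```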

Capabilities & Limitations

Qwen 3 Next 80B Instruct is designed as a high-efficiency text-based model, supporting a native context length of 256,000 tokens [1]. According to Alibaba, the model can be extended to process up to 1 million tokens using the Yet another RoPE extensioN (YaRN) method, although the developer notes that static YaRN implementations may impact performance on shorter texts [1]. In independent comparisons, the model is characterized as an open-weights alternative to proprietary systems, though it lacks native support for non-text modalities such as image or video input [4][5].

Primary Capabilities

The model is optimized for two main areas: long-context processing and agentic workflows. Alibaba reports that Qwen 3 Next 80B Instruct achieves performance comparable to its larger flagship, the Qwen3-235B-A22B-Instruct, while providing more than 10x higher throughput for contexts exceeding 32,000 tokens 1.

  • Agentic Use and Tool-Calling: The model is intended for use in agentic systems through integration with the Qwen-Agent framework 1. It supports function calling and the generation of structured output, such as JSON 5. The developer states that Qwen-Agent encapsulates tool-calling templates and parsers to simplify the deployment of the model in complex workflows 1.
  • Multi-Token Prediction (MTP): The architecture includes a native MTP mechanism designed to improve speculative decoding acceptance rates 1. Alibaba claims this allows for faster inference during the decoding stage, achieving nearly 4x higher throughput than the dense Qwen3-32B model at 4,000 tokens 1.
  • Reasoning Variants: A distinction exists between the standard 'Instruct' model and the 'Thinking' variant. While the Instruct version is designed for general-purpose assistant tasks, the Thinking version is specialized for complex reasoning, where it reportedly approaches the performance of larger proprietary models like Gemini-2.5-Flash-Thinking on specific benchmarks 1.
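
As an illustration of the structured tool-calling interface mentioned above, a minimal OpenAI-compatible request payload might be assembled as follows. The `get_weather` tool and its schema are hypothetical; the surrounding chat/tools format is the standard OpenAI-compatible structure that servers such as vLLM and SGLang accept.

```python
import json

# Hypothetical tool definition in the OpenAI-compatible function-calling schema.
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}

# Request body an OpenAI-compatible endpoint would accept for this model.
request_body = {
    "model": "Qwen/Qwen3-Next-80B-A3B-Instruct",
    "messages": [{"role": "user", "content": "What's the weather in Hangzhou?"}],
    "tools": [weather_tool],
    "tool_choice": "auto",
}

payload = json.dumps(request_body)
```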

Limitations and Failure Modes

Despite its efficiency, the hybrid architecture of Qwen 3 Next 80B Instruct introduces specific trade-offs and limitations.

  • Recall Constraints: Alibaba acknowledges that while the hybrid Gated DeltaNet and Gated Attention mechanism improves speed, linear attention components are fundamentally weaker at information recall compared to standard softmax attention 1. To mitigate this, the model maintains a 3:1 ratio of linear to standard attention layers 1.
  • Context Scaling Degradation: When using YaRN scaling to reach the 1-million-token limit, the model may experience performance degradation on short-form inputs 1. The developer recommends modifying the scaling factor specifically for the expected context length (e.g., using a factor of 2.0 for 524,288 tokens) and only enabling these configurations when processing ultra-long texts 1.
  • Hardware and Deployment Requirements: Effective utilization of the model's specialized features, such as MTP and the 256K context window, requires specific inference frameworks. The developer advises using SGLang or vLLM to achieve stated throughput benefits, as standard Hugging Face Transformers implementations may not support MTP or the optimized linear attention kernels (e.g., flash-linear-attention) required for peak performance 1.
  • Use Case Restrictions: The model is not intended for multimodal tasks without external adapters. It is strictly a text-in, text-out system, lacking the visual understanding capabilities found in the 'VL' (Vision-Language) variants of the Qwen series 5.
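
The scaling-factor guidance above reduces to a simple ratio against the native 262,144-token window:

```python
NATIVE_CTX = 262_144  # native context window (256K)

def yarn_factor(target_ctx: int) -> float:
    """YaRN scaling factor = target context length / native context length."""
    return target_ctx / NATIVE_CTX

# factor 2.0 extends to 524,288 tokens; factor 4.0 reaches ~1M tokens
```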

Performance

Qwen 3 Next 80B Instruct is characterized by high computational efficiency, with Alibaba reporting that the model achieves performance levels comparable to the 235-billion-parameter Qwen3 flagship while utilizing approximately one-tenth of the compute 1. In developer evaluations, the Instruct variant outperformed previous iterations like the Qwen3-30B-A3B and the dense Qwen3-32B across several standard benchmarks, including MMLU for general knowledge, GPQA for complex reasoning, and HumanEval for programming tasks 1. Independent observers have noted that the model's release in September 2025 reflects an accelerated iteration cycle in open-weight model development intended to challenge proprietary systems 4.

In long-context evaluations, the model demonstrated high retrieval accuracy on the RULER benchmark. Alibaba asserts that Qwen 3 Next 80B Instruct maintains performance superiority over models with higher layer counts, such as the Qwen3-30B and the larger Qwen3-235B, for sequence lengths up to 256,000 tokens 1. This capability is attributed to the integration of Gated DeltaNet, which the developers state provides more robust in-context learning than alternative architectures like Mamba2 or Sliding Window Attention 1. The model's use of rotary position encoding applied to only 25% of dimensions is also cited as a factor in its ability to extrapolate effectively to longer sequences 1.

The model's sparse architecture, which activates only 3.7% of its 80 billion parameters (approximately 3 billion parameters) per inference step, results in substantial throughput advantages over dense models 1. According to the Qwen team, at a context length of 4,000 tokens, the prefill throughput is nearly 7 times higher than that of the Qwen3-32B, while decoding throughput is nearly 4 times higher 1. For tasks involving ultra-long contexts exceeding 32,000 tokens, the speed advantage for both prefill and decoding stages increases to more than 10 times that of the 32B model 1.

Training efficiency is identified as a primary performance metric for the model. The underlying base model was trained on 15 trillion tokens using less than 10% of the GPU hours required for the Qwen3-32B, despite yielding superior benchmark results 1. The Qwen team also emphasizes that the inclusion of a native Multi-Token Prediction (MTP) mechanism enhances real-world inference speed by improving the acceptance rate of speculative decoding 1. Furthermore, the implementation of Zero-Centered RMSNorm and Mixture-of-Experts (MoE) router normalization is reported to have improved numerical stability during the training process, contributing to more consistent performance across scales 1.

Safety & Ethics

The safety and ethical framework of Qwen 3 Next 80B Instruct relies on a combination of post-training alignment techniques and architectural safeguards designed to ensure model stability and adherence to content guidelines. As an 'Instruct' variant, the model is specifically tuned to follow human instructions while mitigating common risks associated with large-scale generative systems 1.

Alignment and Post-training

Qwen 3 Next 80B Instruct was developed using reinforcement learning (RL) to align its outputs with human preferences [1]. Alibaba states that the training process for the Qwen 3 Next series specifically addressed long-standing stability and efficiency issues often encountered during RL training with hybrid attention and sparse Mixture-of-Experts (MoE) architectures [1]. This alignment process is intended to improve the model's reliability in following complex instructions and producing helpful, safe responses. While the developer characterizes the performance of the Instruct version as comparable to their larger flagship models, independent verification of its specific safety benchmarks against external datasets remains a standard requirement for deployment in sensitive environments [1][4].

Architectural Safety Features

The model incorporates several technical designs aimed at preventing numerical instability and erratic behavior during inference. A primary mechanism is 'output gating,' which is integrated into the standard attention layers 1. According to Alibaba, this mechanism helps eliminate 'Attention Sink' and 'Massive Activation' issues, which can otherwise lead to degraded performance or unpredictable outputs in long-context scenarios 1.

To further ensure training and inference stability, the model utilizes Zero-Centered RMSNorm and applies weight decay to normalization weights 1. These features are designed to prevent layer norm weights from growing unboundedly, a phenomenon that can cause numerical overflow and affect the ethical consistency of the model's reasoning 1. Additionally, the MoE router parameters are normalized during initialization to ensure unbiased expert selection, reducing the risk of the model developing internal 'biases' toward specific computational paths early in its training 1.

Content Filtering and Deployment

Qwen 3 Next 80B Instruct is subject to Alibaba's internal safety guidelines and content filtering protocols when accessed through public-facing interfaces, such as the Alibaba Cloud Model Studio and the NVIDIA API Catalog 1. These platforms typically employ secondary safety layers to block prohibited content, including hate speech, harassment, and instructions for illegal activities. For users deploying the model locally via open-weights repositories like Hugging Face, the developers provide the model under an open-source license (Apache 2.0), shifting the responsibility for final safety monitoring and output filtering to the individual implementer 1.

Applications

Qwen 3 Next 80B Instruct is primarily utilized for tasks requiring the processing of extensive datasets and the execution of complex automated workflows. Due to its hybrid attention mechanism and sparse Mixture-of-Experts (MoE) architecture, the model is frequently applied in scenarios where computational efficiency is a priority, such as long-form document analysis and agent-based automation [1][3].

Alibaba states that the model is optimized for long-context tasks, natively supporting up to 256,000 tokens [1]. This capacity makes it suitable for summarizing large document sets, technical manuals, and legal archives [1][3]. In benchmarks cited by the developer, the model maintained high throughput for contexts exceeding 32,000 tokens, reportedly delivering 10 times the speed of dense 32B parameter models in similar scenarios [1]. For specialized use cases requiring larger inputs, the model can be extended to handle up to 1 million tokens using the YaRN scaling method [1]. However, the developer advises against applying static YaRN configurations for short-context tasks, as this can negatively affect performance [1].

In the field of agentic AI, Qwen 3 Next 80B Instruct is deployed within frameworks designed for autonomous tool utilization and coding assistance [1][3]. The model supports tool-calling capabilities and can be integrated with the Model Context Protocol (MCP) to interact with external data sources and software environments [1]. Alibaba recommends the use of the Qwen-Agent library to simplify the implementation of these agentic functions, which include complex reasoning and multi-step task execution [1].

The model is accessible through several cloud-based and local deployment channels. It is available via the Alibaba Cloud Model Studio and the NVIDIA API Catalog, where it is offered as an NVIDIA Inference Microservice (NIM) [1]. For local or private enterprise deployment, the model weights are hosted on platforms such as Hugging Face and ModelScope [3]. It is compatible with high-throughput inference engines including vLLM and SGLang, which facilitate the creation of OpenAI-compatible API endpoints [1]. While the model possesses 80 billion total parameters, its low activation rate of approximately 3 billion parameters allows it to be operated on consumer-grade hardware in specific configurations, as well as on enterprise-grade GPUs such as the NVIDIA A100, H100, and Blackwell series [1][3].

Reception & Impact

The release of Qwen 3 Next 80B Instruct has been noted for its focus on computational efficiency and its availability within the open-source ecosystem. Alibaba's decision to distribute the model weights through platforms such as Hugging Face and ModelScope is characterized by the developer as an effort to enable the research community to experiment with non-standard architectures 1. This move followed a trend of increasing interest in high-performance, open-weights models that can serve as alternatives to proprietary systems 3.

Architectural Influence

Qwen 3 Next 80B Instruct is recognized for its hybrid design, which deviates from the monolithic Transformer structures common in earlier large language models. By integrating Gated DeltaNet with standard gated attention in a 3:1 ratio, the model has influenced discussions regarding the viability of linear attention mechanisms for long-context tasks [1]. Alibaba asserts that this hybrid approach addresses the 'recall' weaknesses associated with purely linear models while avoiding the quadratic complexity and high inference costs of standard attention [1]. Industry researchers have highlighted this as a significant step toward 'inference-optimal' designs, where the goal is to maximize intelligence per activated parameter [1][3].

Efficiency and Economic Impact

The model's deployment has technical and economic implications for large-scale AI operations. According to developer documentation, the Qwen 3 Next 80B Instruct architecture allows for training and inference at a fraction of the cost of dense models with similar performance profiles 1. For example, the base version was reportedly trained using less than 10% of the GPU hours required for the dense Qwen3-32B model while achieving comparable or superior benchmark results 1. In inference scenarios, particularly those involving context lengths exceeding 32,000 tokens, the developer states that the model achieves more than 10 times the throughput of traditional architectures 1. This efficiency has been identified as a potential driver for the broader adoption of long-context applications, which were previously limited by high latency and hardware requirements 3.

Comparative Reception

Alibaba's internal evaluations position Qwen 3 Next 80B Instruct as a competitor to several closed-source and larger open-weights models. The developer claims that the 'Instruct' and 'Thinking' variants perform similarly to the significantly larger 235-billion-parameter Qwen3 flagship in complex reasoning and long-document processing 1. Furthermore, Alibaba asserts that the Thinking variant outperforms closed-source models such as Gemini-2.5-Flash-Thinking on multiple benchmarks 1. While these claims are based on developer-reported data, the model's ability to maintain performance with only 3.7% of its parameters activated during inference has made it a frequent subject of study in the move toward ultra-sparse Mixture-of-Experts (MoE) systems 1.

Version History

The development of the Qwen 3 Next 80B Instruct followed the standard Qwen 3 model series as an experimental branch focused on training and inference efficiency 1. The version history of the model is defined by its progression from a foundation base model to specialized post-trained variants, serving as a technical bridge toward the subsequent Qwen 3.5 architecture 1.

Initial Foundation Release

The series began with the release of the Qwen3-Next-80B-A3B-Base, a foundation model utilizing a highly sparse Mixture-of-Experts (MoE) architecture 1. Alibaba released this version primarily to demonstrate the efficiency of its hybrid attention mechanism, which combines Gated DeltaNet with standard Gated Attention 1. According to developer documentation, the base model achieved performance comparable to the dense Qwen3-32B while requiring less than 10% of the training compute cost, measured in GPU hours 1.

Specialized Variants

Following the base model, Alibaba released two post-trained versions to address different functional requirements:

  • Qwen3-Next-80B-A3B-Instruct: This version was optimized for general-purpose conversation and instruction following. Alibaba states that this variant performs on par with its 235-billion-parameter flagship, the Qwen3-235B-A22B-Instruct-2507, particularly in tasks involving context lengths up to 256K tokens 1.
  • Qwen3-Next-80B-A3B-Thinking: Released for advanced reasoning tasks, this version was designed for complex logic, mathematics, and STEM applications [1][9]. The developer asserts that this variant outperforms the closed-source Gemini-2.5-Flash-Thinking on multiple benchmarks while maintaining a 3B active parameter footprint during inference [1].

Platform Integration and Future Evolution

Upon release, the Qwen 3 Next series was integrated into the Hugging Face transformers main branch and received native support from inference frameworks such as vLLM and SGLang 1. These updates enabled technical features like Multi-Token Prediction (MTP) for speculative decoding and support for the YaRN method to extend context windows up to 1 million tokens 1.

Alibaba has characterized the Qwen 3 Next architecture as the evolutionary precursor to the Qwen 3.5 series [1]. In March 2026, the company announced the first model in that series, the Qwen3.5-397B-A17B, which scales the hybrid linear-attention and sparse MoE principles first implemented in the Qwen 3 Next 80B models [11][12].

Sources

  1. Qwen3-Next: Towards Ultimate Training & Inference Efficiency. Retrieved March 25, 2026.

    Based on this new architecture, we train the Qwen3-Next-80B-A3B-Base model — an 80-billion-parameter model that activates only 3 billion parameters during inference. This base model achieves performance comparable to (or even slightly better than) the dense Qwen3-32B model, while using less than 10% of its training cost... The Qwen3-Next-80B-A3B-Instruct performs comparably to our flagship model Qwen3-235B-A22B-Instruct-2507, and shows clear advantages in tasks requiring ultra-long context (up to 256K tokens).

  2. Qwen3-Next-80B-A3B-Instruct - Model Info, Parameters, Benchmarks - SiliconFlow. Retrieved March 25, 2026.

    Qwen3-Next-80B-A3B-Instruct is a next-generation foundation model released by Alibaba's Qwen team. It is built on the new Qwen3-Next architecture, designed for ultimate training and inference efficiency. Next-gen LLM with 1M context.

  3. Qwen3-Next: A New Generation of Ultra-Efficient Model Architecture Unveiled. Retrieved March 25, 2026.

    Alibaba has launched Qwen3-Next, a brand-new model architecture optimized for long-context understanding, large parameter scale, and unprecedented computational efficiency.

  4. What is Alibaba's Qwen3-Next-80B-A3B? Retrieved March 25, 2026.

    On September 11th, 2025 they released a model with a weird name “Qwen3-Next-80B-A3B”. ... Two diagrams illustrating a hybrid architecture for Qwen3-Next-80B-A3B. The top diagram shows Gated Attention with components like Mixture of Experts, Zero-Centered RMSNorm.

  5. Qwen3 Max Thinking vs Qwen3 Next 80B A3B (Reasoning): Model Comparison. Retrieved March 25, 2026.

    Context Window: 262k tokens. Image Input Support: No. Open Source (Weights): Yes.

  9. Qwen3.5-Max-Preview Now Available on Arena. Retrieved March 25, 2026.

    We are pleased to announce the deployment of Qwen3.5-Max-Preview on Arena.

  11. Qwen3-Next-80B-A3B by Alibaba - AIPortalX. Retrieved March 25, 2026.


  12. Alibaba Unveils New Qwen3 Models for Coding, Complexing Reasoning and Machine Translation - Alibaba Group. Retrieved March 25, 2026.


Production Credits

Research: gemini-2.5-flash-lite · March 25, 2026
Written By: gemini-3-flash-preview · March 25, 2026
Fact-Checked By: claude-haiku-4-5 · March 25, 2026
Reviewed By: pending review · March 25, 2026
This page was last edited on March 26, 2026 · First published March 25, 2026