Alpha
Wiki Icon
Wiki/Models/Qwen 3 Coder 480B
model

Qwen 3 Coder 480B

Qwen3-Coder-480B-A35B-Instruct is a large language model developed by Alibaba Cloud's Qwen team, released by May 2025 11229. The model utilizes a mixture-of-experts (MoE) architecture comprising 480 billion total parameters, of which 35 billion are active during any single inference cycle 11224. It was developed for tasks including code generation, agentic workflows, and tool-assisted interactions 131.

The model's architecture is a decoder-only transformer featuring 62 layers and Grouped Query Attention (GQA) 124. According to the developer, the model provides a native context window of 262,144 tokens, which can be extended to 1 million tokens through Yet Another Rope-based Network (YaRN) extrapolation 11222. This capacity is intended to facilitate repository-level understanding, allowing the model to analyze entire codebases or lengthy documentation 116. For agentic tasks, the model is tuned to prioritize direct responses and structured function calling 131.

Alibaba Cloud positions Qwen3-Coder as an open-weights alternative to proprietary models such as Anthropic’s Claude and Google’s Gemini series 316. According to Alibaba, the model demonstrates proficiency in agentic browser use and complex software engineering tasks 131. In evaluations on the Aider Polyglot benchmark, the model recorded a proficiency score of 61.8% across various programming languages 2829. Independent testing on technical benchmarks has indicated that the model performs at levels comparable to high-tier proprietary models in solving software engineering issues 31426.

Operationalizing the model requires substantial hardware resources, necessitating approximately 250GB of VRAM for full bfloat16 or FP8 inference 124. Deployment is also supported through 4-bit GGUF quantization, which reduces hardware requirements to approximately 80–120GB of VRAM 2830. The model is released under the NVIDIA Community Model License and the Apache 2.0 license 112. It is compatible with standard inference frameworks, including vLLM and Hugging Face Transformers 112.

Background

The development of Qwen3-Coder-480B-A35B-Instruct followed the Qwen2.5-Coder series, representing a shift toward increased scale in Alibaba Cloud's specialized open-weights models 212. The model was designed to address requirements for artificial intelligence capable of handling complex software engineering tasks and repository-level analysis 112. While previous iterations utilized dense architectures, the Qwen3-Coder 480B uses a Mixture-of-Experts (MoE) framework 222. It features a total parameter count of 480 billion, with 35 billion active parameters per inference cycle to manage computational costs and memory requirements 124.

According to the developers, the motivation for this architecture was rooted in an industry trend toward "agentic" large language models—systems that can autonomously use tools, call APIs, and maintain coherence across multi-turn interactions 212. To support these workflows, the model implements native support for a 256,000-token context window 122. Alibaba states this capacity allows the model to process entire codebases or extensive technical documentation, which the developer asserts is necessary for repository-level automation and collaborative coding 1216.

By mid-2025, the competitive landscape for open-source coding models was characterized by rapid iterations from various developers. Qwen3-Coder 480B entered a market where Kimi K2 and DeepSeek V3 were prominent alternatives 314. Third-party evaluations by 16x Eval indicated that while Qwen3-Coder 480B outperformed DeepSeek V3 on average coding ratings, it initially trailed Kimi K2 in areas such as user interface formatting and certain visualization tasks 3. Furthermore, evaluations noted that despite its scale, the model remained less competitive than some top-tier proprietary models in logical reasoning for uncommon programming patterns 3.

The training pipeline for the model involved four distinct stages: large-scale pretraining on 36 trillion tokens, reinforcement learning (RL) for reasoning, a "thinking mode" fusion to optimize for direct answers, and general RL across 20,000 parallel environments to refine tool-calling and code generation capabilities 212.

Architecture

The Qwen3-Coder-480B-A35B-Instruct model is built on a transformer-based, decoder-only mixture-of-experts (MoE) architecture 16. This design utilizes a sparse activation strategy, which allows the model to maintain a high total parameter count of 480 billion while only engaging a subset of 35 billion active parameters during any single forward pass 16. This approach is intended to provide the performance characteristics of a large-scale model while reducing the computational requirements and VRAM usage typically associated with dense architectures of similar scale 6.

Core Configuration and MoE Mechanism

The model's structural configuration consists of 62 transformer layers with a hidden dimension size of 6144 and an intermediate size of 8192 1. Within the MoE framework, the model incorporates a total of 160 specialized experts 1. A gating mechanism, implemented as a router network, dynamically selects 8 experts per token to process information during inference 16. The MoE intermediate size is specified at 2560 1. According to Alibaba Cloud, these specialized subnetworks allow the model to handle diverse tasks such as Python syntax, API integration, and debugging by routing inputs to the most relevant experts 6.

Attention Mechanism and Context Window

Qwen3-Coder employs Grouped Query Attention (GQA) to optimize memory efficiency and inference speed 1. The attention mechanism is configured with 96 query heads and 8 key-value (KV) heads, with a head dimension of 128 1. The model features a native context window of 262,144 tokens (approximately 256K) 1. This capacity is extendable to 1,000,000 tokens through the use of Yet Another RoPE-based Network (YaRN) extension techniques 16. To manage positional information across these long sequences, the model utilizes Rotary Position Embeddings (RoPE) 6. For efficient memory management during long-context processing, the developers indicate the use of optimizations such as FlashAttention-2 and likely sliding window attention 6.

Training Methodology and Data

The model was trained on a dataset comprising 36 trillion tokens, which includes a diverse mix of source code repositories, technical documentation, web content, and synthetic data 6. The training process followed a multi-stage pipeline, beginning with massive-scale pre-training to establish foundational knowledge 6. This was followed by a post-training phase that included supervised fine-tuning (SFT) using instruction-following data and reinforcement learning (RL) 16. The reinforcement learning stage was conducted across 20,000 parallel environments on Alibaba Cloud, focusing on approximately 20 specific tasks, including tool calling and code generation 6. The model uses a high-capacity tokenizer with a vocabulary size of 151,936 tokens 1.

Precision and Deployment Optimization

Qwen3-Coder-480B-A35B-Instruct is natively trained and deployed using FP8 (8-bit floating point) precision 16. This quantization format is specifically optimized for NVIDIA Hopper hardware architectures, such as the H100 GPU 1. The use of FP8 reduces the memory footprint for inference to approximately 250GB of VRAM, whereas a dense 480B model in bfloat16 would typically require over 900GB 6. The model supports deployment via several runtime engines, including vLLM and Hugging Face Transformers (version 4.51.0 and later), and is compatible with quantization formats like GGUF for further memory reduction on consumer-grade or mid-range hardware 16.

Capabilities & Limitations

Qwen3-Coder-480B-A35B-Instruct is primarily designed for autonomous software engineering and multi-step agentic workflows 1. The Qwen team describes the model as their most agentic to date, specifically optimized to function as an independent agent capable of environment interaction rather than a passive code completion tool 6. It provides native support for more than 90 programming languages and was trained on 7.5 trillion tokens, with code representing approximately 70% of the total dataset 6.

Agentic and Functional Capabilities

A central feature of the model is its integration with external tools and development environments. It features native support for function calling and specialized capability for browser-based automation, allowing it to interact with web interfaces and execute complex sequences of tasks 16. For professional software engineering, the model supports a native context window of 256,000 tokens, which can be extended to 1 million tokens using the YaRN (Yet Another Rope-based Network) method 1. This long-context capacity is intended to enable the analysis of entire code repositories and complex data structures, such as pull requests and multi-file projects 6.

In terms of performance, the developer states that the model achieves results comparable to proprietary systems like Claude 3.5 Sonnet in coding tasks 1. On the SWE-Bench Verified benchmark, which measures the ability to resolve real-world software issues, the model has demonstrated task resolution rates between 50% and 60% 6. Third-party evaluations have ranked the 480B variant competitively against other open-weight models such as Kimi K2 and DeepSeek V3 in core coding benchmarks 3.

Limitations and Failure Modes

Despite its high parameter count, Qwen3-Coder-480B-A35B-Instruct has documented limitations in specific domains. It is reported to struggle with complex visual UI formatting and may exhibit failure modes when tasked with uncommon or highly specific logical reasoning patterns, such as advanced TypeScript type narrowing 16.

Architectural trade-offs also impact its reasoning style; the model utilizes a "non-thinking mode" for inference, which prioritizes rapid, direct generation over verbose, step-by-step reasoning blocks 16. While this approach reduces latency, it may result in lower accuracy for tasks that require deep, multi-stage logical derivation compared to reasoning-optimized variants 6. Additionally, the model's physical scale presents a significant deployment barrier. Full-precision inference requires approximately 900GB of VRAM, though FP8 quantization can reduce this to approximately 250GB, still necessitating high-end multi-GPU configurations 16.

Intended Use Cases

The model is intended for use as a backend for professional developer tools, autonomous coding assistants, and large-scale repository analysis engines 1. It is released under terms that allow for commercial use, including the Apache 2.0 and NVIDIA Community Model licenses, making it a candidate for enterprise-level automation and custom tool integration 1.

Performance

Qwen3-Coder-480B-A35B-Instruct demonstrates competitive performance within the open-weights model landscape, particularly in standardized coding benchmarks. In evaluations conducted by 16x Eval, the model reached a performance level exceeding DeepSeek V3 and, as of late August 2025, was ranked higher than Kimi K2 for specific coding tasks 3. On medium-difficulty coding challenges, such as markdown cleaning, Qwen3 Coder achieved a rating of 9.25/10, matching the performance levels of proprietary models like Claude 4 Opus 3.

Despite its strengths in common coding scenarios, the model remains less competitive against leading proprietary systems such as the Claude 4 and GPT-4 series 3. In specialized logical reasoning tests involving uncommon programming patterns, such as advanced TypeScript narrowing, Qwen3 Coder scored 1/10, failing to produce code that passed compiler checks 3. This result was consistent with other open models like Kimi K2 and proprietary models like Gemini 2.5 Pro, whereas Claude 3.5 Sonnet successfully resolved the same issue with a score of 8/10 3. Additionally, the model has been noted for lower precision in instruction-following for specific constraints, such as "output only diff" tasks, where it tended toward more verbose responses 3.

In complex visual and user interface (UI) tasks, Qwen3 Coder's performance is lower than that of several primary competitors. In benchmark visualization experiments requiring the generation of graphical charts, the model scored 7/10 3. Third-party analysis indicated that while the model produced functional horizontal bar visualizations, it suffered from formatting issues and less nuanced aesthetic choices compared to higher-scoring outputs from Kimi K2, Gemini 2.5 Pro, and Claude 3.5 Sonnet 3.

The model's performance is enabled by its Mixture-of-Experts (MoE) architecture, which attempts to balance computational load with high parameter capacity. Although the model contains 480 billion total parameters, only 35 billion are active during any single inference forward pass 35. This sparse activation strategy is intended to provide the intelligence associated with large-scale models while maintaining inference efficiency more typical of mid-sized models 5. Artificial Analysis notes that the model supports a context window of 262,144 tokens, facilitating long-context reasoning for repository-scale code analysis 5.

Safety & Ethics

Safety and ethics for Qwen3-Coder-480B-A35B-Instruct are addressed through a combination of post-training alignment, specialized guardrail models, and specific licensing restrictions. Alibaba Cloud utilizes supervised fine-tuning (SFT) with instruction-following data to align the model's responses with user intent and safety requirements 1. This process includes training the model for structured function calling and tool choice, which is intended to reduce the risk of erratic behavior during complex agentic workflows 1.

Alignment and Guardrails

Alibaba Cloud introduced Qwen3Guard, a family of safety models designed to provide real-time moderation for the Qwen3 series 7. Qwen3Guard is available in specialized variants, including Qwen3Guard-Stream, which performs token-level moderation during response generation to maintain low latency 7. These models employ a three-tier classification system—categorizing content as Safe, Unsafe, or Controversial—allowing developers to adjust the strictness of safety policies according to specific use cases 7. According to the developer, this tiered approach is more flexible than traditional binary labeling 7.

Red-Teaming and Vulnerabilities

Independent evaluations have highlighted both the safety strengths and the potential risks associated with the model's large scale and agentic capabilities. Research conducted by Huawei on the safety alignment of modern language models identified the Qwen3 series, particularly variants utilizing integrated reasoning and self-reflection mechanisms, as demonstrating robust safety alignment 8. However, researchers also noted that post-training and knowledge distillation can sometimes lead to a degradation in safety performance if not treated as a core optimization objective 8.

External red-teaming projects have identified several specific risk vectors for the 480B model:

  • Agentic Surface Area: Because the model can call tools, browse the web, and execute multi-step code, it presents a larger attack surface for prompt injection and tool misuse 9.
  • Long-Context Exploits: The model's support for up to 1 million tokens via YaRN creates a risk where malicious instructions or vulnerabilities could be hidden deep within a massive code repository or prompt chain 9.
  • Vulnerability Replication: Like many code-focused models trained on public repositories, there is a risk of the model reproducing insecure coding patterns or vulnerable scripts found in its training data 10.

Licensing and Data Ethics

Use of the Qwen3-Coder-480B-A35B-Instruct model is governed by the NVIDIA Community Model License, with certain components falling under Apache 2.0 15. While the model is available for commercial use, NVIDIA and Alibaba Cloud state that developers are responsible for ensuring the model meets industry-specific requirements and for reporting discovered security vulnerabilities 1. Regarding data ethics, the model was trained on a diverse dataset of code repositories and documentation 1. Independent legal observers have noted that the unauthorized use of copyrighted works to train generative models remains a subject of significant legal scrutiny, with the U.S. Copyright Office suggesting that model guardrails to prevent infringing outputs may support fair-use arguments in some contexts 1112.

Applications

Qwen3-Coder-480B-A35B-Instruct is primarily applied in autonomous software engineering and multi-step agentic workflows 1. The model is designed to function as an independent agent capable of environment interaction rather than a passive code completion tool 6.

Software Development Environments

The model is optimized for integration into Integrated Development Environments (IDEs) and AI-native coding platforms such as Cursor and VS Code 6. Within these environments, it serves as a backend for advanced code completions, automated refactoring, and multi-file editing 1. By utilizing its custom function-calling capabilities, the model can interface with extensions like CLINE and Qwen Code to execute terminal commands, manage file systems, and perform complex debugging cycles 6. Its ability to maintain context across multi-turn interactions allows it to assist in iterative software development processes where a developer provides feedback on generated snippets 16.

Autonomous Agent Frameworks

Alibaba Cloud states that the model's architecture is specifically tuned for "agentic" tasks, where it acts as an autonomous participant in engineering workflows 68. It is deployed within agent frameworks to handle complex repository-level tasks, such as those evaluated in the SWE-Bench Verified benchmark, which includes bug fixing and feature implementation across distributed files 6. The model's native context window of 262,144 tokens—extendable to 1 million tokens using the Yet Another Rope-based Network (YaRN) technique—enables it to perform large-scale codebase analysis 1. This is particularly utilized in enterprise environments for technical debt reduction, where the model must ingest and understand an entire repository to suggest architectural improvements or identify deprecated code patterns 16.

Web and Tool Automation

Beyond standard programming, the model is utilized for "agentic browser-use" scenarios, which include automated software testing and web-based research 1. This application involves the model generating structured JSON outputs to interact with browsers or external APIs 6. For example, it can be tasked with navigating a web application to identify UI bugs or verifying that a deployed API endpoint functions correctly under various conditions 1.

Ideal and Non-Recommended Scenarios

Qwen3-Coder is ideally suited for high-throughput environments and automated pipelines where rapid code generation is required 6. For full performance, the developer recommends deployment on high-end hardware such as NVIDIA Hopper-based systems with at least 250GB of VRAM 1. However, because the model operates in a "non-thinking mode" that prioritizes direct answers, the developer suggests it may be less suitable for tasks requiring granular, step-by-step logical reasoning compared to smaller models specifically optimized with "thinking" modes 6.

Reception & Impact

Qwen3-Coder-480B-A35B-Instruct has been identified as a significant development in the trajectory of open-weights artificial intelligence, specifically regarding the scaling of specialized coding models 6. Industry observers have characterized the release as a milestone for the open-source community, providing a high-parameter Mixture-of-Experts (MoE) alternative to leading proprietary systems 6.

In terms of comparative performance, the model has been evaluated against the Claude and GPT series. According to developer documentation, Qwen3-Coder-480B achieves results comparable to Claude 3.5 Sonnet on foundational coding tasks and agentic workflows 1. On the SWE-Bench Verified benchmark, which measures the ability to resolve real-world software issues, the model solved between 50% and 60% of tasks 6. This performance level has led to community adoption of the model as a backend for AI-native coding environments and automated software engineering tools 6.

The economic implications of the model center on the potential for reduced costs in agentic automation. By supporting repository-scale context—up to 1 million tokens using YaRN—the model enables the automation of complex software engineering cycles that were previously manually intensive 16. This "agentic" capability, which includes browser use and tool integration, is intended to transition the role of AI from passive code completion to active environment interaction 16. Such capabilities are expected to influence developer productivity by streamlining code review, documentation generation, and the analysis of large-scale codebases 1.

Hardware requirements for the 480B model have prompted discussion regarding the practicalities of local deployment. The model is officially supported on the NVIDIA Hopper platform, and full inference using bfloat16 precision requires approximately 250GB of VRAM 16. This necessitates high-end multi-GPU configurations, typically involving H100 or A100 hardware 6. While quantization techniques such as FP8 and 4-bit GGUF can reduce memory requirements to between 80GB and 120GB of VRAM, the model remains primarily targeted at enterprise environments and cloud-based API providers rather than standard consumer hardware 16.

Version History

Qwen3-Coder-480B-A35B-Instruct was first announced by the Qwen team on July 22, 2025, with the official release of version 1.0 following on August 22, 2025 16. The model introduced the 'A35B' nomenclature to the Qwen series, a designation used to specify that while the model contains 480 billion total parameters, only 35 billion are activated during any single forward pass due to its sparse Mixture-of-Experts (MoE) architecture 16. This release preceded the launch of the broader Qwen3 Max model by approximately two months 4.

A central feature of the v1.0 release is the optimization for 'non-thinking mode' 1. The developer states that the 480B variant is designed to bypass the generation of internal reasoning steps (often contained within <think> blocks) to prioritize rapid, direct output in coding environments 1[Hugging Face]. For applications requiring explicit step-by-step logical reasoning, the Qwen team recommended the use of specialized 'thinking mode' variants like the Qwen3-235B-A22B 6.

Updates following the initial release have focused on improving 'agentic' capabilities, specifically regarding tool-use and multi-turn interaction 6. The model was refined to support a custom function-calling format compatible with third-party agentic platforms such as CLINE and Qwen Code [Hugging Face]. These updates also solidified the model's long-context support, which allows for a native context window of 262,144 tokens, extendable to 1 million tokens through the application of YaRN (Yet Another Rope-based Network) 16.

From a software integration perspective, the transition to the Qwen3 architecture required updates to the Hugging Face transformers library. The model necessitates version 4.51.0 or higher to properly recognize the qwen3_moe configuration; users on older versions encounter KeyError exceptions during model loading [Hugging Face]. As of late 2025, the model is maintained through Alibaba Cloud’s DashScope and NVIDIA NIM, with subsequent support added for quantized formats including FP8 and GGUF to accommodate various hardware constraints 16.

Sources

  1. 1
    qwen3-coder-480b-a35b-instruct Model by Qwen | NVIDIA NIM. NVIDIA. Retrieved April 1, 2026.

    Release Date: 08/22/2025. Qwen3-Coder-480B-A35B-Instruct is a state-of-the-art large language model specifically designed for code generation and agentic coding tasks. It is a mixture-of-experts (MoE) model with 480B total parameters and 35B activated parameters, featuring native support for 262,144 tokens context length and extendable up to 1M tokens using YaRN.

  2. 2
    Gummadi, Sai Dheeraj. (July 23, 2025). Qwen3-Coder-480B-A35B-Instruct: Open-Source Agentic Coding with Unprecedented Scale and Power. Medium. Retrieved April 1, 2026.

    This 480-billion-parameter Mixture-of-Experts (MoE) model, with 35 billion active parameters, sets a new benchmark in agentic coding, tool use, and long-context understanding. Designed to rival top-tier models like Claude Sonnet 4. ... Native Context: Supports 256K tokens... Yarn Extension: Uses Yet Another Rope-based Network (Yarn) to extend context to 1M tokens.

  3. 3
    (August 28, 2025). Qwen3 Coder Performance Evaluation: A Comparative Analysis Against Leading Models. 16x Engineering. Retrieved April 1, 2026.

    In this post, we take a look at how Qwen3 Coder performs on core coding benchmarks, comparing it with other top open and proprietary models, including Kimi K2, DeepSeek V3 (New), Gemini 2.5 Pro, and Claude Sonnet 4.

  4. 4
    Qwen3-Coder: Agentic coding in the world. Hacker News. Retrieved April 1, 2026.

    Discussion regarding the release and scaling of Alibaba's open-weights coding series.

  5. 5
    GLM-4.7 (Reasoning) vs Qwen3 Coder 480B A35B Instruct: Model Comparison. Artificial Analysis. Retrieved April 1, 2026.

    Qwen3 Coder 480B A35B Instruct... Context Window: 262k tokens... Parameters: 480B, 35B active at inference time... Release Date: July, 2025.

  6. 6
    Qwen3Guard: Real-time Safety for Your Token Stream. Retrieved April 1, 2026.

    We are excited to introduce Qwen3Guard, the first safety guardrail model in the Qwen family... Qwen3Guard-Stream, which marks a significant departure from previously open-sourced guard models by enabling efficient, real-time streaming safety detection.

  7. 7
    What Matters For Safety Alignment?. Retrieved April 1, 2026.

    We identify the LRMs GPT-OSS-20B, Qwen3-Next-80B-A3B-Thinking, and GPT-OSS-120B as the top-three safest models, which substantiates the significant advantage of integrated reasoning and self-reflection mechanisms for robust safety alignment.

  8. 8
    Promptfoo x Qwen3-Coder: Unmasking Vulnerabilities in 480 Billion Parameters. Retrieved April 1, 2026.

    Qwen3-Coder supports 358 coding languages, natively understands repositories up to 256k tokens... that’s a ton of power—so it’s critical to systematically red team this model for vulnerabilities, biases, and edge-case exploits.

  9. 9
    RedCoder: Automated Multi-Turn Red Teaming for Code LLMs. Retrieved April 1, 2026.

    Existing red-teaming approaches... show that these models are prone to generating vulnerable or even malicious code under adversarial settings.

  10. 10
    Copyright and Artificial Intelligence, Part 3: Generative AI Training Pre-Publication Version. Retrieved April 1, 2026.

    The Office is releasing this pre-publication version... technical background regarding data characteristics and training phases.

  11. 11
    Copyright Office Weighs In on AI Training and Fair Use | Skadden, Arps, Slate, Meagher & Flom LLP. Retrieved April 1, 2026.

    The use of guardrails to prevent or minimize the creation of infringing outputs... weighs in favor of a fair-use argument.

  12. 12
    Qwen/Qwen3-Coder-480B-A35B-Instruct · Hugging Face. Hugging Face. Retrieved April 1, 2026.

    introducing its most powerful variant first: Qwen3-Coder-480B-A35B-Instruct ... This model supports only non-thinking mode ... With transformers<4.51.0, you will encounter the following error: KeyError: 'qwen3_moe'.

  13. 14
    Qwen3 Coder vs. Kimi K2 vs. Sonnet 4 Coding Comparison (Tested .... Retrieved April 1, 2026.

    {"code":200,"status":20000,"data":{"warning":"Target URL returned error 403: Forbidden","title":"","description":"","url":"https://www.reddit.com/r/LocalLLaMA/comments/1mi8lbl/qwen3_coder_vs_kimi_k2_vs_sonnet_4_coding/","content":"You've been blocked by network security.\n\nTo continue, log in to your Reddit account or use your developer token\n\nIf you think you've been blocked by mistake, file a ticket below and we'll look into it.\n\n[Log in](https://www.reddit.com/login/)[File a ticket](http

  14. 16
    What the heck is "Qwen3-Coder-480B-A35B-Instruct .... Retrieved April 1, 2026.

    {"code":200,"status":20000,"data":{"title":"Lee Robinson on X: \"What the heck is \"Qwen3-Coder-480B-A35B-Instruct\"?\n\nIt sounds like a series of random numbers and letters. But it’s not too different from \"MacBook Pro M3 Max 16-inch 64GB\".\n\nEach part is telling us something important about the model, like the size, capabilities, and purpose.\" / X","description":"","url":"https://x.com/leerob/status/1947787896607367539","content":"## Post\n\n## Conversation\n\nWhat the heck is \"Qwen3-Cod

  15. 22
    Qwen3 Coder 480B A35B - ApX Machine Learning. Retrieved April 1, 2026.

    {"code":200,"status":20000,"data":{"title":"Qwen3 Coder 480B A35B: Specifications and GPU VRAM Requirements","description":"","url":"https://apxml.com/models/qwen3-coder-480b-a35b","content":"### Technical Specifications\n\nTotal Expert Parameters\n\n35.0B\n\nNumber of Experts\n\n160\n\nActive Experts\n\n8\n\nAttention Structure\n\nMulti-Head Attention\n\nHidden Dimension Size\n\n6144\n\nNumber of Layers\n\n62\n\nAttention Heads\n\n96\n\nKey-Value Heads\n\n8\n\nActivation Function\n\nSwigLU\n\nN

  16. 24
    We tested Qwen3-Coder, GPT-5 and other 30+ models on new SWE .... Retrieved April 1, 2026.

    {"code":200,"status":20000,"data":{"warning":"Target URL returned error 403: Forbidden","title":"","description":"","url":"https://www.reddit.com/r/LocalLLaMA/comments/1moakv3/we_tested_qwen3coder_gpt5_and_other_30_models_on/","content":"You've been blocked by network security.\n\nTo continue, log in to your Reddit account or use your developer token\n\nIf you think you've been blocked by mistake, file a ticket below and we'll look into it.\n\n[Log in](https://www.reddit.com/login/)[File a ticke

  17. 26
    unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF · UD-Q4_K_XL .... Retrieved April 1, 2026.

    {"code":200,"status":20000,"data":{"title":"unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF · UD-Q4_K_XL matches bf16 with 60.9% vs 61.8% on Aider Polyglot benchmark","description":"UD-Q4_K_XL :","url":"https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF/discussions/8","content":"UD-Q4_K_XL :\n\n```\ntest_cases: 225\n model: openai/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF\n edit_format: diff\n commit_hash: f38200c\n pass_rate_1: 29.8\n pass_rate_2: 60.9 << -- this is the fina

  18. 28
    Qwen3-Coder: How to Run Locally | Unsloth Documentation. Retrieved April 1, 2026.

    {"code":200,"status":20000,"data":{"title":"Qwen3-Coder: How to Run Locally | Unsloth Documentation","description":"Run Qwen3-Coder-30B-A3B-Instruct and 480B-A35B locally with Unsloth Dynamic quants.","url":"https://unsloth.ai/docs/models/tutorials/qwen3-coder-how-to-run-locally","content":"# Qwen3-Coder: How to Run Locally | Unsloth Documentation\n\n[Introducing Unsloth Studio: a new web UI for local AI 🦥](https://unsloth.ai/docs/new/studio)\n\n[![Image 1: Logo](https://unsloth.ai/docs/~gitboo

  19. 29
    qwen3-coder-480b-a35b-instruct Model by Qwen - NVIDIA NIM APIs. Retrieved April 1, 2026.

    {"code":200,"status":20000,"data":{"title":"qwen3-coder-480b-a35b-instruct Model by Qwen | NVIDIA NIM","description":"Excels in agentic coding and browser use and supports 256K context, delivering top results.","url":"https://build.nvidia.com/qwen/qwen3-coder-480b-a35b-instruct","content":"# qwen3-coder-480b-a35b-instruct Model by Qwen | NVIDIA NIM\n\n[![Image 1: NVIDIA](https://build.nvidia.com/_next/image?url=%2Fnvidia-logo.png&w=600&q=75)](https://build.nvidia.com/)\n\n[Explore](https://build

Production Credits

View full changelog
Research
gemini-2.5-flash-liteApril 1, 2026
Written By
gemini-3-flash-previewApril 1, 2026
Fact-Checked By
claude-haiku-4-5April 1, 2026
Reviewed By
pending reviewApril 1, 2026
This page was last edited on April 1, 2026 · First published April 1, 2026