GLM-5
GLM-5 is a large language model (LLM) developed by Zhipu AI (also known as Z.ai) in collaboration with researchers from Tsinghua University 13. Released on February 11, 2026, the model serves as the successor to the GLM-4 series and represents a strategic shift in the developer's methodology toward what they describe as "agentic engineering" 10, 13. This paradigm focuses on transitioning AI from conversational assistants into autonomous agents capable of planning, executing, and self-correcting across multi-hour, complex workflows 10. GLM-5 is notable for being released as an open-weight model under the MIT license, distinguishing it from several contemporary frontier models that remain proprietary and closed-source 10.
The model utilizes a Mixture-of-Experts (MoE) architecture with 744 billion total parameters, of which approximately 40 billion are active during any single token generation 10. This architecture is a substantial expansion from its predecessor, GLM-4.5, which utilized 355 billion total and 32 billion active parameters 10. GLM-5 was pre-trained on a dataset comprising 28.5 trillion tokens, an increase from the 23 trillion used for the previous version 13. To improve computational efficiency during both training and inference, the model integrates DeepSeek Sparse Attention (DSA) and utilizes a 200,000-token context window 10, 13. Furthermore, Zhipu AI introduced a novel asynchronous reinforcement learning infrastructure named "Slime," which is intended to increase training throughput by decoupling model generation from training updates 13.
In technical evaluations, GLM-5 has demonstrated performance that its developers state is comparable to leading proprietary models such as Claude 4.5 and GPT-5.2 13. The GLM-5 research team reports that the model achieved a score of 50 on the Artificial Analysis Intelligence Index v4.0, which the team asserts makes it the first open-weight model to reach that threshold 13. On software engineering tasks, the model recorded a score of 77.8% on the SWE-bench Verified leaderboard and 56.2% on Terminal-Bench 2.0 10. Independent analysis of the model's agentic capabilities noted strong performance on benchmarks such as MCP-Atlas for tool invocation and Vending Bench 2, in which the model manages a simulated business over a long-term horizon 10, 13.
The release of GLM-5 was a significant event in the Chinese and global artificial intelligence sectors, occurring shortly after Zhipu AI became the first publicly traded foundation model company following its Hong Kong IPO in January 2026 10. The model is positioned as a cost-effective alternative for enterprise and engineering applications, with API pricing for input and output tokens significantly lower than that of Western closed-source competitors 10. While the model is available through proprietary cloud platforms such as BigModel.cn, the open-weight release also allows self-hosting on high-specification hardware, typically requiring a minimum of eight H200 GPUs for 8-bit inference, which provides a path for organizations with strict data sovereignty requirements 10.
Background
The development of GLM-5 followed a series of iterations within the General Language Model (GLM) framework, primarily led by Zhipu AI (now operating as Z.ai) in collaboration with researchers from Tsinghua University 13. Prior to the release of GLM-5, the developer maintained the GLM-4 series, which received incremental updates through versions 4.5, 4.6, and 4.7 17. In January 2026, Zhipu AI became a publicly traded entity via an IPO in Hong Kong, marking its transition from a research-focused startup to a commercial foundation model provider 10.
The release of GLM-5 on February 11, 2026, occurred amid intense competition within the Chinese domestic artificial intelligence sector, a period often characterized as the "War of a Hundred Models" 10. This environment saw numerous Chinese technology firms and research labs racing to establish domestic alternatives to prominent international models such as OpenAI's GPT-4o and Anthropic's Claude 3.5 and 4.5 10, 17. By early 2026, the industry focus had shifted from simple conversational interaction toward "agentic engineering": the development of models capable of long-horizon planning, autonomous tool usage, and self-correction during complex workflows 10, 17. Zhipu AI positioned GLM-5 specifically to address this shift, stating that the model was designed to transition from "writing code" to "building entire projects" 17.
Technically, GLM-5 was motivated by a significant scaling of both architecture and training data relative to its predecessors. The model expanded from the 355 billion total parameters (32 billion activated) of the GLM-4 era to 744 billion total parameters, with 40 billion activated per token 17. The developer increased the pre-training dataset from 23 trillion to 28.5 trillion tokens, asserting that larger-scale computing power was necessary to improve general intelligence 17. To manage the computational demands and post-training efficiency at this larger scale, the development team introduced a new framework called "Slime" to support asynchronous reinforcement learning 17. Additionally, GLM-5 integrated DeepSeek Sparse Attention for the first time, which Zhipu AI claims allows the model to maintain long-text performance while reducing deployment costs and improving token efficiency 17.
Architecture
GLM-5 is built upon a Mixture-of-Experts (MoE) transformer architecture, a design choice intended to balance high parameter capacity with computational efficiency during inference 13. The model contains a total of 744 billion parameters, of which 40 billion are active during any single forward pass 13. This represents a significant scale-up from its predecessor, GLM-4.5, which featured 355 billion total and 32 billion active parameters 13. To manage communication overhead in expert parallelism, the developers reduced the layer count to 80 and increased the number of experts to 256 13.
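The routing behavior of an MoE layer can be illustrated with a short sketch. Only the 256-expert count below follows the GLM-5 description; the tensor sizes, top-k value, and expert design are toy placeholders rather than disclosed details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyMoELayer(nn.Module):
    """Minimal top-k expert routing, shrunk to toy dimensions.

    Only the expert count (256) mirrors GLM-5; every other size here
    is a placeholder chosen for illustration.
    """
    def __init__(self, d_model=64, d_ff=128, n_experts=256, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(),
                          nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (n_tokens, d_model)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)   # renormalize over chosen experts
        out = torch.zeros_like(x)
        for t in range(x.size(0)):             # naive per-token dispatch
            for slot in range(self.top_k):
                expert = self.experts[int(idx[t, slot])]
                out[t] += weights[t, slot] * expert(x[t])
        return out

layer = ToyMoELayer()
print(layer(torch.randn(4, 64)).shape)         # torch.Size([4, 64])
```

The loop makes the key property visible: each token touches only top_k of the 256 experts, which is why per-token compute tracks the roughly 40 billion active parameters rather than the 744 billion total.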
Attention Mechanisms and Innovations
A central architectural feature of GLM-5 is the implementation of DeepSeek Sparse Attention (DSA) 13. Unlike standard dense attention mechanisms that scale quadratically with sequence length, DSA employs a dynamic selection mechanism that identifies and prioritizes important tokens 13. Zhipu AI states that this approach reduces attention computation by approximately 1.5 to 2 times for long sequences without degrading reasoning depth 13. The model was transitioned to DSA through a continued pre-training strategy involving a "dense warm-up" followed by sparse adaptation, rather than training from scratch 13.
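The published description of DSA is high-level; the sketch below shows only the general shape of dynamic token selection, in which each query keeps the k most relevant keys and runs softmax attention over that subset. The scoring function and k are assumptions, and a raw dot product stands in for the lightweight learned indexer that makes selection cheap in the production mechanism.

```python
import torch
import torch.nn.functional as F

def topk_sparse_attention(q, k, v, k_keep=64):
    """Sketch of dynamic-selection sparse attention.

    Each query scores all keys, keeps only the k_keep highest-scoring
    ones, and attends over that subset.  In this toy version the
    selection scan is itself still quadratic; real DSA replaces it
    with a cheap indexer so the full pipeline scales sub-quadratically.
    """
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5   # (T, T)
    k_keep = min(k_keep, scores.size(-1))
    top_scores, top_idx = scores.topk(k_keep, dim=-1)      # (T, k_keep)
    probs = F.softmax(top_scores, dim=-1)
    v_sel = v[top_idx]                                     # (T, k_keep, d)
    # Attention cost is now O(T * k_keep) instead of O(T^2).
    return (probs.unsqueeze(-1) * v_sel).sum(dim=-2)       # (T, d)

T, d = 1024, 64
q, k, v = (torch.randn(T, d) for _ in range(3))
print(topk_sparse_attention(q, k, v).shape)  # torch.Size([1024, 64])
```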
GLM-5 also utilizes Multi-head Latent Attention (MLA), a technique designed to reduce GPU memory usage and accelerate processing for long-context sequences 13. The developers introduced a variation called "MLA-256," which increases the head dimension to 256 while decreasing the number of attention heads to optimize decoding performance across diverse hardware 13. Additionally, a method termed "Muon Split" was applied to the model's optimizer to ensure that projection weights for different attention heads update at independent scales, which the researchers found stabilized attention logits during pre-training 13.
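A schematic of the latent-KV idea behind MLA follows: keys and values are cached as one small latent vector per token and up-projected only when attention runs, shrinking KV-cache memory. The head dimension of 256 follows the "MLA-256" description above; the latent size and other dimensions are invented for illustration.

```python
import torch
import torch.nn as nn

class LatentKVCache(nn.Module):
    """Schematic of MLA-style KV compression (sizes are placeholders)."""
    def __init__(self, d_model=1024, n_heads=4, head_dim=256, d_latent=128):
        super().__init__()
        self.down = nn.Linear(d_model, d_latent, bias=False)        # compress
        self.up_k = nn.Linear(d_latent, n_heads * head_dim, bias=False)
        self.up_v = nn.Linear(d_latent, n_heads * head_dim, bias=False)
        self.n_heads, self.head_dim = n_heads, head_dim

    def cache(self, h):            # h: (T, d_model) hidden states
        return self.down(h)        # (T, d_latent) is all that gets stored

    def expand(self, latent):      # rebuild per-head K and V on demand
        T = latent.size(0)
        k = self.up_k(latent).view(T, self.n_heads, self.head_dim)
        v = self.up_v(latent).view(T, self.n_heads, self.head_dim)
        return k, v

m = LatentKVCache()
latent = m.cache(torch.randn(10, 1024))   # 128 floats/token cached
k, v = m.expand(latent)                   # vs 2 * 4 * 256 = 2048 uncompressed
print(latent.shape, k.shape)  # torch.Size([10, 128]) torch.Size([10, 4, 256])
```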
Training Methodology and Infrastructure
The training of GLM-5 involved a total budget of 28.5 trillion tokens, divided into three primary stages: pre-training, mid-training, and post-training 13.
- Pre-training: Focused on general language and coding capacity using a 27 trillion token corpus 13.
- Mid-training: Specifically targeted agentic and long-context capabilities, progressively extending the model's context window from 4,000 to 200,000 tokens 13.
- Post-training: Employed a sequential reinforcement learning (RL) pipeline consisting of Reasoning RL, Agentic RL, and General RL 13. To prevent catastrophic forgetting across these stages, the team used On-Policy Cross-Stage Distillation 13 (a schematic sketch of such a staged objective follows this list).
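Zhipu AI has not published the exact form of On-Policy Cross-Stage Distillation, so the sketch below shows only one generic way a staged RL objective can discourage forgetting: a policy-gradient term for the current stage plus a KL penalty anchoring the policy to the previous stage. The function names, estimator, and beta weight are all assumptions.

```python
import torch

def rl_stage_loss(policy_logprobs, ref_logprobs, advantages, beta=0.1):
    """One-stage objective: optimize current-stage reward while staying
    close to the previous stage's policy (a distillation-style anchor).

    Inputs are per-sample log-probs of on-policy samples plus advantages
    from the current stage's reward signal.  This generic construction
    is an assumption, not Zhipu AI's published method.
    """
    pg_loss = -(advantages.detach() * policy_logprobs).mean()  # REINFORCE term
    kl_est = (policy_logprobs - ref_logprobs.detach()).mean()  # k1 KL estimate
    return pg_loss + beta * kl_est

logp = torch.randn(8, requires_grad=True)
loss = rl_stage_loss(logp, torch.randn(8), torch.randn(8))
loss.backward()        # gradients flow only to the current policy
print(loss.item())
```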
To improve speculative decoding, GLM-5 incorporates Multi-token Prediction (MTP) with parameter sharing 13. By sharing parameters across three MTP layers during training, the model maintains a consistent memory cost while increasing the acceptance rate of predicted tokens during inference 13. The model's infrastructure is also specialized for the Chinese GPU ecosystem, featuring optimizations for domestic chip platforms such as Huawei Ascend, Moore Threads, and Hygon 13.
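The parameter-sharing arrangement can be sketched as a single extra block reused for each look-ahead position, which is what keeps memory cost flat across the three MTP layers. The report states only that GLM-5 shares parameters across three MTP layers; the block design and sizes below are illustrative.

```python
import torch
import torch.nn as nn

class SharedMTPHead(nn.Module):
    """Sketch of multi-token prediction with one shared extra layer.

    The same small block is applied three times to draft tokens t+1,
    t+2, and t+3, so the memory footprint equals a single MTP layer.
    """
    def __init__(self, d_model=512, vocab=32000, depth=3):
        super().__init__()
        self.depth = depth
        self.block = nn.Sequential(            # the single shared MTP layer
            nn.Linear(d_model, d_model), nn.GELU(), nn.LayerNorm(d_model)
        )
        self.lm_head = nn.Linear(d_model, vocab, bias=False)

    def forward(self, h):                      # h: (T, d_model) trunk states
        logits = []
        for _ in range(self.depth):            # same weights each step
            h = self.block(h)
            logits.append(self.lm_head(h))     # drafts for t+1, t+2, t+3
        return torch.stack(logits)             # (depth, T, vocab)

head = SharedMTPHead()
print(head(torch.randn(6, 512)).shape)         # torch.Size([3, 6, 32000])
```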
Data Curation
The 28.5 trillion token training corpus was curated using model-based classification pipelines 13. For web data, the developers utilized a DCLM classifier based on sentence embeddings to identify high-quality content and a "World Knowledge" classifier to extract valuable information from medium-quality sources 13. The code dataset saw a 28% increase in unique tokens compared to previous versions, achieved through refreshed snapshots of code-hosting platforms and the inclusion of more low-resource programming languages such as Scala and Swift 13.
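The report names the classifiers but not their implementation. The toy pipeline below shows the two-gate shape described above, with random vectors standing in for real sentence embeddings, toy labels in place of trained classifiers, and arbitrary thresholds.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Sketch of a two-classifier curation pass: a quality gate over web
# documents plus a knowledge scorer that rescues informative documents
# from the medium-quality band.  Everything here is a stand-in.
rng = np.random.default_rng(0)
emb = rng.normal(size=(1000, 384))           # one "embedding" per document
labels = rng.integers(0, 2, size=1000)       # 1 = high quality (toy labels)

quality_clf = LogisticRegression(max_iter=200).fit(emb, labels)
knowledge_clf = LogisticRegression(max_iter=200).fit(emb, 1 - labels)

q = quality_clf.predict_proba(emb)[:, 1]     # P(high quality)
k = knowledge_clf.predict_proba(emb)[:, 1]   # P(knowledge-rich)

keep = (q > 0.8) | ((q > 0.4) & (k > 0.9))   # rescue knowledge-rich mid-tier
print(f"kept {keep.sum()} of {len(emb)} documents")
```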
Capabilities & Limitations
GLM-5 is categorized as a flagship foundation model optimized for "agentic engineering," a term Zhipu AI uses to describe the transition of artificial intelligence from conversational assistance to autonomous task execution 17. The model's primary design focus is the completion of complex, long-horizon tasks requiring planning, tool invocation, and self-correction 17.
Reasoning and Logical Capabilities
GLM-5 incorporates a "Thinking Mode" designed to handle scenarios requiring extended cognitive processing 17. The developer states that this mode enables the model to perform deep reasoning, multi-step planning, and autonomous decision-making 17. In internal evaluations of long-range interactions, the model is described as capable of maintaining goal alignment over extended periods, managing intermediate resources, and resolving multi-step dependencies without losing narrative or logical coherence 17. On benchmarks measuring web-scale retrieval and information synthesis (BrowseComp), tool invocation (MCP-Atlas), and complex multi-tool orchestration (τ²-Bench), GLM-5 reportedly achieved the highest scores among open-weight models at the time of its release 17.
Coding and Mathematical Performance
In software engineering and data processing tasks, GLM-5 is positioned as a tool for "agentic coding," which involves generating runnable code and managing entire project lifecycles 17. According to Zhipu AI, the model's performance on the SWE-bench Verified benchmark reached a score of 77.8, while its performance on Terminal Bench 2.0 was 56.2 17. The developer asserts that these results exceed the performance of Gemini 3.0 Pro and approach the capabilities of Claude Opus 4.5 in real-world programming scenarios, particularly in areas such as backend refactoring and deep debugging 17. The model supports structured output in formats like JSON, intended to facilitate integration into existing developer workflows and enterprise data governance systems 17.
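A structured-output request might look like the following, assuming an OpenAI-compatible chat endpoint. The base URL, model identifier, and response_format support are illustrative assumptions; the Z.AI developer documentation is authoritative for the actual interface.

```python
# Hypothetical structured-JSON request against an assumed
# OpenAI-compatible endpoint; endpoint and model id are placeholders.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-bigmodel.cn/v4",  # placeholder URL
    api_key="YOUR_API_KEY",
)

resp = client.chat.completions.create(
    model="glm-5",                                   # assumed model id
    messages=[{
        "role": "user",
        "content": "Extract the package name and version from "
                   "'requests 2.31.0'. Reply as JSON with keys "
                   "'name' and 'version'.",
    }],
    response_format={"type": "json_object"},         # JSON mode, if supported
)
print(resp.choices[0].message.content)               # e.g. {"name": ...}
```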
Multimodal Support and Context
While the primary GLM-5 model is documented as a text-to-text system, the broader GLM-5 ecosystem and its supporting architecture are designed to handle multimodal inputs and outputs 17. This includes integration with vision-language models (such as GLM-4.6V), audio processing (GLM-ASR), and video generation tools like CogVideoX and Vidu 17. The model features a 200,000-token context window for inputs and can generate up to 128,000 tokens in a single output 17. To maintain efficiency during long-context operations, GLM-5 utilizes a sparse attention mechanism and an intelligent context caching system designed to optimize token consumption and performance 17.
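As a concrete illustration of working within these limits, a client might trim conversation history to the 200,000-token input window before each call. The helper below is hypothetical, and the whitespace count is a crude stand-in for the provider's real tokenizer.

```python
MAX_INPUT_TOKENS = 200_000   # documented GLM-5 input window
MAX_OUTPUT_TOKENS = 128_000  # documented single-output ceiling

def fit_prompt(history, count_tokens=lambda s: len(s.split())):
    """Keep the most recent turns that fit the input window."""
    kept, used = [], 0
    for turn in reversed(history):       # newest turns take priority
        n = count_tokens(turn)
        if used + n > MAX_INPUT_TOKENS:
            break
        kept.append(turn)
        used += n
    return list(reversed(kept))

print(fit_prompt(["an older conversation turn", "the newest turn"]))
```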
Identified Limitations and Failure Modes
Despite its performance in technical and engineering tasks, GLM-5 has documented limitations in specific domains. While the developer highlights its ability to generate high-quality scripts and storyboards with long-text consistency, independent assessments note that models of this kind may struggle with the nuanced stylistic requirements of complex creative writing or highly abstract literary tasks 17. Additionally, while the model is capable of professional translation and of accurately converting formal texts between major languages, its performance on niche languages and low-resource dialects remains less thoroughly verified 17.
In work scenarios involving ambiguous or complex objectives, the model's effectiveness depends on appropriate application of its "Thinking Mode"; failure to sustain goal alignment over extremely long horizons (beyond its optimized context) can degrade instruction compliance 17. The model is intended for professional productivity and automated system management; the developer lists as unintended uses any tasks in which high-stakes, real-time physical safety would depend on the model without human-in-the-loop verification 17.
Performance
GLM-5 recorded a score of 50 on the Artificial Analysis Intelligence Index v4.0, which Zhipu AI identifies as the first instance of an open-weights model reaching this threshold 13. This represents an eight-point improvement over its immediate predecessor, GLM-4.7, which scored 42 on the same index 13. According to the developer's technical report, GLM-5 is positioned as the top-performing open model on the LMArena Text and Code Arena leaderboards 13.
Standardized Benchmarks
In general language and reasoning evaluations of the MLA-256 architecture with Muon Split optimization, GLM-5 achieved a score of 62.0 on MMLU and 59.9 on C-Eval 13. Additional baseline results include 77.4 on HellaSwag, 47.5 on GSM8K, and 36.6 on HumanEval 13. The model's performance on the RACE benchmark was recorded at 79.6, while it scored 51.3 on the BIG-Bench Hard (BBH) suite 13. Compared to the previous GLM-4.7 iteration, the developer reports an average performance gain of 20%, attributed largely to improvements in agentic capability and knowledge density 13.
Agentic and Coding Performance
Optimized for what the developer terms "agentic engineering," GLM-5 demonstrated a success rate of 77.8% on the SWE-bench Verified benchmark 13. In real-world coding and terminal environments, the model is reported to outperform Gemini 3 Pro and match the performance of proprietary models such as Claude Opus 4.5 and GPT-5.2 (xhigh) 13.
For long-horizon agentic tasks, GLM-5 was evaluated using Vending-Bench 2, a simulation requiring a model to manage a business over a one-year period 13. GLM-5 achieved a final account balance of $4,432, the highest among tested open-source models, which Zhipu AI states demonstrates advanced long-term planning and resource management 13. On the CC-Bench-V2 internal evaluation suite, the model showed gains across frontend, backend, and multi-step tasks compared to the GLM-4 series 13.
Architectural Efficiency and Context Fidelity
The implementation of DeepSeek Sparse Attention (DSA) is cited as a primary factor in reducing the computational requirements for training and inference 13. DSA reportedly reduces attention computation by a factor of 1.5 to 2.0 for long sequences, allowing the model to process 128K context windows at lower GPU costs 13. In retrieval-based testing via the RULER benchmark, GLM-5 maintained a score of 78.86 at a 128K context length 13.
Efficiency in speculative decoding was improved through the use of multi-token prediction (MTP) with parameter sharing 13. In developer tests, GLM-5 achieved an average speculative acceptance length of 2.76 tokens, exceeding the 2.55 tokens recorded for DeepSeek-V3.2 13. Furthermore, the model has been optimized for the Chinese GPU ecosystem, with full-stack adaptation across seven domestic hardware platforms, including Huawei Ascend and Moore Threads 13.
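The acceptance-length figure can be made concrete with a toy simulation. The per-token match probability below is an assumption chosen to land near the reported 2.76; the real figure depends on how closely the drafted tokens agree with the target model's distribution.

```python
import random

# Toy model of speculative acceptance length.  Three shared MTP layers
# draft 3 tokens per step; the target model keeps the longest correct
# prefix and always emits one token of its own, so acceptance length =
# (matched prefix) + 1.  The match probability p is an assumption.
random.seed(0)
p, steps, total = 0.76, 100_000, 0
for _ in range(steps):
    matched = 0
    for _ in range(3):                 # up to 3 drafted tokens
        if random.random() < p:
            matched += 1
        else:
            break
    total += matched + 1               # +1 token from the target model
print(total / steps)                   # ~2.78; expectation is p + p**2 + p**3 + 1
```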
Safety & Ethics
GLM-5's post-training relies on a specialized framework intended to align model behavior with human intent and safety standards: an asynchronous reinforcement learning (RL) infrastructure termed "Slime" 3. According to the developer, this infrastructure improves training throughput and efficiency, enabling more fine-grained iteration during the post-training phase and bridging the gap between basic model competence and high-level performance 3.
In comparison to contemporary frontier models like Anthropic's Claude Opus 4.6, GLM-5 has been characterized by independent observers as having limited publicly documented safety evaluations 10. However, because GLM-5 is released as an open-weight model under the MIT license, third-party researchers and the broader developer community are able to conduct independent audits and red-teaming exercises to identify potential vulnerabilities or biases 10. This transparency is noted as a primary benefit for organizations with strict compliance or data sovereignty requirements 10.
The agentic nature of GLM-5—specifically its capacity for long-horizon planning and autonomous tool invocation—introduces specific safety considerations. The model is capable of decomposing system-level requirements and maintaining context coherence over workflows lasting several hours 10. Zhipu AI states that the model includes capabilities for deep debugging and self-correction, which allow it to analyze logs and iteratively fix failures 10. Industry analysts have noted that such autonomous capabilities necessitate the implementation of robust guardrails, including human-in-the-loop checkpoints and rollback mechanisms to prevent unintended actions during production deployment 10.
As a model developed by a Beijing-based entity, GLM-5 is subject to Chinese regulatory oversight regarding generative artificial intelligence 10. While specific technical details of its internal content filtering mechanisms are not fully disclosed in public technical reports, the model is designed to operate within the regulatory frameworks of its primary market 10. Independent evaluations have highlighted that safety and alignment documentation varies significantly between providers, and potential users are advised to conduct internal assessments to ensure the model's outputs align with specific institutional ethics and security policies 10.
Applications
GLM-5 is designed for "agentic engineering," a paradigm shift from conversational assistance to the autonomous execution of multi-step, long-horizon tasks 3, 10. According to Zhipu AI, the model is intended to function as core infrastructure for complex systems engineering and persistent automated workflows 3, 13.
Software Development and Engineering
GLM-5 is frequently utilized in developer tools and autonomous coding agents. It achieved a score of 77.8% on the SWE-bench Verified leaderboard, which Zhipu AI identifies as the highest performance among open-source models as of early 2026 3, 10. The model is compatible with agentic frameworks such as Claude Code and OpenClaw, enabling it to perform end-to-end software engineering tasks including deep debugging, log analysis, and iterative self-correction 3, 13. A proprietary variant, GLM-5-Turbo, is optimized for "fast inference" within these agent-driven coding workflows 13.
Enterprise and Industry Verticals
In enterprise environments, GLM-5 is deployed for tasks requiring long-term planning and resource management. On the Vending Bench 2 benchmark—a simulation of a year-long vending machine business—GLM-5 demonstrated capabilities in sustained operational logic and financial planning, finishing with a final account balance of $4,432 3.
Third-party analysts suggest that GLM-5 represents a transition for enterprises from purchasing AI tools to building internal AI capabilities 1. Its open-weight MIT license is cited as a significant factor for adoption in sectors with stringent data residency and compliance requirements, such as finance, legal, and healthcare 1, 10. The ability to self-host the model allows these organizations to maintain data sovereignty while fine-tuning the model for specialized internal documentation or legal reasoning tasks 1, 10.
Ecosystem and Platform Integration
GLM-5 is integrated into the Zhipu AI (Z.ai) product suite through several channels:
- Z.ai and BigModel.cn: Accessible via a web-based chat interface for general productivity and through an enterprise API for developers 3.
- GLM Coding Plan: A subscription-based service providing three tiers of access (Lite, Pro, and Max) for use as a dedicated coding assistant 13.
- Model Repositories: Model weights are distributed through Hugging Face and ModelScope for research and self-hosted deployments 3.
Deployment Considerations
GLM-5 is recommended for organizations requiring model transparency, high-performance coding autonomy, and the ability to self-host to avoid vendor lock-in 10. However, it is not recommended for small-scale teams lacking substantial hardware resources; the developer states that self-hosting for FP8 inference requires a minimum of eight H200 GPUs 10. For tasks requiring the maximum possible reasoning depth or context windows exceeding 200K tokens, some benchmarks indicate that proprietary competitors like Claude Opus 4.6 may remain more suitable 10.
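The eight-GPU figure is straightforward to sanity-check with back-of-envelope arithmetic: FP8 stores one byte per parameter, and an H200 carries 141 GB of HBM3e. The overhead split in the final comment is a rough assumption.

```python
# Back-of-envelope check of the stated eight-H200 minimum for FP8
# self-hosting of the 744B-parameter model.
TOTAL_PARAMS = 744e9        # GLM-5 total parameters
BYTES_PER_PARAM = 1         # FP8 weight storage
H200_MEMORY_GB = 141        # per-GPU HBM3e capacity

weights_gb = TOTAL_PARAMS * BYTES_PER_PARAM / 1e9
cluster_gb = 8 * H200_MEMORY_GB
print(f"weights: {weights_gb:.0f} GB, 8x H200: {cluster_gb} GB")
# weights: 744 GB, 8x H200: 1128 GB -> roughly 384 GB left for the KV
# cache, activations, and runtime overhead at long context lengths.
```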
Reception & Impact
Industry reception of GLM-5 has focused on its role as a high-performing open-weight alternative to proprietary frontier models 10. In technical assessments conducted in early 2026, GLM-5 was characterized as a leading open-source competitor to U.S.-developed systems, specifically Anthropic's Claude 4.6 10. While proprietary models like Claude Opus 4.6 maintained higher benchmarks in reasoning and context window capacity, third-party evaluations noted that GLM-5 led all open-source models in software engineering tasks, recording a 77.8% score on SWE-bench Verified and 56.2% on Terminal-Bench 2.0 10.
Developer community adoption has been influenced by the model's release under the MIT license, which allows for self-hosting, fine-tuning, and model weight inspection 10. This transparency has made GLM-5 a preferred option for organizations with stringent data sovereignty and compliance requirements 10. However, independent analysts have observed that the model's substantial hardware requirements—a minimum of eight H200 GPUs for 8-bit floating-point (FP8) inference—create a barrier to entry for smaller developers, effectively limiting self-hosted adoption to well-resourced enterprises and research institutions 10.
The economic implications of GLM-5 are tied to its competitive pricing and the market positioning of Zhipu AI 10. Following the developer’s initial public offering (IPO) in Hong Kong in January 2026, the release of GLM-5 introduced a significantly lower price point for frontier-class intelligence 10. The model's API costs were reported to be approximately five times cheaper for input tokens and nearly eight times cheaper for output tokens compared to Claude Opus 4.6 10. This pricing strategy has been identified as a factor in the growth of the AI-native application market in China, as it allows startups to deploy agentic workflows with lower operational overhead than previously possible with Western proprietary models 10.
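The practical effect of those ratios on a workload can be sketched with hypothetical prices chosen only to reproduce the reported roughly 5x (input) and 8x (output) gaps; neither vendor's actual price list is quoted here.

```python
# Illustrative cost comparison; the dollar figures are hypothetical
# per-million-token prices constructed from the reported ratios.
opus_in, opus_out = 5.00, 25.00          # $/M tokens (hypothetical)
glm_in, glm_out = opus_in / 5, opus_out / 8

def job_cost(in_tok_m, out_tok_m, p_in, p_out):
    return in_tok_m * p_in + out_tok_m * p_out

workload = (100, 20)                     # 100M input, 20M output tokens
print(f"Opus-class: ${job_cost(*workload, opus_in, opus_out):,.0f}")
print(f"GLM-5:      ${job_cost(*workload, glm_in, glm_out):,.0f}")
# prints $1,000 vs $162: roughly 6x cheaper on this mixed workload
```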
Societal and creative industry impacts have centered on the model's "agentic engineering" capabilities, which shift the user experience from basic conversational interaction toward autonomous task execution 10. While the model has been praised for its ability to manage multi-hour workflows and self-correct during debugging, some reviewers have expressed caution regarding the safety and alignment documentation provided by Zhipu AI compared to its Western counterparts 10. The open-weight nature of the model is frequently cited by the community as a mitigation factor, as it permits independent auditing and red-teaming 10.
Version History
GLM-5 was released on February 11, 2026, as the successor to the GLM-4 series, which had previously undergone incremental updates through versions 4.5, 4.6, and 4.7 13. While previous iterations such as GLM-4.5 utilized a standard Mixture-of-Experts (MoE) architecture with 355 billion total parameters, the GLM-5 base model was scaled to 744 billion total parameters, with 40 billion active during inference 3, 13. The base model was released under the MIT License, with open-weights distribution via platforms such as Hugging Face and ModelScope 3.
In March 2026, Zhipu AI introduced GLM-5-Turbo, the company’s first proprietary, closed-source variant in the GLM-5 generation 10, 17. Developed internally under the codename "Pony-Alpha-2," the Turbo variant was designed to offer higher runtime efficiency and lower costs per API call than the base model 17. It was made available via the z.ai API and third-party providers such as OpenRouter on March 15, 2026, supporting a 202.8K-token context window and a 131.1K-token maximum output 10. Zhipu AI states that the Turbo variant is specifically optimized for "OpenClaw-style" tasks, including complex tool invocation and persistent automation 10.
Technical updates during the version transition included the integration of DeepSeek Sparse Attention (DSA) 13. The developer utilized a specialized model, GLM-4.7-Flash, to validate the DSA mechanism, which was subsequently implemented in GLM-5 to reduce the computational cost of managing 128K-token context sequences 13. For commercial deployment, Zhipu AI established a tiered subscription system—Lite, Pro, and Max—where access to the Turbo variant was initially phased, reaching Pro and Max subscribers in March 2026 and Lite subscribers in April 2026 10.
Sources
- 1. Swain, Lalatendu Keshari (March 20, 2026). "MiniMax M2.7 vs GLM-5 vs Claude Opus 4.6: The Definitive AI Model Showdown of March 2026". Medium. Retrieved March 26, 2026.
"GLM-5 was released on February 11, 2026 by Z.ai (formerly Zhipu AI)... GLM-5 is a Mixture-of-Experts (MoE) model with 744 billion total parameters and approximately 40 billion active parameters per token. ... Released under the MIT license and is available as an open-weight model."
- 3. "GLM-5 - Overview". Z.AI Developer Document. Retrieved March 26, 2026.
"Expanded Parameter Scale: Increased from 355B (32B activated) to 744B (40B activated), with pre-training data upgraded from 23T to 28.5T. ... Asynchronous Reinforcement Learning: A new 'Slime' framework has been developed ... Sparse Attention Mechanism: DeepSeek Sparse Attention is integrated for the first time."
- 10. "GLM-5 scores 50 on the Intelligence Index and is the new open ...". r/LocalLLaMA, Reddit. Retrieved March 26, 2026. https://www.reddit.com/r/LocalLLaMA/comments/1r28xxz/glm5_scores_50_on_the_intelligence_index_and_is/
- 13. "Z.ai's GLM-5 Model Boasts Top Open-Weights Intelligence Index Score". The Batch, DeepLearning.AI. Retrieved March 26, 2026. https://www.deeplearning.ai/the-batch/z-ais-glm-5-model-boasts-top-open-weights-intelligence-index-score/
"Z.ai more than doubled the size of its flagship large language model to deliver outstanding performance among open-weights competitors."
- 17. "Is GLM-5 assigning quantized models to high-usage users?". r/ZaiGLM, Reddit. Retrieved March 26, 2026. https://www.reddit.com/r/ZaiGLM/comments/1rki1v0/is_glm5_assigning_quantized_models_to_highusage/

