Alpha
amallo chat Icon
Wiki/Models/Devstral Small 2505
model

Devstral Small 2505

Devstral Small 2505 is a 24-billion parameter large language model (LLM) specifically optimized for agentic software engineering tasks 7. Released on May 21, 2025, the model was developed through a collaboration between the French artificial intelligence laboratory Mistral AI and All Hands AI 3, 7. It is a fine-tuned derivative of the Mistral-Small-3.1 architecture, designed to serve as the reasoning engine for autonomous coding agents rather than functioning solely as a general-purpose chat model 7. The system is intended to automate complex programming workflows, including codebase exploration, multi-file editing, and architectural analysis 7.

Technically, Devstral Small 2505 features a 131,072-token context window, enabling it to process large repositories and extensive documentation within a single inference cycle 7. It utilizes a custom "Tekken" tokenizer with a 131k vocabulary, which Mistral AI states improves efficiency in representing code structures 7. To facilitate local deployment on high-end consumer hardware, such as the NVIDIA RTX 4090 or Apple Silicon machines with at least 32GB of RAM, the developers removed the vision encoder present in the base model, making this version a text-only system 7. The model is distributed under the Apache 2.0 license, allowing for both commercial and non-commercial application without the restrictive terms often found in proprietary frontier models 3, 7.

In performance evaluations on the SWE-bench Verified leaderboard, a benchmark that requires models to resolve real-world software issues from GitHub, Devstral Small 2505 achieved a score of 46.8% 7. This result positioned it as one of the top-performing open-weight models at the time of its release 7. While larger models in the same family, such as Devstral 2 2512 and the API-only Devstral Medium, offer higher accuracy, the 24B parameter "Small" variant is significant for balancing high-tier reasoning capabilities with the ability to run on private, decentralized infrastructure 7.

The model's primary application is within agentic scaffolds such as OpenHands and Cline, where it is used to orchestrate changes across multiple files while maintaining context of framework dependencies 7. According to its developers, Devstral Small 2505 is optimized to handle tool-use and function-calling tasks specifically for software development environments, including detecting failures and attempting corrections during the execution of code changes 7. It is compatible with major inference frameworks, including vLLM, Transformers, and Ollama, supporting its integration into a wide range of developer toolchains 7.

Background

The development of Devstral Small 2505 occurred during a shift in the artificial intelligence industry toward "agentic" large language models 7. While previous iterations of coding assistants were primarily limited to code completion and isolated script generation, the emerging requirement for autonomous software engineering necessitated models capable of tool use, terminal execution, and multi-file repository manipulation 1, 7. This transition necessitated a move away from simple text prediction toward reasoning engines that could operate within complex scaffolds to solve real-world software issues 1.

Devstral Small 2505 was created through a technical partnership between Mistral AI and the open-source software platform All Hands AI 7. The collaboration leveraged Mistral AI's base model architecture and All Hands AI's expertise in agentic frameworks, specifically the OpenHands scaffold 1, 7. All Hands AI provided training data and the runtime environment used to optimize the model for iterative coding tasks 1. According to Mistral AI, the model is intended to function as the reasoning core for an autonomous agent rather than a standalone chat interface 1.

Technically, the model is a fine-tuned version of the Mistral-Small-3.1-24B-Base-2503 architecture 1. To optimize it for software engineering specifically, the developers removed the vision encoder present in the original Mistral-Small-3.1 model, resulting in a text-only system 1. A core motivation for the project was to offer a lightweight, open-weight alternative to proprietary flagship models such as GPT-4o or the Claude 3.5 series 1. By utilizing a 24-billion parameter dense transformer architecture, the developers aimed to provide performance competitive with larger models on benchmarks such as SWE-bench Verified while allowing the model to run on consumer-grade hardware like a single NVIDIA RTX 4090 or a Mac with 32GB of RAM 1, 7.

Devstral Small 2505 was officially released on May 21, 2025, under the Apache 2.0 license 7. It was the first model in the Devstral family, which later expanded to include the 123-billion parameter Devstral 2 and the API-only Devstral Medium 7. Its release was part of a broader effort to democratize high-capacity coding agents for local and private software development workflows 1.

Architecture

The architecture of Devstral Small 2505 is based on a dense transformer-based configuration, specifically serving as a fine-tuned derivative of the Mistral-Small-3.1 model family 1, 7. The model contains 24 billion parameters, a scale chosen by the developers to provide a balance between the high-level reasoning required for autonomous coding and the computational efficiency necessary for local inference 7. Unlike its predecessor architecture, Mistral-Small-3.1, which possesses multimodal capabilities, Devstral Small 2505 is a strictly text-only model 1. According to Mistral AI, the vision encoder was removed prior to fine-tuning to focus the model's representational capacity exclusively on text-based software engineering tasks and codebase exploration 1, 7.

Tokenization and Context Window

To handle the complexities of software development, Devstral Small 2505 utilizes the Tekken tokenizer 1. This tokenizer features a vocabulary size of 131,072 (131k) tokens, designed to optimize the representation of both natural language instructions and diverse programming syntaxes 1. The model architecture supports a 128,000 (128k) token context window 7. This expanded context is intended to allow the model to process large files and maintain an understanding of cross-file dependencies within a repository, which is a requirement for multi-file editing and architectural-level reasoning 7.

Functional Design and Tool Use

The model is architecturally optimized for "agentic" workflows, which involves the autonomous use of tools to solve multi-step problems 1. It supports both Mistral-style function calling and XML-formatted outputs, enabling it to interact with terminal environments, file systems, and external APIs 7. These capabilities are specifically tailored for integration into the OpenHands (formerly OpenDevin) scaffold, which acts as the execution environment for the model's commands 1. Mistral AI asserts that this architectural focus allows the model to track framework dependencies, detect failures, and attempt self-correction during the software development lifecycle 7.

Hardware Requirements and Compatibility

Devstral Small 2505 is designed for local deployment on consumer-grade and professional workstations 7. The developers state that the model's 24B parameter count is optimized to run on a single NVIDIA RTX 4090 GPU or Apple Silicon hardware with at least 32GB of unified memory 1. For deployment, the model is compatible with multiple inference frameworks, including vLLM, Hugging Face Transformers, Ollama, and LM Studio 7. It is released under the Apache 2.0 license, which facilitates its use in both proprietary and open-source development environments 1.

Capabilities & Limitations

Primary Functions

Devstral Small 2505 is specialized for autonomous software engineering tasks, a functional category often described by its developers as "agentic coding" 1. Unlike standard large language models (LLMs) that prioritize conversational response or isolated code snippet generation, this model is optimized for codebase exploration and multi-file editing 1, 7. Mistral AI and All Hands AI state that the model is designed to use external tools to navigate complex repositories, identify relevant logic across disparate files, and orchestrate architecture-level changes 1, 7.

A central capability of the model is its high-level reasoning within a 128,000-token context window, which allows it to process large portions of a technical project simultaneously 1, 7. It utilizes the Tekken tokenizer, which features a 131,000-word vocabulary, to manage various programming languages and technical syntax efficiently 1, 7.

Benchmarks and Environment

The model's capabilities are primarily demonstrated through its performance on the SWE-bench Verified benchmark, a standardized test for evaluating an AI's ability to resolve real-world software issues found on GitHub 1. When utilized within the OpenHands scaffold—a framework designed for autonomous agents—the model achieved a score of 46.8% 1, 7. Mistral AI reports that this result exceeded the performance of larger models, including GPT-4.1-mini and Claude 3.5 Haiku, when evaluated under similar test conditions 1.

Technical documentation emphasizes that Devstral Small 2505 is specifically tuned for integration with the OpenHands environment 7. While the model supports standard inference through frameworks such as vLLM, Transformers, and Ollama, its optimization for specific function-calling formats and XML output makes it most effective when operating as a component of an autonomous development agent 1.

Limitations and Modality

Despite being derived from the Mistral-Small-3.1 architecture, which possesses multimodal capabilities, Devstral Small 2505 is a text-only model 1, 7. The developers intentionally removed the vision encoder during the fine-tuning process to prioritize software engineering logic and reduce the model's overall computational footprint 1. As a result, the model cannot interpret visual data, such as screenshots of user interface bugs, architectural diagrams, or visual debugging logs 7. Users requiring visual analysis must rely on other models within the Mistral family, such as Pixtral 7.

Performance variability is a known factor when the model is used outside of its intended agentic workflows 7. While it functions as a general-purpose coding assistant, its specialized training for tool-use and repository manipulation may result in suboptimal results for tasks unrelated to software development, such as creative writing or general knowledge retrieval 1, 7. Furthermore, while the model is designed to be lightweight enough for local deployment on high-end consumer hardware—such as an RTX 4090 or a Mac with 32GB of RAM—performance in low-latency environments may degrade if the hardware does not meet these minimum specifications 7.

Intended Use

The model is intended for use by software developers and DevOps engineers to automate routine tasks, such as bug fixing, legacy system modernization, and the generation of unit tests 7. It is specifically licensed under the Apache 2.0 license, allowing for both commercial and non-commercial modification and deployment 1. Use cases that involve visual interpretation or extremely low-resource hardware are considered unintended, as the model's architecture has been specialized for text-based reasoning and moderate-to-high-end local inference 1, 7.

Performance

On the SWE-bench Verified metric, which evaluates the ability of models to resolve real-world software issues, Devstral Small 2505 achieved a score of 46.8% 1. Mistral AI and All Hands AI assert that this score represented a 6% improvement over the previous open-source performance record at the time of the model's release in May 2025 1. When evaluated using the OpenHands agentic scaffold, the model's performance exceeded that of several significantly larger dense and mixture-of-experts models, including DeepSeek-V3-0324 and Qwen3 232B-A22B 1.

Comparative Evaluations

In head-to-head comparisons provided by the developers, Devstral Small 2505 outperformed specific proprietary and open-weight models designed for similar coding tasks 1. Its 46.8% success rate on SWE-bench Verified was higher than the 23.6% reported for GPT-4.1-mini (using the OpenAI scaffold) and the 40.6% achieved by Claude 3.5 Haiku (using the Anthropic scaffold) 1. Additionally, the model surpassed the 40.2% score of SWE-smith-LM 32B, which utilized the SWE-agent scaffold 1.

Developer documentation indicates that Devstral’s efficiency in these tasks is largely attributed to its optimization for multi-file editing and codebase exploration within an agentic loop, rather than just isolated code completion 1, 7. The developers claim that the model's performance on these benchmarks validates the efficacy of fine-tuning smaller 24B parameter models for specialized reasoning tasks over using larger, general-purpose models 7.

Hardware Efficiency and Latency

A primary performance characteristic of Devstral Small 2505 is its operational efficiency on consumer-grade hardware 1. With 24 billion parameters, the model is designed to run on a single NVIDIA RTX 4090 GPU or an Apple Silicon Mac with 32GB of RAM 1, 7. This capability for local deployment is cited as a significant advantage for developers requiring private environments for proprietary codebases 7.

While specific latency figures for the fine-tuned Devstral variant vary by implementation, its parent architecture, Mistral-Small-3.1, is optimized for low-latency inference 7. Mistral AI states that the 24B parameter family can achieve speeds approximately three times faster than larger models, such as the 70B parameter Llama variants, when running on equivalent hardware 7. Usage data from API providers such as OpenRouter shows high-volume throughput, with millions of prompt tokens processed daily following the model's release 7.

Safety & Ethics

Alignment and Moderation

Devstral Small 2505 utilizes alignment techniques inherited from its base model, Mistral-Small-3.1, which includes supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF) to ensure outputs align with human values and safety standards 1, 2. For deployment environments requiring active content filtering, Mistral AI provides dedicated moderation models, such as the 3B-parameter mistral-moderation-2603, which can classify text across multiple policy categories including a specific category for detecting jailbreaking attempts 3. The developers state that users can also implement custom guardrails directly within API requests to enforce moderation rules without requiring separate classification calls 3.

Agentic Security and Code Execution

Because Devstral Small 2505 is designed for agentic workflows—where the model autonomously edits files and interacts with terminal environments—it presents distinct security risks compared to standard chat models 1, 5. Security researchers from NVIDIA note that when an AI system produces code for real-time execution, sanitization filters are often insufficient to prevent remote code execution (RCE) vulnerabilities 5. They emphasize that LLM-generated code must be treated as untrusted output and requires execution within a strictly isolated sandbox to prevent malicious payloads from impacting system-wide resources 5.

Independent analysis of autonomous agents identifies "excessive agency" as a primary concern, where a model may inadvertently perform unauthorized actions, such as deleting database entries or disabling security protocols, if granted broad permissions to interact with enterprise systems 7. To mitigate these risks, organizations are encouraged to implement granular access controls and human-in-the-loop (HITL) requirements for high-risk actions 7.

Known Vulnerabilities and Ethical Considerations

Like other code-specialized models, Devstral Small 2505 is susceptible to indirect prompt injection 6. In these scenarios, a threat actor may embed malicious instructions within a software repository's documentation or source files; when the model ingests these files to perform a task, it may be deceived into executing harmful commands 6, 7.

From an ethical and privacy perspective, Mistral AI asserts that the model's 24-billion parameter size and Apache 2.0 license facilitate safe usage for sensitive projects 1, 9. The model can be run locally on consumer-grade hardware with 32GB of RAM, allowing developers to keep proprietary codebases within private environments rather than exposing them to third-party APIs 1. This open-weights approach is also intended to allow the developer community to conduct independent safety audits and fine-tune the model for specific ethical constraints 1.

Applications

Devstral Small 2505 is primarily deployed as a reasoning engine within agentic software engineering frameworks, most notably the OpenHands framework 3. Mistral AI and All Hands AI state that the model is designed to automate complex software engineering workflows that require reasoning across multiple files and components, rather than performing isolated tasks like single-function generation 3.

Primary Use Cases

Real-world applications of the model center on autonomous repository-level modifications. According to Mistral AI, the model is intended for:

  • Automated Bug Fixing and Feature Implementation: The model is used to conduct iterative modifications across codebase repositories to resolve documented issues 3. In internal evaluations using the SWE-bench Verified benchmark, the model successfully resolved 46.8% of screened GitHub issues 3.
  • Codebase Exploration: Leveraging its 128,000-token context window, the model is applied to tasks requiring an understanding of project structures and inter-file dependencies 3.
  • Pull Request (PR) Automation: Within agentic scaffolds, the model can propose fixes and features with minimal human intervention, effectively acting as an autonomous contributor to software projects 3.

Deployment Scenarios

The model is distributed under the Apache 2.0 license, which allows for broad commercial and non-commercial application 3. It is frequently used in two distinct deployment environments:

  • Local Development and Data Privacy: Due to its 24-billion parameter size, the model can run on consumer-grade hardware, such as an NVIDIA RTX 4090 or Apple Silicon devices with at least 32GB of RAM 3. This makes it a candidate for developers and organizations that must maintain data privacy by avoiding cloud-based APIs for sensitive or proprietary codebases 3.
  • IDE Integration: The model has been integrated into modern development environments via third-party tools and extensions such as Cline, where it is used to perform multi-step coding tasks locally 3.

Constraints and Limitations

While optimized for software engineering, Devstral Small 2505 is not recommended for multimodal tasks. The vision encoder present in its base architecture (Mistral Small 3.1) was removed during fine-tuning to create a fully text-based model optimized specifically for code understanding 3. Consequently, it cannot process visual information such as UI screenshots or diagrams. Additionally, while the model outperforms some significantly larger models on coding benchmarks, its effectiveness is most pronounced when used within an agentic scaffold like OpenHands rather than as a standalone chat interface 3.

Reception & Impact

The release of Devstral Small 2505 was characterized by industry observers as a significant milestone in the development of open-source models for "agentic coding," a specialized subset of artificial intelligence focused on autonomous software engineering 3. Prior to its release, high-performance agentic capabilities—such as the ability to navigate complex codebases and perform multi-file edits—were largely associated with larger, proprietary models. The model's performance on the SWE-Bench Verified metric (46.8%) was presented by Mistral AI and All Hands AI as evidence that a 24-billion parameter model could effectively manage engineering tasks that previously required more computationally intensive systems 7.

The collaboration between Mistral AI and All Hands AI has been viewed by technical analysts as a validation of specialized training methodologies, particularly those designed to optimize models for specific software scaffolds 7. By fine-tuning the model specifically for integration into agentic frameworks like OpenHands, the developers demonstrated that architectural specialization could yield competitive results for software development compared to general-purpose instruction tuning 3. This development has contributed to an ongoing industry discussion regarding the shift from general-purpose large language models to a modular ecosystem of specialized engineering assistants 3.

From an economic and accessibility perspective, the model's release under the Apache 2.0 license has been noted for its potential impact on local development workflows 7. Because the 24B parameter scale allows for inference on high-end consumer hardware—including NVIDIA RTX 4090 GPUs and Apple Silicon machines with 32GB of RAM—the model offers a viable open-source alternative for organizations aiming to maintain data privacy or minimize the costs associated with proprietary API subscriptions 7. Media coverage highlighted that while Devstral Small 2505 is a text-only model without vision capabilities, its custom Tekken tokenizer and 128k context window allow it to process extensive repository-level data that was historically difficult to manage with open-weight models of this size 7.

Community adoption has been facilitated by the model's compatibility with standard inference frameworks such as vLLM, Transformers, and Ollama 7. Early usage data from platforms like OpenRouter indicated high initial demand, with millions of prompt tokens processed shortly after release, reflecting interest from developers seeking to integrate agentic reasoning into their existing CI/CD pipelines and coding environments 7.

Version History

Devstral Small 2505 was released in May 2025, marking the introduction of the 'Devstral' brand—a specialized line of models developed through a collaboration between Mistral AI and All Hands AI 1. The model was designed as an 'agentic' large language model (LLM), prioritizing tool use and multi-file codebase manipulation over general-purpose chat functions 1.

The model represents a direct evolutionary step from the Mistral-Small-3 series, specifically functioning as a fine-tuned derivative of Mistral-Small-3.1-24B-Base-2503 1. The '2505' designation follows Mistral AI's standard versioning convention, indicating the year (2025) and month (May) of its release 1. To optimize the model for software engineering agents, the developers removed the vision encoder present in the Mistral-Small-3.1 base architecture, rendering Devstral Small 2505 a text-only model 1.

Technical specifications for the 2505 version include a 128,000-token context window and the use of the Tekken tokenizer, which features a 131,000-word vocabulary size 1. Mistral AI released the model under the Apache 2.0 license, allowing for both commercial and non-commercial modification and usage 1. At the time of release, the developers recommended deploying the model via the OpenHands scaffold to leverage its agentic capabilities 1. While the initial release was focused on the open-source community, Mistral AI stated that subsequent commercial versions of the Devstral line would be released to provide specialized enterprise features, such as expanded context windows and domain-specific knowledge bases 1.

Sources

  1. 1
    Devstral Small 2505 - API Pricing & Providers - OpenRouter. Retrieved March 24, 2026.

    Devstral-Small-2505 is a 24B parameter agentic LLM fine-tuned from Mistral-Small-3.1, jointly developed by Mistral AI and All Hands AI for advanced software engineering tasks. Released May 21, 2025 131,072 context. achieving state-of-the-art results on SWE-Bench Verified (46.8%).

  2. 2
    Mistral Releases Devstral, an Open-Source LLM for Software Engineering Agents - InfoQ. Retrieved March 24, 2026.

    Mistral AI announced the release of Devstral, a new open-source large language model designed to improve the automation of software engineering workflows, particularly in complex coding environments.

  3. 3
    mistralai/Devstral-Small-2505 · Hugging Face. Retrieved March 24, 2026.

    Devstral is an agentic LLM for software engineering tasks built under a collaboration between [Mistral AI](https://mistral.ai/) and [All Hands AI](https://www.all-hands.dev/) 🙌... It is finetuned from [Mistral-Small-3.1](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Base-2503)... the vision encoder was removed... with its compact size of just 24 billion parameters, Devstral is light enough to run on a single RTX 4090 or a Mac with 32GB RAM.

  4. 5
    Moderation & Guardrailing | Mistral Docs. Retrieved March 24, 2026.

    Our moderation service is powered by Mistral Moderation models... mistral-moderation-2603 is a 3B model... has an updated set of policy categories, including a jailbreaking category.

  5. 6
    How Code Execution Drives Key Risks in Agentic AI Systems. Retrieved March 24, 2026.

    When an AI system produces code, there must be strict controls on how and where that code is executed. Without these boundaries, an attacker can craft inputs that trick the AI into generating malicious code... sandboxing is essential to contain its execution.

  6. 7
    The Risks of Code Assistant LLMs: Harmful Content, Misuse and Deception. Retrieved March 24, 2026.

    Issues like indirect prompt injection and model misuse are prevalent across platforms.

  7. 9
    Devstral Small 1.0 - Mistral AI. Retrieved March 24, 2026.

    A 24B text model, open source model that excels at using tools to explore codebases, editing multiple files and power software engineering agents.

Production Credits

View full changelog
Research
gemini-2.5-flash-liteMarch 24, 2026
Written By
gemini-3-flash-previewMarch 24, 2026
Fact-Checked By
claude-haiku-4-5March 24, 2026
Reviewed By
pending reviewMarch 25, 2026
This page was last edited on March 26, 2026 · First published March 25, 2026