Mistral Small 3.2 24B Instruct

Mistral Small 3.2 24B Instruct is an open-weight multimodal large language model released by Mistral AI on June 20, 2025 1. As a minor update to the 3.1 version released earlier that year, the model contains approximately 23.6 billion parameters and is distributed under the Apache 2.0 license, which permits broad commercial and research application 1, 3. The model is designed for instruction following and is trained on data with a knowledge cutoff of October 2023 1. It occupies a specific niche in the "small" model class, typically defined as models ranging between 4B and 40B parameters, where it aims to balance computational efficiency with high-level reasoning capabilities 3.
Unlike many of its predecessors in the Small series, version 3.2 is a multimodal model, meaning it can process both text and image inputs while generating text as output 1, 3. According to technical specifications, the model supports a context window of 128,000 tokens, allowing it to process the equivalent of approximately 192 pages of standard text in a single prompt 3. Its visual capabilities have been evaluated across several document-centric benchmarks; the developer reported scores of 0.95 on DocVQA for document image understanding and 0.93 on AI2D for diagram reasoning 1. Additionally, the model achieved a score of 0.87 on ChartQA, indicating a specialized aptitude for interpreting visual data representations 1.
In terms of performance, Mistral Small 3.2 24B Instruct has demonstrated significant proficiency in coding and precise instruction following. It reached the top rank on the HumanEval Plus benchmark with a score of 0.93 and secured a first-place ranking on the IF (Instruction Following) benchmark with a score of 0.85 1. On general knowledge and reasoning tests, the model achieved an 81% score on the Massive Multitask Language Understanding (MMLU) benchmark and 69.1% on the more challenging MMLU-Pro 1. However, on highly specialized scientific reasoning tasks like the GPQA Diamond, its performance was more moderate, with a reported score of 46.1% 1.
Independent analysis by Artificial Analysis ranked the model 13th out of 53 comparable models in its intelligence index at the time of evaluation 3. The model is noted for its high inference speed, reaching approximately 191.1 output tokens per second, which categorizes it as one of the faster models in its parameter class 3. While its input pricing is competitive at $0.10 per million tokens, its output pricing of $0.30 per million tokens has been characterized as somewhat expensive compared to other open-weight non-reasoning models of similar size 3. Despite its strong benchmark performance, the developer has already directed users toward Mistral Small 4 as a subsequent primary option for those seeking the latest advancements in this model family 3.
Background
The development of Mistral Small 3.2 24B Instruct followed a rapid iteration cycle within Mistral AI's "Small" model category during the first half of 2025 7. This model represents an evolution from the Mistral Small 3.1 release, identified by the version number 2503, to the 3.2 update released in June 2025 under the version identifier 2506 7. This three-month interval between versions reflects the developer's focus on refining instruction-following capabilities and expanding the functional scope of mid-sized models to keep pace with industry shifts toward integrated multimodality 7.
A primary motivation for the 24-billion-parameter architecture was the strategic positioning of the model for use on consumer-grade and mid-range enterprise hardware 7. While larger models typically require multi-GPU clusters for inference, the 24B size is designed to fit within the VRAM constraints of prosumer hardware, such as 24GB or 48GB GPU configurations, without requiring extreme quantization that might degrade performance 7. Mistral AI positioned this as a solution for organizations requiring a balance between the high reasoning capabilities of large models and the lower operational costs and latency associated with smaller architectures 7.
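The hardware claim can be sanity-checked with simple bytes-per-parameter arithmetic. The sketch below is a rough rule of thumb only: it counts weight storage at common precisions and ignores KV cache, activations, and runtime overhead, so real requirements will be higher.

```python
# Rough VRAM needed just to hold 23.6B weights at common precisions.
# Back-of-the-envelope only: ignores KV cache, activations, and overhead.
PARAMS = 23.6e9

def weight_gib(bytes_per_param: float) -> float:
    """Approximate weight memory in GiB for a given precision."""
    return PARAMS * bytes_per_param / 1024**3

for label, bpp in [("FP16/BF16", 2.0), ("INT8", 1.0), ("4-bit", 0.5)]:
    print(f"{label:>9}: ~{weight_gib(bpp):.1f} GiB")
# FP16/BF16: ~44.0 GiB  -> fits a 48GB configuration
#      INT8: ~22.0 GiB  -> fits a 24GB card, tightly
#     4-bit: ~11.0 GiB  -> fits a 24GB card with headroom
```

These figures are consistent with the article's point that the 24B size targets 24GB and 48GB GPU configurations without extreme quantization.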
At the time of its release in June 2025, the state of the artificial intelligence field was characterized by a shift from text-only models to native multimodality in smaller parameter classes 7. Mistral Small 3.2 24B Instruct was built to address this trend, incorporating vision capabilities that allow it to process and analyze images and documents alongside text-based prompts 7. This functionality was integrated with advanced tool-use and function-calling features, aimed at enabling the model to serve as an agentic controller within broader software ecosystems 7. The model's training data includes a knowledge cutoff of October 2023, and it maintains a 128,000-token context window, allowing for the processing of lengthy documents or complex multi-turn interactions 7.
Architecture
Mistral Small 3.2 24B Instruct is a dense, transformer-based large language model (LLM) containing approximately 23.6 billion parameters 1. Unlike mixture-of-experts (MoE) architectures that activate only a specific subset of parameters per token, this model utilizes its full parameter count during inference, placing it within the "Small" size class (typically defined as 4B to 40B parameters) 3. The model is identified by the version string "2506," which designates it as a minor update to the version 3.1 (2503) architecture released earlier in 2025 1.
Multimodal Integration
A core architectural feature of Mistral Small 3.2 is its native multimodality, which allows the model to process both text and images as input 1, 3. This is achieved through integrated vision processing that enables the interpretation of visual data alongside textual prompts. The architecture is specifically designed to handle complex visual tasks such as Visual Question Answering (VQA), document image analysis, and chart interpretation 1. In benchmark evaluations, this multimodal approach resulted in a score of 0.95/1 on DocVQA and 0.87/1 on ChartQA 1. Despite the ability to process visual inputs, the model's output modality is restricted to text 3.
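As a sketch of how mixed text-and-image input is typically supplied to such a model, the payload below follows the OpenAI-style "content parts" convention accepted by many OpenAI-compatible serving stacks. The field layout is an assumption based on that convention, not a confirmed rendering of Mistral's documented schema.

```python
# Illustrative OpenAI-style chat payload mixing a text part and an image
# part. Field names follow the common "image_url" content-part convention
# used by many OpenAI-compatible servers; they are an assumption, not
# Mistral's documented schema. The chart URL is a placeholder.
payload = {
    "model": "mistralai/Mistral-Small-3.2-24B-Instruct-2506",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What trend does this chart show?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
}
# The response would carry plain text only: the model accepts images as
# input but its output modality is restricted to text.
```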
Context and Inference Performance
The model features a context window of 128,000 tokens, which allows for the ingestion of approximately 192 pages of standard A4 text in a single prompt 3. This high context capacity is intended to support Retrieval-Augmented Generation (RAG) and the analysis of long-form documentation or codebases 3.
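The ~192-page figure follows from common conversion rules of thumb, roughly 0.75 English words per token and about 500 words per dense page. Both conversion factors are assumptions, not properties of the tokenizer.

```python
# Sanity-check the "~192 pages" claim with common rules of thumb.
# The conversion factors (0.75 words/token, 500 words/page) are rough
# assumptions, not exact tokenizer guarantees.
CONTEXT_TOKENS = 128_000
WORDS_PER_TOKEN = 0.75   # typical for English text under BPE-style tokenizers
WORDS_PER_PAGE = 500     # dense A4 page

pages = CONTEXT_TOKENS * WORDS_PER_TOKEN / WORDS_PER_PAGE
print(f"~{pages:.0f} pages")  # → ~192 pages
```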
Architecturally, the model is optimized for high-speed, low-latency inference relative to larger frontier models 1. Performance analysis by Artificial Analysis measured the model's output speed at 191.1 tokens per second, characterizing it as notably fast compared to other open-weight models of similar scale 3. The model is distributed under the Apache 2.0 license, which facilitates its deployment in local environments where low-latency response times are a priority 1.
Training and Knowledge Base
The training of Mistral Small 3.2 24B Instruct was completed with a knowledge cutoff of October 2023 1. The "Instruct" designation signifies that the model underwent supervised fine-tuning to enhance its adherence to complex user prompts and multi-step instructions 1. Training objectives included strengthening the model's proficiency in reasoning-intensive domains such as mathematics and programming. On the HumanEval Plus benchmark, which tests the functional correctness of synthesized code, the model achieved a score of 0.93/1 1.
According to Mistral AI, the architecture also emphasizes "conciseness" in its responses; evaluation data indicates the model generated 4.5 million tokens on the Artificial Analysis Intelligence Index, which is lower than the average of 5.3 million tokens for models in its category, suggesting an architectural preference for brevity 3.
Capabilities & Limitations
Mistral Small 3.2 24B Instruct is a multimodal model designed to process both text and visual inputs while generating text-based outputs 1, 3. It features a 128,000-token context window, allowing it to maintain coherence over extensive documents or long-form conversations 3, 8. Mistral AI asserts that the model is optimized for high-speed response times and low-latency functional tasks, such as tool use and structured data generation 5.
Text and Coding Proficiency
The model demonstrates significant proficiency in technical domains, particularly software engineering and mathematics. At the time of its release, it achieved a rank of #1 on the HumanEval Plus benchmark with a Pass@5 score of 0.93/1, indicating a high degree of functional correctness in synthesized code 1. According to developer benchmarks, the 3.2 version improved its MBPP Plus Pass@5 score to 78.33% from the 74.63% observed in the previous 3.1 iteration 7. In mathematical reasoning, the model scored 0.69 on the MATH benchmark and 0.81 on the MMLU, though its performance on more rigorous reasoning tasks like MMLU-Pro (0.69) shows a decline compared to standard MMLU evaluations 1, 7.
Instruction Following and Agentic Capabilities
A primary design goal for the 3.2 update was the refinement of instruction adherence and the reduction of output errors. Mistral AI reports that the model achieved an instruction-following accuracy of 84.78%, up from 82.75% in version 3.1 8. This improvement is also reflected in public benchmarks such as Wildbench, where the model's score increased from 55.6% to 65.33% 7. The model is characterized by its developer as "agent-centric," featuring native support for function calling and JSON-formatted outputs 5. Third-party testing by n8n confirms strong performance in structured output and classification tasks, though it notes that the model's tool-use efficiency may rank lower than specialized proprietary models 2.
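The function-calling workflow described above can be sketched as follows. The tool definition uses the OpenAI-style schema that agent frameworks commonly pass to function-calling models; the `get_ticket_status` function and its parameters are hypothetical examples introduced for illustration, not part of any real API.

```python
import json

# Illustrative OpenAI-style tool definition of the kind agent frameworks
# pass to function-calling models. The get_ticket_status function is a
# hypothetical example.
tools = [{
    "type": "function",
    "function": {
        "name": "get_ticket_status",
        "description": "Look up the status of a support ticket by ID.",
        "parameters": {
            "type": "object",
            "properties": {
                "ticket_id": {"type": "string"},
            },
            "required": ["ticket_id"],
        },
    },
}]

# A model reply that elects to call the tool carries its arguments as
# structured JSON, which the host application parses and executes:
model_arguments = '{"ticket_id": "T-1042"}'
args = json.loads(model_arguments)
print(args["ticket_id"])  # → T-1042
```

Reliable JSON in that last step is exactly what the structured-output improvements in version 3.2 target: a malformed argument string breaks the agent loop at `json.loads`.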
Visual Understanding
As a multimodal model, Mistral Small 3.2 can reason over visual data, including charts and complex documents. It achieves a score of 0.95/1 on the DocVQA benchmark and 0.87/1 on ChartQA 1. These capabilities allow it to perform visual question answering and document analysis, such as extracting data from tables or interpreting graphical information 2, 8. According to AWS, these visual-textual reasoning capabilities make the model suitable for tasks such as document understanding and image-grounded content generation 8.
Limitations and Failure Modes
Despite its performance within the 24B parameter class, Mistral Small 3.2 has inherent limitations. Its "logic" and "reasoning" scores in third-party benchmarks are lower than those of significantly larger models like GPT-5 or Mistral Large 2, 9. Specifically, it ranks #152 on the GPQA benchmark, a dataset designed to test PhD-level expertise in hard sciences, indicating a ceiling on its deep domain reasoning 1.
The model is also subject to "procedural hallucinations," a failure mode where a model computes a correct value in its chain-of-thought but fails to report it accurately in the final answer 10. Research indicates such errors may stem from "readout-stage routing failures" rather than a lack of information 10. Furthermore, Mistral AI notes that version 3.2, while improved, still experienced "infinite generations" or repetitive loops in 1.29% of tested cases 7, 8. Its knowledge is limited by a cutoff date of October 2023, meaning it cannot natively reference events or developments occurring after that time 1.
Performance
Mistral Small 3.2 24B Instruct is positioned by third-party analysts as a leading model in intelligence within the small-to-medium parameter class (4B to 40B parameters) 3. It achieves an Intelligence Index score of 15, which is above the average of 12 for comparable models 3. The model's performance is characterized by high scores in specialized reasoning and coding benchmarks, though it ranks lower on extremely difficult scientific reasoning tasks 1.
Language and Knowledge Benchmarks
In standard language understanding evaluations, the model achieves a 5-shot MMLU (Massive Multitask Language Understanding) score of 80.5%, ranking 57th globally across all model sizes 1. On the more rigorous MMLU-Pro benchmark—which expands multiple-choice options and focuses on reasoning-intensive tasks—the model scores 69.1%, ranking 67th 1. For scientific reasoning, Mistral Small 3.2 24B Instruct reaches 46.1% accuracy on the GPQA (Graduate-Level Google-Proof Q&A) dataset, which utilizes domain experts to curate questions in biology, physics, and chemistry 1. On the Arena Hard-Auto benchmark, which uses LLM-as-a-judge methodology to approximate human preference on real-world prompts, the model is ranked 21st 1.
Specialized Task Performance
The model demonstrates high proficiency in instruction following and software engineering tasks. It is ranked first in its size class for the HumanEval Plus benchmark, which tests the functional correctness of synthesized code, with a score of 0.93/1 1. Similarly, it holds a top rank in instruction following (IF) with a score of 0.85/1 1. In mathematical reasoning, the model ranks 39th on the MATH dataset, a collection of challenging competition problems 1. For visual mathematics, it ranks 18th on MathVista, which assesses the ability to perform reasoning within visual contexts 1.
Multimodal Capabilities
Self-reported data from the model provider indicates strong performance on document-based visual tasks. Mistral Small 3.2 24B Instruct achieves a score of 0.95/1 on DocVQA, a dataset for visual question answering on document images, and 0.93/1 on AI2D 1. On the ChartQA benchmark, which requires interpreting data from visual charts, the model scores 0.87/1 1.
Inference Speed and Efficiency
Artificial Analysis characterizes the model as "notably fast," recording an output speed of 191.1 tokens per second (tps) 3. This throughput ranks it 5th out of 53 models in its specific size and weight class 3. The model is also described as concise, generating 4.5 million tokens during evaluation on the Intelligence Index compared to an average of 5.3 million for similar models 3.
Cost-Effectiveness
As of June 2025, the model's pricing is $0.10 per 1 million input tokens and $0.30 per 1 million output tokens 3. While the input price is considered average for its class, third-party analysis suggests the output pricing is somewhat expensive compared to an average of $0.20 for similar open-weight models 3. Based on a blended 3:1 input-to-output token ratio, evaluating the model's intelligence across a standard suite of benchmarks cost approximately $21.97 3.
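The blended per-token price implied by the 3:1 input-to-output weighting can be reproduced directly from the listed prices; the 30M/10M workload below is a made-up example to show how a bill is computed.

```python
# Blended price per million tokens at a 3:1 input:output ratio,
# using the June 2025 list prices.
INPUT_PRICE = 0.10    # USD per 1M input tokens
OUTPUT_PRICE = 0.30   # USD per 1M output tokens

blended = (3 * INPUT_PRICE + 1 * OUTPUT_PRICE) / 4
print(f"${blended:.2f} per 1M tokens")  # → $0.15 per 1M tokens

# Hypothetical workload: 30M input + 10M output tokens
cost = 30 * INPUT_PRICE + 10 * OUTPUT_PRICE
print(f"${cost:.2f}")  # → $6.00
```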
Safety & Ethics
Mistral Small 3.2 24B Instruct incorporates several layers of safety and alignment protocols to manage risks associated with large-scale language and vision processing 1, 3. To transition the model from its base pre-trained state to an instruction-following assistant, Mistral AI employed Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) 3, 7. DPO, in particular, is utilized to refine the model's responses by directly optimizing against a dataset of human preferences, which reduces the likelihood of the model generating toxic, biased, or unhelpful content 3.
As a multimodal model, safety filtering is applied to both textual and visual modalities 3, 5. Mistral AI states that the model is designed to refuse requests that violate safety policies, such as instructions to generate illegal content or sexually explicit material 1, 7. The inclusion of a vision encoder introduces additional safety vectors; the system is trained to identify and mitigate risks associated with visual prompts, though independent researchers note that multimodal models generally face complex "jailbreaking" challenges where visual and text prompts are combined to bypass standard filters 3, 8.
The model's distribution under the Apache 2.0 license is cited by analysts as a significant factor in its ethical profile 1. Because the weights are open, the model can be audited by third-party security researchers and academic institutions to identify latent biases or vulnerabilities that might remain hidden in proprietary, closed-source models 1, 3. Furthermore, the open-weight nature allows organizations to implement custom safety fine-tuning, enabling them to refine the model for specific industry regulations or internal compliance standards 3, 7. This transparency is intended to facilitate better understanding of model behavior compared to "black box" alternatives 1.
Despite these measures, the model retains inherent limitations typical of transformer architectures. It has a knowledge cutoff of October 2023, which may lead to factual inaccuracies when queried about more recent events 1. Mistral AI recommends that developers implement additional system-level safeguards, such as external moderation APIs or retrieval-augmented generation (RAG), to ensure the accuracy and safety of the model's output in production environments 5, 7.
Applications
Mistral Small 3.2 24B Instruct is positioned for applications that require a balance between computational efficiency and high-level reasoning, specifically in sectors where data privacy or low-latency processing is a priority 1, 5. Its multimodal capabilities and parameter size allow it to serve in roles ranging from automated document processing to local coding assistance 1, 2.
Document and Visual Analysis
The model is frequently applied to automated document and chart analysis within the financial and legal sectors 1, 2. Its proficiency in these areas is supported by its performance on specialized benchmarks; it achieved a score of 0.95/1 on DocVQA (Document Visual Question Answering) and 0.87/1 on ChartQA 1. These scores indicate an ability to accurately extract information from document images, complex forms, and statistical graphics 1, 2. These capabilities enable the automation of high-volume data verification and the synthesis of information from lengthy reports or transcripts, aided by its 131,072-token (128K) context window 2.
Edge Deployment and Coding Assistance
With approximately 23.6 billion parameters, the model is designed for local or edge deployment on hardware that cannot support larger 70B+ parameter models 1, 5. This makes it a candidate for privacy-focused coding assistants and internal software development tools 5. The model's coding performance, evidenced by a 0.93/1 score on the HumanEval Plus benchmark, allows it to be integrated into development environments for real-time code generation and debugging without requiring data to be transmitted to external cloud providers 1. Mistral AI states that the model is specifically optimized for low-latency inference, which is a requirement for interactive programming applications 5.
Agentic Workflows and Domain Specialization
Mistral Small 3.2 24B Instruct serves as a base for fine-tuning specialized "subject matter experts" in niche domains 5. According to the developer, the model is "agent-centric," featuring native support for function calling and structured JSON output 5. These features allow it to act as the reasoning engine for AI agents that must interact with external APIs, databases, or software tools to complete multi-step tasks 2, 5. Third-party analysis indicates that the model delivers significant gains in tool use and structured output tasks compared to its predecessor, making it suitable for complex workflow automation in IT and security operations 2.
Reception & Impact
The reception of Mistral Small 3.2 24B Instruct has centered on its positioning as a versatile, open-weight alternative to proprietary mid-tier models. Industry analysts have identified the model as a leading performer within the 4B to 40B parameter class, noting that its Intelligence Index score of 15 exceeds the category average of 12 3. This performance-to-size ratio has led to its characterization as a significant competitor to closed-source API offerings, particularly for enterprises seeking to balance high-level reasoning with the cost-efficiency of self-hosting 7, 8.
A primary factor in the model's positive reception is its accessibility for local hardware deployments. The 24-billion-parameter architecture allows it to run on consumer-grade GPUs with sufficient VRAM, providing a middle ground between smaller 7B-8B models and larger 70B+ counterparts 3. This has made the model a popular choice within the open-source community for developing local multimodal agents and document processing workflows that require high-context windows and vision capabilities without the latency or privacy concerns of cloud-based APIs 1, 7.
Technically, the model's refinements over its predecessor (version 2503) have been noted by developers as improving utility in production environments. According to Mistral AI's internal benchmarks, the 3.2 version (2506) improved instruction-following accuracy to 84.78% and reduced repetitive generation errors, with the rate of infinite generations falling from 2.11% to 1.29% of tested cases 1, 8. These incremental improvements have been cited by third-party services as essential for high-reliability enterprise applications, such as automated customer support and complex tool-calling 7.
The model's integration into major cloud platforms shortly after release, including Amazon Bedrock and SageMaker JumpStart, has been viewed as an indicator of its commercial viability 8. AWS highlighted the model's suitability for document understanding and visual Q&A, positioning it as a robust tool for secure, VPC-controlled enterprise environments 8. Furthermore, its distribution under the Apache 2.0 license has been received as a favorable move for commercial adoption, as it removes the restrictive licensing hurdles often found in other "open" but commercially limited models 1, 3.
However, community reception has also identified technical challenges. Early adopters reported issues with visual input processing in specific serving frameworks such as vLLM, where bugs initially prevented the consistent use of the model's multimodal features 11. While the model is praised for its performance in common reasoning tasks, it has been observed to rank lower on highly specialized, difficult scientific reasoning benchmarks compared to larger-scale models, suggesting it is best suited for generalist or instruction-heavy applications rather than niche scientific research 1.
Version History
Mistral AI's 24-billion-parameter "Small" model series has undergone iterative updates to refine its instruction-following and multimodal capabilities. The series uses a versioning scheme in which the four-digit suffix encodes the release year and month, such as 2503 (March 2025) or 2506 (June 2025) 1.
Mistral Small 3.1 (Version 2503)
Released in early 2025, Mistral Small 3.1 (internal version 2503) served as the initial model in the 24B parameter series. It established the base architecture and parameter count that would be refined in subsequent minor updates while providing the framework for the model's 128,000-token context window.
Mistral Small 3.2 (Version 2506)
On June 20, 2025, Mistral AI released Mistral Small 3.2, identified by the version number 2506 1. This release was categorized as a minor iterative update to the preceding 3.1 (2503) model 1. While the underlying parameter count remained approximately 23.6 billion, the update focused on improving the model's reliability in automated workflows and complex instruction-following tasks 1, 2.
According to technical evaluations by n8n, version 3.2 introduced several functional refinements:
- Instruction Following: Significant improvements were noted in WildBench and Arena Hard benchmark scores compared to the 3.1 version 2.
- Repetition Reduction: The update addressed issues with "infinite generations," a failure state where a model repeats text without terminating the response 2.
- Task Specialization: The developer implemented optimizations for structured output tasks and tool use, specifically improving function calling accuracy 2.
Mistral Small 3.2 maintained the multimodal capabilities of its predecessor, supporting both text and image inputs while providing text-based outputs 1, 2. The Apache 2.0 open-weight license and the October 2023 knowledge cutoff date remained consistent with the previous version 1.
Sources
- 1. “Mistral Small 3.2 24B Instruct: Pricing, Benchmarks & Performance”. Retrieved March 24, 2026.
Mistral-Small-3.2-24B-Instruct-2506 is a minor update of Mistral-Small-3.1-24B-Instruct-2503. Parameters 23.6B. Released Jun 2025. License Apache 2.0. Multimodal model that can process both text and images as input. Benchmarks: HumanEval Plus 0.93, IF 0.85, MMLU 81%.
- 2. “Mistral Small 3.2 - Intelligence, Performance & Price Analysis”. Retrieved March 24, 2026.
Mistral Small 3.2 is an open weights model released June 2025. Total parameters 24B. License Apache 2.0. Input modality supports text, image. Context window 128k. Output tokens per second 191.1. Ranked #13 / 53 on Artificial Analysis Intelligence Index.
- 3. “Mistral: Mistral Small 3.2 24B Review — Pricing, Benchmarks & Capabilities (2026) — Design for Online”. Retrieved March 24, 2026.
Mistral Small 3.2 24B by mistralai. 128K context, from $0.0750/1M tokens, vision, tool use, function calling. Evolution from Mistral Small 3.1 to 3.2 (versions 2503 to 2506). Knowledge cutoff of October 2023.
- 5. “Mistral Small 3.2 - Mistral AI”. Retrieved March 24, 2026.
Context 128k... Modalities... Chat Completions... Function Calling... Structured Outputs... OCR
- 7. “mistralai/Mistral-Small-3.2-24B-Instruct-2506 · Hugging Face”. Retrieved March 24, 2026.
Wildbench v2... Small 3.2 24B Instruct 65.33%... Infinite Generations... Small 3.2 24B Instruct 1.29%... MBPP Plus - Pass@5... 78.33%... MMLU Pro (5-shot CoT)... 69.06%
- 8. “Mistral-Small-3.2-24B-Instruct-2506 is now available on Amazon Bedrock Marketplace and Amazon SageMaker JumpStart”. Retrieved March 24, 2026.
Improves in following precise instructions with 84.78% accuracy compared to 82.75% in version 3.1... now includes image-text-to-text capabilities... ideal for tasks such as document understanding, visual Q&A, and image-grounded content generation.
- 9. “Mistral Small 3.2 24B Instruct vs Phi 4 Reasoning: Complete Comparison”. Retrieved March 24, 2026.
Mistral Small 3.2 24B Instruct outperforms in 0 benchmarks, while Phi 4 Reasoning is better at 3 benchmarks (Arena Hard, GPQA, MMLU-Pro).
- 10. “Causal Explanations for Procedural Hallucinations”. Retrieved March 24, 2026.
We study this as procedural hallucination: failure to execute a verifiable, prompt-grounded specification even when the correct value is present in-context... errors are readout-stage routing failures.
- 11. “[Bug]: Mistrall Small 3.2 doesn't work with images”. Retrieved March 24, 2026.
Environment: vLLM 0.9.1 (Docker), mistralai/Mistral-Small-3.2-24B-Instruct-2506. Using mistralai/Mistral-Small-3.1-24B-Instruct-2503 with everything else unchanged works.
