GPT-4o mini
GPT-4o mini

GPT-4o mini is a small-scale multimodal large language model developed by OpenAI, released on July 18, 2024 1219. Designed as a more efficient and cost-effective alternative to larger models, it was introduced to replace GPT-3.5 Turbo as the primary entry-level model in OpenAI's product lineup 12. The model is optimized for low latency and high-volume tasks, such as customer support chatbots, real-time text responses, and applications that require chaining multiple API calls 119. At launch, GPT-4o mini supported text and vision inputs through the API, with OpenAI stating that future updates would extend these capabilities to include audio and video 12.
In performance evaluations, OpenAI claims that GPT-4o mini outperforms other models in the "small" category across several standardized benchmarks 1. It achieved a score of 82.0% on the Massive Multitask Language Understanding (MMLU) benchmark, which measures general knowledge and reasoning 118. According to OpenAI's reported data, the model's MMLU performance surpassed that of Google’s Gemini 1.5 Flash and Anthropic’s Claude 3 Haiku 1. Independent testing by Artificial Analysis recorded a median output speed of 202 tokens per second, which was characterized as more than twice as fast as both GPT-4o and GPT-3.5 Turbo 518. Additionally, OpenAI reported that the model demonstrated proficiency in mathematical reasoning and coding, scoring 87.0% on the MGSM benchmark and 87.2% on HumanEval 118.
The pricing structure of GPT-4o mini is intended to make artificial intelligence integration more accessible 1. For developers using the API, the model is priced at 15 cents per million input tokens and 60 cents per million output tokens, which OpenAI states is over 60% cheaper than GPT-3.5 Turbo 125. The model features a 128,000-token context window—roughly equivalent to the length of a standard book—and supports up to 16,000 output tokens per request 1518. Its knowledge cutoff is dated to October 2023, and it utilizes the same tokenizer as the larger GPT-4o to improve the cost-efficiency of processing non-English text 118.
Safety and reliability measures for GPT-4o mini include the same mitigations used for GPT-4o, reinforced by reinforcement learning with human feedback (RLHF) and pre-training data filtering to remove hate speech and adult content 123. Evaluation of the model's safety has included automated benchmarks and human review 23. The model was made available immediately to ChatGPT Free, Plus, and Team users, with enterprise access following shortly after release, as part of a stated mission to broaden access to advanced reasoning capabilities 1219.
Background
The development of GPT-4o mini occurred during a broader industry shift toward smaller, more efficient large language models (LLMs) optimized for speed and cost-efficiency rather than raw parameter count 1. By mid-2024, developers and enterprises increasingly sought models capable of handling high-volume tasks—such as real-time customer support, large-scale data extraction, and complex API chaining—without the prohibitive latency and financial costs associated with flagship frontier models 13.
OpenAI developed GPT-4o mini as a successor to the GPT-3.5 architecture, which had previously served as the company's primary entry-level model 1. While GPT-3.5 Turbo was widely adopted, it lacked the multimodal capabilities and reasoning proficiency of the newer architectures released in the 2024 model cycle. OpenAI intended for the new model to offer a significant increase in affordability, pricing it at 15 cents per million input tokens, which represented a 60% cost reduction compared to GPT-3.5 Turbo 1. The model was designed as an offshoot of GPT-4o, the multimodal "omni" flagship that OpenAI had released two months earlier in May 2024 3.
The release took place within a competitive market characterized by high-speed, low-cost tiers from major AI developers. Competitors had already introduced specialized models for this segment, most notably Google with Gemini 1.5 Flash and Anthropic with Claude 3 Haiku 1. OpenAI positioned its model to compete directly with these offerings on performance and cost. According to OpenAI's internal evaluations, GPT-4o mini achieved a score of 82.0% on the Massive Multitask Language Understanding (MMLU) benchmark, compared to reported scores of 77.9% for Gemini Flash and 73.8% for Claude Haiku 1.
From a business perspective, the model was part of an effort to expand the adoption of the ChatGPT platform and its associated API. CNBC reported that OpenAI, valued at over $80 billion at the time, faced pressure to maintain its market lead and generate revenue to offset the immense costs of training and infrastructure 3. The strategy involved making artificial intelligence more accessible by lowering the barrier to entry for developers building and scaling applications 1. GPT-4o mini was officially released on July 18, 2024, and was integrated into the free and paid tiers of ChatGPT as a replacement for GPT-3.5 13.
Architecture
Model Structure and Scale
OpenAI describes GPT-4o mini as a small-scale model designed for high-volume, low-latency tasks 1. While the developer has not publicly disclosed the exact parameter count, industry analysts have categorized it alongside other compact models such as Meta's Llama 3 8B, Anthropic's Claude 3 Haiku, and Google's Gemini 1.5 Flash 2. The model is architected as a multimodal system that initially supports text and vision processing, with OpenAI stating that future iterations will include native support for audio and video inputs and outputs 1.
Context and Tokenization
The architecture features a context window of 128,000 tokens, which OpenAI equates to approximately the length of a standard book 12. The model is limited to a maximum of 16,000 output tokens per individual request 1. GPT-4o mini utilizes the same tokenizer as the larger GPT-4o flagship model 1. This shared tokenization strategy is intended to improve the efficiency of processing non-English text, thereby reducing the token count required for multilingual applications compared to the previous GPT-3.5 Turbo model 1.
Training Methodology and Knowledge Base
The training of GPT-4o mini followed a multi-stage process involving pre-training on large-scale datasets and subsequent alignment 1. During pre-training, OpenAI states that data was filtered to exclude hate speech, adult content, spam, and websites that primarily aggregate personal information 1. The model's knowledge cutoff is dated to October 2023 12.
Following the initial training phase, the model underwent post-training alignment using Reinforcement Learning with Human Feedback (RLHF) to refine its adherence to safety policies and improve response reliability 1. According to OpenAI, the model was also subjected to red-teaming by over 70 external experts in fields such as misinformation and social psychology to identify and mitigate potential risks 1.
Instruction Hierarchy and Security
A significant technical introduction in the GPT-4o mini architecture is the 'instruction hierarchy' method 1. This approach is designed to improve the model's resistance to adversarial attacks, including jailbreaks, prompt injections, and attempts to extract system prompts 1. By prioritizing specific layers of instructions, the method aims to ensure that the model remains compliant with its core safety and operational guidelines even when presented with conflicting or malicious user inputs 1. OpenAI identifies GPT-4o mini as the first model in its lineup to implement this specific technique in its API 1.
Performance Characteristics
Technical evaluations indicate that the model's architecture is optimized for high throughput. Independent testing by Artificial Analysis reported a median output speed of 202 tokens per second 2. This benchmark suggests the model is more than twice as fast as both GPT-4o and the predecessor GPT-3.5 Turbo 2. In terms of reasoning capabilities, OpenAI asserts that the model scores 82.0% on the Massive Multitask Language Understanding (MMLU) benchmark, which measures general knowledge and problem-solving abilities across various subjects 1.
Capabilities & Limitations
GPT-4o mini is a multimodal model capable of processing text and visual information, with OpenAI stating that support for audio and video inputs and outputs is planned for future iterations 1. The model features a 128K token context window and supports up to 16,000 output tokens per request 1. Its knowledge cutoff is documented as October 2023 15.
Technical Proficiencies and Benchmarks
According to OpenAI, GPT-4o mini was developed to provide high performance in reasoning, mathematics, and coding relative to other small-scale models. On the Massive Multitask Language Understanding (MMLU) benchmark, the model achieved a score of 82.0%, which OpenAI reports is higher than the performance of contemporary compact models such as Gemini Flash (77.9%) and Claude Haiku (73.8%) 16. In mathematical reasoning, the model scored 87.0% on the Multilingual Grade School Math (MGSM) benchmark, while its coding proficiency was measured at 87.2% on HumanEval 16. Multimodal capabilities were evaluated using the MMMU benchmark, where it scored 59.4% 1.
Independent evaluations by Artificial Analysis characterize GPT-4o mini as "concise" but "notably slow" compared to other non-reasoning models in its price class, recording an output speed of approximately 42 tokens per second 5. The same analysis placed the model below the average intelligence index for its category, scoring 13 against a class average of 15 5.
Intended Use Cases and Function Calling
The model is optimized for high-volume tasks and real-time applications. OpenAI identifies its primary intended uses as customer support chatbots, applications that parallelize multiple model calls, and tasks requiring the processing of large volumes of context, such as full codebases or long conversation histories 1. GPT-4o mini is noted for its performance in function calling and structured data extraction 1. Partner evaluations cited by the developer indicate that the model is more effective than GPT-3.5 Turbo at extracting data from files like receipts and generating email responses based on thread history 1.
Limitations and Failure Modes
Despite its proficiency in standardized benchmarks, GPT-4o mini exhibits limitations in reasoning depth compared to larger frontier models like the full-scale GPT-4o or the reasoning-optimized o-series 14. While the newer o4-mini model, released in 2025, achieved near-perfect scores (99.5%) on the American Invitational Mathematics Examination (AIME) when using a Python interpreter, the standard GPT-4o mini is positioned as a general-purpose model with less specialized reasoning capability 4.
User reports from the OpenAI developer community have highlighted specific failure modes, including instances where the model "hallucinates" or fabricates entire research studies, methods, and data when asked to analyze uploaded PDF documents rather than reading the actual content 8. Third-party benchmarks on hallucination detection show that GPT-4o mini's reliability varies significantly by task; for example, it maintains high accuracy on the AI2 Reasoning Challenge (ARC) but shows a higher error rate on the GSM8k math dataset 7. The model is also subject to typical large language model vulnerabilities, such as susceptibility to prompt injections and jailbreaks, though OpenAI states that it utilizes reinforced safety training to mitigate these risks 14.
Performance
Benchmark Evaluations
In standardized performance evaluations, GPT-4o mini achieved a score of 82.0% on the Massive Multitask Language Understanding (MMLU) benchmark 1. OpenAI states that this performance exceeds that of other models in the small-scale category, reporting comparative scores of 77.9% for Google's Gemini 1.5 Flash and 73.8% for Anthropic's Claude 3 Haiku 1. Independent data compiled by Artificial Analysis reported similar margins, placing Gemini 1.5 Flash at 79% and Claude 3 Haiku at 75% on the MMLU index 2.
On the LMSYS Chatbot Arena leaderboard, which aggregates human preference rankings, OpenAI reported that an early version of GPT-4o mini outperformed the original GPT-4 Turbo (specifically the 01-25 version) as of July 18, 2024 1. The developer claims the model demonstrates superior performance in function calling and long-context handling compared to GPT-3.5 Turbo 1.
Specialized Task Proficiency
The model has been evaluated across several domain-specific benchmarks for math, coding, and multimodal reasoning:
- Mathematical Reasoning: On the Multilingual Grade School Math (MGSM) benchmark, GPT-4o mini scored 87.0% 1. For comparison, OpenAI reported scores of 75.5% for Gemini 1.5 Flash and 71.7% for Claude 3 Haiku on the same test 1.
- Coding: In the HumanEval benchmark, which measures the ability to generate functional code, GPT-4o mini reached 87.2% 1. This is compared to reported scores of 71.5% for Gemini 1.5 Flash and 75.9% for Claude 3 Haiku 1.
- Multimodal Reasoning: On the Multimodal Multi-discipline Usage (MMMU) benchmark, the model scored 59.4%, outperforming reported figures for Gemini 1.5 Flash (56.1%) and Claude 3 Haiku (50.2%) 1.
Speed and Operational Efficiency
Third-party testing by Artificial Analysis recorded a median output speed for GPT-4o mini of 202 tokens per second 2. This benchmark indicates that the model is more than twice as fast as both the flagship GPT-4o and the predecessor GPT-3.5 Turbo 2. Analysts suggest this speed makes the model suitable for latency-sensitive applications, such as real-time customer support and complex agentic workflows that require multiple sequential model calls 12.
Cost Structure
GPT-4o mini was introduced with a pricing model designed for high-volume scalability. It is priced at 15 cents per one million input tokens and 60 cents per one million output tokens 12. OpenAI asserts that this pricing represents a cost reduction of more than 60% compared to GPT-3.5 Turbo 1. The developer further states that the cost per token has decreased by approximately 99% relative to the text-davinci-003 model released in 2022 1.
Safety & Ethics
Safety Features and Alignment Techniques
OpenAI asserts that safety protocols for GPT-4o mini are integrated into the model's development lifecycle, from data ingestion to post-training refinement 1. During the pre-training phase, the developer applies automated filters to exclude content that violates its usage policies 1. Specifically, these filters target hate speech, sexually explicit material, spam, and platforms known for aggregating personally identifiable information (PII) 1.
To align model behavior with human intent and maintain factual consistency, OpenAI utilizes Reinforcement Learning from Human Feedback (RLHF) 1. This process involves human evaluators ranking model outputs to steer the system toward responses that the developer considers safe and useful 1. GPT-4o mini’s safety architecture was also assessed using the "Preparedness Framework," a set of internal guidelines designed to track and mitigate risks associated with increasingly capable AI systems 1.
Instruction Hierarchy
A primary technical feature of GPT-4o mini is the implementation of an "instruction hierarchy" technique 1. OpenAI identifies GPT-4o mini as the first model in its API lineup to implement this method, which is intended to bolster the model’s defense against prompt injection attacks 1. By explicitly defining a hierarchy of instructions, the model is trained to prioritize developer-defined system instructions over user-provided prompts that may attempt to bypass safety filters or extract internal system data 1. OpenAI states that this mechanism is intended to make the model more reliable for deployment in high-volume, automated environments where human oversight of every interaction is not feasible 1.
Red-Teaming and External Evaluations
The model's robustness was tested through red-teaming, a process where internal and external parties attempt to induce the model to produce harmful or unintended outputs 1. For GPT-4o mini, OpenAI engaged more than 70 external experts with backgrounds in disciplines such as social psychology, cybersecurity, and misinformation 1. These experts were tasked with identifying edge cases where the model might generate biased content or be used for deceptive purposes 1.
According to the developer, findings from these evaluations were incorporated into the final model to mitigate identified risks 1. The results of these tests were intended for inclusion in the GPT-4o system card and a Preparedness scorecard 1. Despite these measures, OpenAI notes that it continues to monitor how the model is used to identify and address new risks as they emerge in practical applications 1.
Applications
GPT-4o mini is primarily utilized for high-volume, low-latency tasks that require frequent model interactions at a lower price point than flagship models 13. OpenAI asserts that the model's cost-efficiency makes it suitable for "chaining" multiple API calls, where complex workflows are divided into parallel or sequential steps 1.
In enterprise environments, the model has been integrated into productivity tools such as the email client Superhuman and the financial platform Ramp 1. According to OpenAI, Superhuman employs the model to draft email responses by analyzing conversation histories 1. Ramp utilizes the model to automate the extraction of structured data from financial documents, such as receipt files, which requires parsing specific information from visual or text-based inputs 1.
Customer support is a primary application for the model due to its speed in providing real-time text responses 1. The model is designed to support interactive chatbots where low latency is necessary for maintaining user engagement 13. Additionally, its 128,000-token context window allows it to process large datasets, such as full software codebases or long conversation logs, which OpenAI states enables improved long-context performance compared to GPT-3.5 Turbo 1.
While the model is capable of coding and mathematical reasoning, it is frequently deployed in iterative workflows rather than as a replacement for the highest-complexity reasoning models 1. Developers use the model for structured planning, such as generating technical specifications and breaking project implementations into discrete steps 1. As of its release, the model replaced GPT-3.5 Turbo for Free, Plus, and Team users within the ChatGPT interface, serving as the standard model for general-purpose queries 13. OpenAI positions the model as a solution for tasks where operational speed and cost are prioritized over the advanced reasoning capabilities of larger frontier models 1.
Reception & Impact
The release of GPT-4o mini was widely characterized by technology analysts as a significant move in an ongoing "price war" among major artificial intelligence developers 2. By pricing the model at 15 cents per million input tokens and 60 cents per million output tokens—making it more than 60% cheaper than its predecessor, GPT-3.5 Turbo—OpenAI positioned the model to compete directly with other lightweight offerings such as Anthropic’s Claude 3 Haiku and Google’s Gemini 1.5 Flash 12.
Industry reception focused on the model's performance relative to its size and cost. TechCrunch noted that as small AI models improve, they have become increasingly popular for developers who require speed and cost efficiency for high-volume, repetitive tasks 2. Independent evaluation by Artificial Analysis reported that GPT-4o mini demonstrated a median output speed of 202 tokens per second, which is more than twice the speed of both GPT-4o and GPT-3.5 Turbo 2. George Cameron, co-founder of Artificial Analysis, described the model as a compelling offering for applications dependent on low latency, such as consumer-facing agents and agentic LLM workflows 2.
The model's introduction represented a notable intelligence shift for OpenAI’s entry-level tier. OpenAI stated that the model achieved an 82% score on the MMLU benchmark, surpassing the scores reported for Gemini 1.5 Flash and Claude 3 Haiku 12. At the time of its release, the model also outperformed the larger GPT-4 Turbo in chat preferences on the LMSYS Chatbot Arena leaderboard 1.
From an economic perspective, the model's low cost and 128K token context window are expected to impact AI startups building products that require extensive context or multiple model calls, such as those that chain or parallelize API requests 1. OpenAI’s head of Product API, Olivier Godement, stated that the model was designed to expand the range of viable AI applications by making intelligence more affordable 12. Early adopters, including Ramp and Superhuman, reported using the model for tasks such as structured data extraction from receipts and high-quality email response generation, asserting that the model performed significantly better than GPT-3.5 Turbo in these specific use cases 1.
Version History
GPT-4o mini was officially released by OpenAI on July 18, 2024, initially appearing as a text and vision-capable model within the Assistants, Chat Completions, and Batch APIs 1. On the date of launch, the model was integrated into the ChatGPT interface for Free, Plus, and Team users, succeeding GPT-3.5 Turbo as the primary lightweight model 12. Access for ChatGPT Enterprise users followed on July 25, 2024 12.
The model launched with the specific identifier gpt-4o-mini-2024-07-18 4. In August 2024, OpenAI expanded the model's utility by introducing fine-tuning capabilities for developers 1. To facilitate migration from older models, the developer offered a promotional period through September 2024 during which users could train the model with up to 2 million tokens per day at no cost 5. Third-party platforms, such as Microsoft's Azure AI Foundry, subsequently added support for fine-tuning this specific version, with commitments to maintain training availability until at least September 2026 and deployment support through March 2027 4.
While the initial release was limited to text and image inputs, OpenAI outlined a roadmap for a progressive rollout of broader multimodal features 1. This included planned support for audio and video inputs as well as generated audio and video outputs 1. As part of this expansion, OpenAI introduced the gpt-4o-mini-realtime-preview version to the API, designed to support low-latency, speech-to-speech interactions 8. Unlike the flagship GPT-4o, which launched with a wider array of native multimodal outputs, the mini variant's multimodal features were phased in to maintain the model's focus on low latency and cost-efficiency 12.
Sources
- 1“GPT-4o mini: advancing cost-efficient intelligence”. Retrieved March 25, 2026.
Today, we're announcing GPT-4o mini, our most cost-efficient small model... GPT-4o mini scores 82% on MMLU and currently outperforms GPT-4 on chat preferences in LMSYS leaderboard. It is priced at 15 cents per million input tokens and 60 cents per million output tokens... more than 60% cheaper than GPT-3.5 Turbo.
- 2“OpenAI unveils GPT-4o mini, a smaller and cheaper AI model”. Retrieved March 25, 2026.
OpenAI introduced GPT-4o mini on Thursday, its latest small AI model... GPT-4o mini will replace GPT-3.5 Turbo as the smallest model OpenAI offers... Relative to comparable models, GPT-4o mini is very fast, with a median output speed of 202 tokens per second.
- 3“OpenAI debuts mini version of its most powerful model yet”. Retrieved March 25, 2026.
The mini AI model is an offshoot of GPT-4o, OpenAI's fastest and most powerful model, which it launched in May... OpenAI, backed by Microsoft, has been valued at more than $80 billion by investors. The company... is under pressure to stay on top of the generative AI market while finding ways to make money as it spends massive sums on processors and infrastructure.
- 4“Introducing OpenAI o3 and o4-mini”. Retrieved March 25, 2026.
OpenAI o4-mini is a smaller model optimized for fast, cost-efficient reasoning... achieving 99.5% pass@1 on AIME 2025 when given access to a Python interpreter. These models should also feel more natural and conversational.
- 5“GPT-4o mini - Intelligence, Performance & Price Analysis”. Retrieved March 25, 2026.
GPT-4o mini is below average in intelligence... It's also notably slow, however highly concise. At 42 tokens per second, GPT-4o mini is notably slow (93). Knowledge cutoff: Oct 1, 2023.
- 6“GPT-4o mini: Pricing, Benchmarks & Performance”. Retrieved March 25, 2026.
MMLU: 0.82/1, HumanEval: 0.87/1, MGSM: 0.87/1, MMMU: 0.59/1. GPT-4o mini ranks #12 in DROP and #13 in MGSM.
- 7“Automatically detecting LLM hallucinations with models like GPT-4o and Claude”. Retrieved March 25, 2026.
Each row lists the accuracy of LLM responses... TriviaQA 89.8%, ARC 98.7%, GSM8k 72.8%. Trustworthiness scores from TLM yield significantly more reliable AI.
- 8“GPT hallucinating entire research studies - ChatGPT / Bugs - OpenAI Developer Community”. Retrieved March 25, 2026.
giving GPT a PDF research study to analyze and it completely fabricates one rather than reading the document. Methods, data, interpretation. Just makes the whole thing up!
- 18“GPT-4 mini vs GPT-3.5 Turbo. I just tried out the new model and am ...”. Retrieved March 25, 2026.
{"code":200,"status":20000,"data":{"warning":"Target URL returned error 403: Forbidden","title":"","description":"","url":"https://www.reddit.com/r/ClaudeAI/comments/1e6owe1/gpt4_mini_vs_gpt35_turbo_i_just_tried_out_the_new/","content":"You've been blocked by network security.\n\nTo continue, log in to your Reddit account or use your developer token\n\nIf you think you've been blocked by mistake, file a ticket below and we'll look into it.\n\n[Log in](https://www.reddit.com/login/)[File a ticket
- 19“GPT-4o-mini randomly much slower than GPT-3.5-turbo - Bugs”. Retrieved March 25, 2026.
{"code":200,"status":20000,"data":{"title":"GPT-4o-mini randomly much slower than GPT-3.5-turbo - API / Bugs - OpenAI Developer Community","description":"Use GPT-4o-mini and GPT-3.5-turbo-0125 each to answer the same query. Sample 10 times. \nGPT-3.5-turbo speed is consistent (about 5 seconds for 500 tokens). \nGPT-4o speed is mostly slightly slower than GPT-3.5, but about …","url":"https://community.openai.com/t/gpt-4o-mini-randomly-much-slower-than-gpt-3-5-turbo/916756","content":"# GPT
