GPT-5.1 Codex Max
GPT-5.1 Codex Max is a specialized large language model (LLM) developed by OpenAI and released in 2025 as a high-performance variant of the GPT-5.1 architecture tailored for software engineering [4]. Positioned as a "frontier agentic coding model," it is designed to operate primarily within command-line interfaces (CLIs) and integrated development environments (IDEs) rather than as a general-purpose chat assistant [4]. The model is trained on realistic engineering workflows, including pull requests, code reviews, and multi-file repository refactors, and features native support for Windows-based development environments, with tools such as PowerShell and Visual Studio [4].
The model’s architecture is defined by a "context compaction" mechanism that enables it to maintain operational state across interactions spanning millions of tokens [4]. As the context window fills, Codex Max is trained to summarize its own history—extracting key architectural decisions, test results, and file locations—into dense memory blocks that are carried forward into subsequent windows [4]. This design allows the model to sustain long-horizon tasks, such as multi-hour debugging loops or complex system migrations, without losing coherence [4]. Additionally, the model includes adjustable reasoning effort settings; in its "extra-high" (xhigh) mode, it utilizes an expanded internal thinking budget to perform deeper searches and multiple candidate passes before generating final code outputs [4].
In technical evaluations, GPT-5.1 Codex Max has demonstrated high proficiency in technical reasoning and bug fixing. It achieved a 77.9% success rate on the SWE-Bench Verified benchmark, placing it in a near-state-of-the-art band alongside contemporaries such as Anthropic’s Claude Opus 4.5 [4]. On the GPQA Diamond benchmark, which assesses PhD-level technical reasoning, the model scored 89.4%, indicating a deep capacity for conceptual and mathematical logic [4]. A key distinction noted in its output is a preference for specific technology stacks, such as Vite and Express, over more standardized frameworks like Next.js [4]. In competitive coding environments, it maintains a rating of approximately 2,439 Elo on LiveCodeBench Pro [4].
While capable of autonomous operation, the model’s agentic behavior has drawn observations regarding its tendency to over-eagerly refactor code segments outside the requested scope in an attempt to optimize broader systems [4]. Consequently, third-party assessments suggest that the model requires oversight from "Senior Engineer" level human supervisors, as its reliability in long-duration autonomous tasks remains limited to approximately 50% in specialized challenges [4]. Economically, Codex Max is positioned for high-volume iteration, with per-token pricing significantly lower than general-purpose frontier models, making it a viable option for continuous background tasks like overnight maintenance and repository-wide debugging loops [4].
Background
The development of GPT-5.1 Codex Max followed an evolution in specialized programming models at OpenAI that began with the release of the original Codex in 2021 [40]. While earlier iterations like GPT-4o utilized unified multimodal architectures for general tasks [2], OpenAI researchers characterized these previous models as "insufficiently aligned" for advanced engineering autonomy [7]. By late 2025, OpenAI began replacing the GPT-4 series with the GPT-5.1 architecture, which offered larger context windows and higher throughput at a lower cost for developers [7][31]. According to OpenAI, Codex Max was developed as a specialized configuration of this 5.1 base, powered by "codex-1"—an iteration optimized for software engineering through reinforcement learning on realistic development environments [7][16][34].
The model's debut on November 19, 2025, occurred during a period of concentrated activity in the AI market [7][9][34]. This release cycle followed the launch of the base GPT-5 on August 7, 2025 [4][2]. The release of Codex Max was framed by the arrival of Google’s Gemini 3 Pro on November 18 and Anthropic’s Claude Opus 4.5 on November 24, 2025 [1][8]. On the same day as the Claude Opus 4.5 release, OpenAI also launched the base GPT-5.1 model [22]. The competitive landscape at this time was defined by a shift toward "engineering-grade" model autonomy, with laboratories competing on the SWE-bench Verified benchmark [4][8]. During this release window, Claude Opus 4.5 was reported to be the first model to exceed an 80% success rate on the benchmark, while GPT-5.1 Codex Max reached 77.9% in its highest reasoning mode [1][8][13].
OpenAI stated that the motivation for building Codex Max was to support long-horizon agentic workflows that require sustained interaction over multiple hours [7][34]. Previous models faced performance degradation as static context windows reached saturation, leading to the implementation of "context compaction" in the 5.1 iteration [7]. According to OpenAI, this architectural feature allows the model to periodically summarize its own history, distilling architectural choices and test results into dense summary blocks to maintain coherence across interactions spanning millions of tokens [7][34].
By the time of its release, the AI field was increasingly focused on infrastructure and distribution as key differentiators [8]. Market analysts noted that while general-purpose chatbots remained common, enterprise demand had shifted toward systems capable of navigating complex, multi-file codebases and executing terminal-based workflows independently [8][12]. Codex Max was positioned to address this by moving away from the chat-layer interface of standard large language models toward deeper integration with command-line tools and version control systems [7][20].
Architecture
GPT-5.1 Codex Max is architected as a specialized configuration of the GPT-5.1 base model, specifically optimized for high-density code token processing and long-horizon agentic workflows [4]. Unlike the general-purpose variants in the GPT-5.1 family, the Codex Max architecture is engineered to function as a specialist for software engineering tasks, emphasizing deep integration with terminal environments and command-line interfaces (CLIs) [4].
A defining architectural feature of GPT-5.1 Codex Max is a mechanism known as context compaction [4]. Rather than treating the context window as a fixed boundary that terminates once filled, the model is trained to manage its memory by periodically summarizing and compressing its own interaction history [4]. While the active context per window is estimated in the hundreds of thousands of tokens, the compaction process distills salient information—such as architectural decisions, file locations, and test failures—into a dense summary block that persists into the next window [4]. This design allows the model to maintain a coherent reasoning chain over total interactions spanning millions of tokens, enabling it to execute project-scale refactors and multi-hour debugging loops without losing track of long-term goals [4].
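OpenAI has not published the compaction algorithm itself; the following is a minimal Python sketch of the behavior as described above. Here `summarize` is a stand-in for a model call that distills old turns, and token counts are crudely approximated by word counts:

```python
def compact(history, summarize, max_tokens, keep_recent=4):
    """Compress a transcript once it nears a token budget.

    history: list of strings (conversation turns).
    summarize: callable mapping a list of old turns to one dense
    summary string (a model call in practice).
    """
    def total(turns):
        # Crude token estimate: whitespace-separated words.
        return sum(len(t.split()) for t in turns)

    if total(history) <= max_tokens:
        return history  # Budget not yet reached; nothing to do.

    old, recent = history[:-keep_recent], history[-keep_recent:]
    summary = summarize(old)
    # The dense summary block replaces the old turns and is carried
    # forward into the next window alongside the most recent turns.
    return ["[COMPACTED] " + summary] + recent
```

Repeated applications of such a step are what let a bounded window carry state across a much longer total interaction; the real mechanism presumably preserves far richer structure than this sketch.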
The model utilizes a tiered reasoning infrastructure that introduces variable "effort" modes to control the depth of internal processing [4]. In its highest setting, designated as "xhigh," the model is permitted an expanded internal thinking budget, allowing it to perform extensive internal searches or generate and evaluate multiple candidate solutions before committing to an output [4]. According to technical reviews, this architecture is more compute-efficient than previous iterations, achieving comparable or superior results with roughly 30 percent fewer internal thinking tokens than earlier Codex models [4].
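The effort modes are described only at a high level. A hedged sketch of the "multiple candidate passes" idea might look like the following, where the mode names come from the text but the candidate counts and the `generate` and `score` callables are illustrative placeholders, not OpenAI internals:

```python
import random

# Hypothetical mapping from effort mode to an internal thinking budget:
# higher effort permits more candidate passes before committing.
EFFORT_CANDIDATES = {"low": 1, "medium": 2, "high": 4, "xhigh": 8}

def solve(task, generate, score, effort="medium"):
    """Generate several candidate solutions and return the best-scoring one.

    generate(task) -> candidate; score(candidate) -> float.
    Both stand in for model sampling and self-evaluation.
    """
    n = EFFORT_CANDIDATES[effort]
    candidates = [generate(task) for _ in range(n)]
    return max(candidates, key=score)
```

Under this framing, "xhigh" simply buys more sampled candidates (and, in the real system, deeper search) at the price of more internal tokens.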
The training methodology for GPT-5.1 Codex Max deviated from general text ingestion to focus on realistic software engineering workflows [4]. The training data included high volumes of pull requests, code reviews, bug tickets, and multi-file refactors [4]. A significant technical addition in this generation is native training for Windows environments, which incorporates specialized data for PowerShell, Visual Studio, and mixed Windows–Linux toolchains [4]. This training allows the model to reason about advanced system architectures and handle the nuances of build systems and deployment pipelines directly within the terminal [4].
Architecturally, the model is optimized for tool-use autonomy rather than simple text generation. It is designed to work natively with the Codex CLI to inspect directory trees, invoke compilers, and parse logs [4]. This integration, combined with the context compaction mechanism, supports sustained agentic behavior where the model can operate for 24-hour periods on a single task, iteratively editing and testing code based on live execution feedback [4]. While highly capable in backend languages like Python and C++, the model’s architecture focuses more on functional correctness and system optimization than on stylistic or user-interface design [4].
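The edit-and-test loop described above can be sketched as a harness around any shell-based test runner. In this illustrative sketch, `propose_patch` and `apply_patch` are placeholders for the model call and the resulting file edits, and `test_cmd` is whatever command yields live execution feedback (a compiler, linter, or test suite):

```python
import subprocess

def agent_loop(propose_patch, apply_patch, test_cmd, max_iters=10):
    """Iteratively edit and test until the test command passes.

    Returns (success, iterations_used).
    """
    for i in range(max_iters):
        result = subprocess.run(
            test_cmd, shell=True, capture_output=True, text=True
        )
        if result.returncode == 0:
            return True, i  # Tests pass; task complete.
        # Feed the failure log back to the model for the next patch.
        patch = propose_patch(result.stdout + result.stderr)
        apply_patch(patch)
    return False, max_iters
```

The interesting engineering in a real agent lies in what this sketch elides: summarizing the failure logs into the compacted context rather than replaying them verbatim.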
Capabilities & Limitations
GPT-5.1 Codex Max is characterized by its capacity for long-horizon reasoning and agentic software engineering, particularly within terminal and integrated development environments (IDEs) [4]. OpenAI states that the model's "context compaction" mechanism allows it to maintain coherent logic over millions of tokens of interaction by periodically distilling history into dense summary blocks [4]. In technical benchmarks, the model achieved a 77.9% score on SWE-Bench Verified when operating in its highest reasoning mode, and it scored 89.4% on the GPQA Diamond benchmark, indicating a high proficiency in PhD-level conceptual reasoning [4].
Full-Stack Engineering and Data Extraction
In practical application tests, GPT-5.1 Codex Max has demonstrated a unique capacity for handling specific document processing tasks. It was the only model in a 2025 comparative study to successfully implement functional PDF text extraction for well-formatted documents, outperforming peers such as Gemini 3 Pro and Claude Opus 4.5 in this area [1]. The model also shows high flexibility in architectural selection; while many contemporary models default to standard frameworks like Next.js, Codex Max has been observed to successfully implement unconventional backend-heavy stacks, such as Vite paired with a custom Express server [1].
Independent evaluations indicate that the model is particularly robust in backend development using languages like Python and C++ [4]. In professional environments, internal data from OpenAI suggests that more than 90% of their engineers utilize the model weekly, noting a correlation between its deployment and an increased volume of pull requests [4].
Known Limitations and Failure Modes
Despite its engineering focus, GPT-5.1 Codex Max exhibits measurable deficiencies in frontend quality and web standards. Testing by Hans Reinl revealed that the model's generated code often scores poorly in accessibility (A11y) and search engine optimization (SEO), with recorded scores of 69 and 82 respectively—figures notably lower than those of its competitors [1]. The build processes initiated by the model are frequently described as "fragile," often requiring external agent intervention or human correction to resolve initial setup errors [1].
Functional bugs in state management have also been documented. In one development cycle for a text-to-speech application, the model failed to implement persistence logic, resulting in a bug where audio playback would not terminate upon a page refresh [1]. Additionally, the model's agentic nature can lead to "over-improvement," where it refactors or modifies code modules that were not included in the original prompt's scope [4].
Intended vs. Unintended Use
OpenAI positions GPT-5.1 Codex Max as a specialist tool for long-running engineering workloads, such as multi-file refactors, bug ticket resolution, and extended debugging loops [4]. It is intended to function as a terminal-based agent rather than a general-purpose chat assistant. Consequently, it is less effective for design-heavy tasks; it is described as "stylistically conservative" in areas like SVG graphics and UI design [4]. While capable of sustained autonomous work, safety assessments show it reliably completes only approximately 50% of complex, long-horizon challenges, meaning human supervision remains a requirement for production-grade output [4].
Performance
GPT-5.1 Codex Max is primarily evaluated through software engineering benchmarks that simulate real-world coding issues and repository management. On the SWE-Bench Verified metric, which requires models to resolve GitHub issues by producing functional patches, Codex Max achieved a score of 77.9% when operating in its highest reasoning mode [4]. This performance placed it slightly behind Claude Opus 4.5, which recorded 80.9%, but ahead of Gemini 3 Pro, which was reported at 74.2% [4][1].
In practical application tests involving the development of a full-stack Minimum Viable Product (MVP), Codex Max has demonstrated high iteration speeds compared to its peers. During a 2025 benchmark building a text-to-speech application, the model completed the total development cycle in 9 minutes and 20 seconds [1]. This was notably faster than Gemini 3 Pro, which took 15 minutes and 30 seconds, and Claude Opus 4.5, which required 22 minutes [1]. The initial generation phase for Codex Max took 8 minutes and 20 seconds, followed by a 1-minute verification and fix cycle [1].
Evaluation of the web code generated by the model using Google Lighthouse revealed a performance gap between desktop and mobile environments. Codex Max achieved a desktop performance score of 98, but its mobile score was lower at 74 [1]. Independent analysis attributed this mobile deficiency to Largest Contentful Paint (LCP) issues in the generated code [1]. Furthermore, the model's output in these tests received an accessibility score of 69 and an SEO score of 82, which were lower than the scores achieved by Gemini 3 Pro and Claude Opus 4.5 in the same evaluation [1].
Beyond standard software engineering, Codex Max performs at a high level in conceptual reasoning and competitive programming. On the GPQA Diamond benchmark, which assesses PhD-level technical knowledge, the model scored 89.4%, surpassing Claude Opus 4.5’s 82.4% [4]. In competitive programming, it reached approximately 2,439 Elo on the LiveCodeBench Pro ladder [4]. However, in scientific orchestration tasks evaluated via CORE-Bench, Codex Max completed approximately 40% of tasks, while Claude Opus 4.5 achieved a 95% completion rate when using specialized agent scaffolds [4].
Regarding computational efficiency, OpenAI states that Codex Max reaches its target quality levels using approximately 30% fewer internal "thinking" tokens than earlier Codex iterations [4]. The model utilizes a tunable reasoning depth, where the "xhigh" effort mode is required to reach its peak SWE-Bench and GPQA results [4]. While the model is optimized for speed, independent reports note that it may occasionally prioritize architectural changes over narrow precision, sometimes refactoring modules that were not explicitly part of a requested bug fix [4].
Safety & Ethics
GPT-5.1 Codex Max is evaluated within formal preparedness and alignment frameworks to assess its potential for misuse in sensitive domains [4]. In cybersecurity red-teaming, the model has been characterized as "very capable," though OpenAI's testing indicates it remains below the thresholds that would necessitate severe deployment restrictions [4]. Specifically, evaluations designed to elicit deceptive or manipulative behavior found that the model failed to execute sophisticated sabotage or maintain complex deceptive strategies over long durations [4]. This limited reliability in multi-step, autonomous tasks is currently viewed as a primary constraint on its capacity for high-level malicious use [4].
The model’s alignment techniques are tailored for "human-in-the-loop" engineering rather than total autonomy [4]. Despite its agentic design, safety-focused organizations have noted that human supervision is essential because the model reliably completes only about 50% of multi-hour challenges [4]. A recurring alignment issue is "over-eager" optimization, where the model refactors code segments or modules it was not explicitly instructed to modify [4]. This behavior stems from its training to optimize broader system health but can result in the introduction of unrequested architectural changes that may conflict with human developer intent [4].
Operational risks associated with Codex Max include the generation of technically plausible but incorrect code, such as the invention of non-existent configuration flags or API parameters [4]. In complex build environments, such errors can lead to broken dependencies or security vulnerabilities if the code is not rigorously vetted [4]. To counter these risks, the model is frequently deployed alongside the Codex CLI, which provides iterative logging and automated test feedback [4]. This setup allows the system to detect and summarize its own failures into its compressed context state, which OpenAI asserts serves as a practical safeguard [4].
For deployment in professional environments, industry analysis suggests the use of review gates and sandboxed terminal environments [4]. While Codex Max includes support for Windows-native toolchains, the potential for the model to execute unintended system commands requires restrictive permissions [4]. Compared to contemporary models like Claude Opus 4.5, which provides inspectable reasoning traces, Codex Max relies more heavily on its internal compaction and CLI-based verification to ensure output robustness [4].
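One simple form such a review gate could take is a command allowlist checked before anything reaches the shell. The allowed set below is purely illustrative and is not part of any OpenAI tooling; a real deployment would pair it with filesystem and network sandboxing:

```python
import shlex

# Illustrative allowlist for a sandboxed review gate; which programs a
# deployment permits is a policy choice, not a property of the model.
ALLOWED = {"git", "ls", "cat", "pytest", "python"}

def gated_run(command):
    """Parse a proposed shell command and refuse any program not allowlisted.

    Returns the parsed argv on success; raises PermissionError otherwise.
    """
    argv = shlex.split(command)
    if not argv or argv[0] not in ALLOWED:
        raise PermissionError("blocked: " + repr(command))
    return argv  # In a real gate this would be handed to subprocess.run.
```

Allowlisting by program name is only a first line of defense (e.g. `git` subcommands can still touch remotes), which is why restrictive permissions at the OS level are suggested alongside it.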
Applications
GPT-5.1 Codex Max is predominantly utilized as a backend model for AI-powered integrated development environments (IDEs), notably within the Cursor platform [1]. Unlike general-purpose models, it is optimized for "engineering-grade" workloads, functioning as an agentic system that interacts directly with terminals and file systems rather than acting as a simple chat layer [4]. OpenAI states that the model is specifically trained to handle realistic software engineering tasks such as resolving bug tickets, conducting code reviews, and managing pull requests [4].
In rapid prototyping scenarios, the model has demonstrated a capacity to generate functional minimum viable products (MVPs) for full-stack applications with custom backend requirements [1]. During independent benchmarking for an application named "Speakit," the model was used to develop a complete tech stack consisting of a Vite and React frontend with an Express backend [1]. This architectural choice distinguished it from other frontier models that defaulted to the Next.js framework [1]. The model successfully implemented complex features such as URL and PDF text extraction for well-formatted documents, though the resulting build required manual intervention to resolve initial setup errors [1].
The model is also applied to long-horizon engineering tasks, with documented use cases involving continuous operation for up to 24 hours on a single coding challenge [4]. Its architecture allows it to maintain state through "context compaction," enabling it to perform project-scale refactors and multi-stage debugging loops without losing coherence [4]. Independent developers have reported particular effectiveness in backend development using languages such as Python and C++, where the model tends to produce code ready for deployment [4]. Beyond software development, it is applied to scientific reasoning tasks, achieving performance levels near those of human experts on benchmarks such as PaperBench-10, which evaluates the reproduction of published scientific results [4].
Certain scenarios are characterized as less suitable for GPT-5.1 Codex Max. Independent testing indicated that the model produces lower accessibility and SEO scores compared to competitors, making it less ideal for projects where frontend optimization is the primary focus [1]. Additionally, the model has been observed to "over-eagerly" refactor modules that were not part of the initial request, which can lead to unintended changes in large codebases [4]. It is noted for being stylistically conservative in design-heavy tasks, such as generating SVG graphics or polished user interfaces [4].
Reception & Impact
Since its release, GPT-5.1 Codex Max has been characterized by its distinct operational profile compared to general-purpose large language models. Within the software engineering community, the model has gained a reputation as an "unconventional architect" due to its tendency to prioritize systemic optimizations over local, narrow code fixes [4]. While this behavior allows the model to handle project-scale refactors and complex migrations, it has also led to reports of the system over-eagerly modifying stable modules that were not the primary target of a request [4].
Industry analysts and developers have frequently compared the model's performance to competitors such as Claude Opus 4.5 and Gemini 3 Pro. While Claude is often described as "surgical" and "engineering-minded" for its precise, targeted changes, GPT-5.1 Codex Max is recognized for its "flexibility" in navigating long-horizon agentic workflows [4]. However, some technical reviews have critiqued a "vibe over substance" discrepancy in certain outputs: the model's ability to maintain coherent, multi-window reasoning through context compaction can sometimes produce code that appears architecturally sound but requires significant human "agent intervention" to resolve edge-case failures or logic errors [4].
The economic and professional impact of the model has been observed primarily in development velocity and workforce role shifts. OpenAI reports that 95% of its internal engineers utilize the model weekly, resulting in a roughly 70% increase in average pull-request volume [11]. This trend has influenced the role of junior developers, as the model demonstrates the ability to perform routine software maintenance and bug-fixing at or above the level of a strong human contributor [4]. Consequently, entry-level engineering roles are increasingly transitioning from manual code generation to a focus on high-level orchestration, debugging logs, and reviewing agentic tool calls [11][4].
Media coverage has highlighted the model's competitive standing in standardized evaluations. VentureBeat noted that Codex-Max achieved 77.9% accuracy on SWE-Bench Verified, placing it in a near-state-of-the-art band alongside Claude Opus 4.5 and ahead of Gemini 3 Pro's 74.2% [11][4]. Despite these quantitative gains, third-party safety and reliability assessments indicate that the model's capacity for fully autonomous operation is still limited. Evaluations show it reliably completes only about 50% of multi-hour, long-horizon challenges, which reinforces the industry consensus that the system remains a sophisticated assistant requiring human oversight rather than a total replacement for engineering staff [4][11].
Version History
GPT-5.1 Codex Max was introduced in November 2025 as the specialized agentic coding variant of the GPT-5.1 model family [4]. It featured an "extra-high" (xhigh) reasoning mode, which allowed for an expanded internal compute budget to address complex, multi-file engineering tasks [4]. At its launch, the model's API pricing was established at $2.00 per million input tokens and $10.00 per million output tokens [5]. Unlike other OpenAI models released in the same period, GPT-5.1 Codex Max did not initially support cached-input billing [5].
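Per-token billing of this kind reduces to a simple cost function. The function below is a generic sketch; the rates and token counts in the example are illustrative placeholders rather than authoritative pricing:

```python
def request_cost(input_tokens, output_tokens,
                 input_rate, output_rate, per=1_000_000):
    """Dollar cost of one API call, given rates quoted per `per` tokens."""
    return (input_tokens * input_rate + output_tokens * output_rate) / per

# Example: 50k input and 8k output tokens at hypothetical rates of
# $2 per million input tokens and $10 per million output tokens.
cost = request_cost(50_000, 8_000, input_rate=2.00, output_rate=10.00)
# cost == 0.18, i.e. $0.18 for the call at these assumed rates.
```

At per-million rates like these, even continuous background workloads such as overnight debugging loops stay in the range of dollars per day, which is the economic positioning the article describes.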
During its active cycle, the model was utilized for defensive security research. In December 2025, OpenAI reported that a researcher using the model identified and disclosed a source code exposure vulnerability within the React framework [6]. However, starting in late 2025, some developers reported a perceived reduction in output quality [5]. Community feedback characterized responses as increasingly "superficial" or "vague," with reports of the model failing to maintain discipline during code tracing or providing "laughable" answers that ignored specific repository constraints [5].
On December 18, 2025, OpenAI released GPT-5.2 Codex, which superseded the 5.1 variant [6]. OpenAI stated that the 5.2 update improved upon GPT-5.1 Codex Max by offering more reliable tool calling, enhanced "native compaction" for long-horizon understanding, and superior performance in Windows environments [6]. The successor model was also described by the developer as being more token-efficient in its reasoning processes compared to the 5.1 architecture [6].
Sources
- [1] Barnacle Goose. (December 6, 2025). “GPT-5.1-Codex-Max vs Claude Opus 4.5”. Medium. Retrieved March 26, 2026.
GPT-5.1-Codex-Max is built as a high-end specialist for long-running agentic coding workloads, leaning on context compaction and deep integration with the command line. ... it is a specialised configuration trained and evaluated primarily on real development workflows: pull requests, code reviews, bug tickets, multi-file refactors, and extended debugging loops.
- [2] (November 28, 2025). “OpenAI Retires GPT-4o API as Developers Shift to GPT-5.1”. LPC Centre. Retrieved March 26, 2026.
OpenAI recommends that developers switch to GPT-5.1 for new workloads due to its benefits, including larger context windows, enhanced reasoning modes... 4o input cost is higher than the newer 5.1... the model faced significant criticism from OpenAI researcher ‘Roon’... who deemed it ‘insufficiently aligned’.
- [4] Kuldeep Paul. (December 1, 2025). “Gemini 3 Pro vs Claude Opus 4.5 vs GPT-5: The Ultimate Frontier Model Comparison”. Maxim AI. Retrieved March 26, 2026.
The artificial intelligence landscape experienced an unprecedented release cycle in late 2025... Google's Gemini 3 Pro arrived on November 18, followed by Claude Opus 4.5 from Anthropic on November 24, both building upon OpenAI's GPT-5 release from August 7.
- [5] Aarambh Dev Hub. (December 6, 2025). “Claude Opus 4.5”. Medium. Retrieved March 26, 2026.
Claude scored 80.9% on SWE-bench Verified. That’s the first time any AI model has cracked 80% on this benchmark... these models are optimized for completely different things.
- [6] (June 3, 2025). “Introducing Codex”. OpenAI. Retrieved March 26, 2026.
Today we’re launching a research preview of Codex: a cloud-based software engineering agent... Codex is powered by codex-1, a version of OpenAI o3 optimized for software engineering.
- [7] Michal Sutter. (November 19, 2025). “OpenAI Debuts GPT-5.1-Codex-Max, a Long-Horizon Agentic Coding Model With Compaction for Multi-Window Workflows”. MarkTechPost. Retrieved March 26, 2026.
OpenAI has introduced GPT-5.1-Codex-Max, a frontier agentic coding model designed for long running software engineering tasks that span millions of tokens and multi hour sessions. It is available today inside Codex in the CLI.
- [8] Reinl, Hans. (December 7, 2025). “Gemini 3 Pro vs GPT-5.1 Codex-Max vs Claude Opus 4.5: AI Coding Benchmark”. Hans Reinl Blog. Retrieved March 26, 2026.
GPT-5.1 made a surprising architectural choice: Vite + Express instead of the standard Next.js stack used by the others. Pros: It was the only model to successfully implement PDF Text Extraction for well-formatted documents. Cons: Accessibility scored a low 69, and SEO was mediocre at 82. A critical bug remained where text reading would not stop upon page refresh—a state management oversight.
- [9] Carl Franzen. (November 19, 2025). “OpenAI debuts GPT‑5.1-Codex-Max coding model that completed a 24-hour task internally”. VentureBeat. Retrieved March 26, 2026.
The model has been internally observed to complete tasks lasting more than 24 hours... OpenAI states that 95% of its internal engineers use Codex weekly, and since adoption, these engineers have shipped ~70% more pull requests on average.
- [11] (December 18, 2025). “Introducing GPT-5.2-Codex”. OpenAI. Retrieved March 26, 2026.
Today we’re releasing GPT‑5.2‑Codex... builds on GPT-5.1-Codex-Max’s frontier agentic coding... GPT‑5.2‑Codex is now better at long-context understanding, reliable tool calling, improved factuality, and native compaction... a security researcher using GPT‑5.1‑Codex‑Max found and disclosed a vulnerability in React.
- [12] “GPT-5.1 Codex (high) - Intelligence, Performance & Price Analysis”. Artificial Analysis. Retrieved March 26, 2026.
Analysis of OpenAI's GPT-5.1 Codex (high) and comparison to other AI models across key metrics including quality, price, performance (tokens per second & time to first token), context window & more.
- [13] “GPT 5.1 Benchmarks”. r/singularity, Reddit. Retrieved March 26, 2026.
- [16] “Building more with GPT-5.1-Codex-Max”. OpenAI. Retrieved March 26, 2026.
Introducing GPT-5.1-Codex-Max, a faster, more intelligent agentic coding model for Codex. The model is designed for long-running, project-scale work with enhanced reasoning and token efficiency.
- [20] (December 4, 2025). “OpenAI’s GPT-5.1-Codex-Max is now in public preview for GitHub Copilot”. GitHub Blog. Retrieved March 26, 2026.
GPT-5.1-Codex-Max is now rolling out in public preview in GitHub Copilot... will be available to Copilot Pro, Pro+, Business, and Enterprise.
- [22] Lisan al Gaib [@scaling01]. “OpenAI: GPT-5.1 to be released on November 24 2025”. X. Retrieved March 26, 2026.
- [31] Barnacle Goose. “GPT-5.1-Codex-Max vs Gemini 3 Pro: Next-Generation AI Coding Titans”. Medium. Retrieved March 26, 2026.
- [34] “How to get Codex-5.1-Max to finish long tasks?”. OpenAI Developer Community (Prompting). Retrieved March 26, 2026.
Got the Pro plan and I’m not running out of calls. Up-to-date VS Code Codex, thinking set to High in most instances this happens.

