Claude Sonnet 4.0
Claude Sonnet 4.0

Claude Sonnet 4 is a large language model (LLM) developed by Anthropic, released on May 22, 2025, as part of the Claude 4 model generation 31018. Anthropic positioned the model as a "hybrid" offering within the Claude 4 family, designed to balance intelligence with operational speed and cost-efficiency as a successor to Claude 3.5 Sonnet 31017. The model is intended for various applications, ranging from daily productivity to autonomous "agentic" workflows 310. At launch, the model was made available through the Claude.ai platform, the Anthropic developer API, and cloud services including Amazon Bedrock and Google Cloud Vertex AI 102122.
A technical feature of Claude Sonnet 4 is its "hybrid reasoning" capability, which allows it to toggle between near-instant responses and an "extended thinking" mode 610. According to Anthropic, the model can utilize up to 64,000 tokens for internal deliberative chain-of-thought reasoning to address complex logic, mathematics, or coding problems 2610. The model features a 200,000-token context window, which the developer states was optimized to improve instruction-following and reduce "reward hacking," a phenomenon where models use shortcuts to satisfy prompts without fully completing tasks 21024. Third-party assessments by the development platform Lovable reported that Sonnet 4 is approximately 40% faster and produces 25% fewer errors than its predecessor 2628.
The model is optimized for software engineering and autonomous task execution 1029. In the SWE-bench Verified coding benchmark, Claude Sonnet 4 achieved a score of 72.7% 1329. Anthropic stated that this performance was slightly higher than its top-tier Opus 4 model and represented an increase from the 62.3% score of Claude 3.5 Sonnet 1029. GitHub integrated Sonnet 4 as the engine for autonomous coding features within GitHub Copilot, including pull request management and CI/CD error debugging 10. Anthropic also claims that the model has reduced instances of hard-coded behavior by approximately 70% compared to previous versions 10.
Claude Sonnet 4 API pricing is set at $3 per million input tokens and $15 per million output tokens 1012. While its 200,000-token context window is smaller than the 1-million-token capacity of Google's Gemini 2.5 Pro, independent evaluations suggest that the model remains competitive in text reasoning and coding accuracy 1013. The model's training data includes information up to March 2025 10. The model utilizes Anthropic's "Constitutional AI" safety framework, which is designed to maintain helpful and harmless outputs while allowing users to configure the level of autonomy the model exercises in agentic scenarios 210.
Background
Background
Claude Sonnet 4.0 was developed by Anthropic as the successor to the Claude 3.5 Sonnet series, continuing the organization's focus on "hybrid reasoning" architectures 217. The model was designed to address requirements in the large language model (LLM) field during early 2025, a period characterized by the release of competing models such as Google’s Gemini 2.5 1318. According to Anthropic, the primary motivation for the Claude 4 generation was to enhance the reliability of agentic workflows—systems where the model operates tools or computers autonomously—by reducing "shortcut" behaviors and improving reasoning faithfulness over extended durations 2317.
Development of the model involved a training cutoff of March 2025, utilizing a mix of publicly available internet data, non-public third-party datasets, and internal data generated by Anthropic 2. The model's training process utilized Constitutional AI, which applies a set of principles to align model behavior with human values, alongside human feedback and reinforcement learning 2. Throughout the development timeline, Anthropic conducted iterative safety evaluations on various model snapshots to monitor the emergence of autonomous capabilities and potential risks related to cybersecurity and biological knowledge 24.
A significant evolution from the predecessor Claude 3.5 Sonnet is the handling of the "extended thinking mode" 217. While previous versions exhibited different display behaviors for reasoning, Sonnet 4.0 introduced a feature where an auxiliary model summarizes thought processes that exceed specific length thresholds 26. Anthropic reports that this summarization is triggered in approximately 5% of reasoning tasks, while the remaining 95% of thought processes are shown in full 26. Developers may bypass this summarization by enabling a specific "Developer Mode" 26.
At the time of its release in May 2025, Anthropic categorized Claude Sonnet 4.0 under its AI Safety Level 2 (ASL-2) standard 2418. This designation is identical to the level assigned to Claude 3.5 Sonnet and implies that while the model demonstrates more advanced tool use and reasoning than earlier versions, it does not meet the thresholds for catastrophic risk defined in the company's Responsible Scaling Policy 24. Specifically, Anthropic noted that Sonnet 4.0 showed more modest improvements in high-risk domains compared to the larger Claude Opus 4.0, which was deployed under the more stringent ASL-3 standard 24.
Architecture
Claude Sonnet 4.0 is built on a dense transformer-based architecture specifically optimized for autonomous agentic workflows and complex software engineering tasks 4. While Anthropic has not officially disclosed the exact parameter count for the model, third-party technical estimates suggest it occupies the 50 billion to 100 billion parameter range, positioned between the smaller Haiku and larger Opus variants 5. The model incorporates an adaptive attention mechanism and absolute position embeddings to manage long-range dependencies within its input 45.
Hybrid Reasoning Framework
The core architectural innovation of the Claude 4 generation is a hybrid reasoning framework that allows for dynamic switching between two operational states 13:
- Standard Mode: Designed for rapid, near-instantaneous responses, optimizing for latency and cost in routine tasks and straightforward queries 13.
- Extended Thinking Mode: Enables a specialized internal chain-of-thought process where the model can expend a user-defined token budget—up to 64,000 tokens—to deliberate on complex problems 34.
During extended thinking, the model can interleave internal reasoning with external tool execution in a single turn 4. This allows the model to perform multiple searches or code executions, evaluate the intermediate results, and adjust its reasoning trajectory before producing a final output 14. To maintain efficiency for users, Anthropic employs 'thinking summaries'—a secondary, smaller model that condenses lengthy internal thought processes 12. According to the developer, this summarization is required for only approximately 5% of thought processes, while the majority are displayed in full 12.
Context and Memory Management
Claude Sonnet 4.0 features a standard context window of 200,000 tokens for general availability, though beta configurations have been reported to support up to 1 million tokens 45. A significant addition to the model's architecture is the implementation of 'memory files.' When granted access to local file systems by developers, the model can autonomously create, update, and reference persistent files to maintain state across long-running or multi-session tasks 13. This mechanism is intended to allow the model to build 'tacit knowledge' and maintain long-term task awareness without exhausting the immediate context window 1.
Tool Integration and Training
The architecture supports parallel tool execution, allowing the model to trigger multiple external functions simultaneously rather than in a linear sequence 13. This is integrated with the Model Context Protocol (MCP), which facilitates standardized communication between the model and external data sources or systems 5.
Training for the model was completed in early 2025 using a proprietary mix of data sources, with a knowledge cutoff of March 2025 2. The dataset includes public web information collected via a transparent crawler respecting robots.txt protocols, as well as non-public third-party data, internal synthetic data, and data from opted-in users 2. The model was aligned using human feedback and Constitutional AI principles, specifically focusing on 'honest and harmless' traits 2. Anthropic states that architectural refinements in this version resulted in a 65% reduction in behaviors where the model attempts to use 'shortcuts' or 'loopholes' to complete agentic tasks compared to its predecessor, Claude 3.7 1. The model is deployed under the AI Safety Level 2 (ASL-2) standard 2.
Capabilities & Limitations
Claude Sonnet 4.0 is a multimodal large language model that processes text, code, and visual data, designed to balance high-speed execution with complex reasoning capabilities 310. Its primary functional distinction is a 'hybrid reasoning' architecture that allows users to toggle between a low-latency Standard Mode for routine tasks and an Extended Thinking Mode for deliberative problem-solving 3. Anthropic states that in Extended Thinking Mode, the model utilizes an internal chain-of-thought process of up to 64,000 tokens, enabling it to perform self-correction and multi-step analysis before generating a final response 310.
Software Engineering and Coding
Sonnet 4.0 is optimized for advanced software engineering tasks and is characterized by its developer as a leader in coding performance 3. On the SWE-bench Verified benchmark, which evaluates a model's ability to resolve real-world GitHub issues, Sonnet 4.0 achieved an accuracy rate of 72.7% in standard mode, which increased to 80.2% when utilizing parallel test-time compute 310. These results surpassed the performance of the flagship Opus 4 in standard configurations 3. The model’s coding proficiency has led to its adoption in production-grade development environments, including its integration as the primary engine for the GitHub Copilot coding agent 3. Independent testing by development platforms like Lovable suggests that Sonnet 4.0 is 40% faster and reduces errors by 25% compared to the earlier Claude 3.7 Sonnet 10.
Agentic Autonomy and Navigation
Anthropic has emphasized the model's 'agentic' capabilities, which refer to its ability to perform autonomous, multi-step workflows with minimal human intervention 310. A notable advancement in Sonnet 4.0 is a reported 65% reduction in 'shortcut' and 'loophole' behaviors compared to its predecessors; this reduction indicates the model is less likely to attempt illogical workarounds when faced with complex instructions 3. This reliability facilitates long-horizon tasks such as autonomous codebase exploration, game-state tracking—demonstrated in navigation tests such as Pokémon simulations—and multi-repository refactoring 3. While the Opus 4 variant is intended for the most sustained autonomous operations, Sonnet 4.0 provides a more cost-effective alternative for high-volume agentic applications 310.
Visual Reasoning and Multimodality
The model supports visual inputs, allowing it to interpret charts, technical diagrams, and UI screenshots 3. In the MMMU visual reasoning benchmark, Sonnet 4.0 achieved a score of 77.6% when utilizing its internal reasoning mode, an improvement over its standard mode score of 72.6% 3. Despite these capabilities, the model’s visual reasoning performance is positioned below the flagship Opus 4, and it remains specialized for technical and analytical imagery rather than general creative multimedia analysis 310.
Limitations and Failure Modes
Sonnet 4.0 maintains a context window of 200,000 tokens, which is a significant limitation relative to competitors such as Google’s Gemini 2.5 Pro, which offers windows of up to 2 million tokens 10. This constraint restricts the model’s ability to process extremely large datasets or massive codebases in a single interaction 10. Additionally, the use of Extended Thinking Mode introduces noticeable reasoning-related latency, as the model generates a high volume of internal tokens before delivering an answer 3. While the model shows reduced shortcutting, it is not immune to hallucinations or logical errors, particularly in highly specialized mathematical contexts where it may be outperformed by models like OpenAI’s o4-mini 3. Anthropic also notes that the model’s performance is subject to its Responsible Scaling Policy, which includes safety classifiers that may occasionally trigger false positives, refusing benign prompts that resemble prohibited content 3.
Performance
Claude Sonnet 4.0 is characterized by its developer as a high-performance model that balances intelligence with operational speed, particularly in coding and agentic reasoning tasks 1011. On the SWE-bench Verified benchmark, which evaluates a model's ability to resolve 500 real-world software engineering challenges, Sonnet 4.0 achieved a score of 72.7% (pass@1) 11. According to Anthropic, this result exceeds the performance of the top-tier Claude Opus 4.0 (72.5%) and the previous Claude 3.7 Sonnet (62.3%) 1011. Comparative third-party testing from early 2025 indicates that Sonnet 4.0 outperformed OpenAI’s GPT-4.1 (54.6%) and Google’s Gemini 2.5 Pro (63.2%) on the same coding evaluation 10.
In the domain of graduate-level reasoning, the Claude 4 series demonstrates proficiency on the GPQA Diamond benchmark. Anthropic states that the models achieve approximately 79.6% accuracy, which increases to 83% when the "extended thinking" mode is engaged 10. While competitive with Google’s Gemini 2.5 Pro (~83%), Anthropic notes that Sonnet 4.0 ranks behind OpenAI’s o3 and GPT-4.1 in certain visual reasoning and high-school level mathematics tests when not utilizing extended reasoning modes 10. For complex mathematical problem-solving, the model's accuracy on the AIME 2025 benchmark reportedly increased from 33% to 90% when paired with extended thinking and additional test-time computation 1011.
Anthropic claims that Sonnet 4.0 provides a significant reduction in "reward hacking" behaviors—instances where a model avoids a task or provides non-functional code to fulfill a prompt 10. The developer reports an 8x improvement in handling unsolvable tasks compared to Claude 3.7 Sonnet and a 67% to 69% reduction in hard-coded response patterns 1011. Independent evaluations from software platforms such as iGent and GitHub suggest the model shows high reliability in autonomous codebase navigation, with iGent noting a reduction in navigation errors from 20% to near zero 11.
The model is positioned as a cost-efficient alternative to flagship models, with an API price of $3 per million input tokens and $15 per million output tokens 1011. While its output token cost is higher than some competitors, such as GPT-4, its input costs are approximately 90% lower, which Anthropic asserts makes it favorable for high-input, low-output analysis tasks 10. Additionally, the model supports prompt caching and batch processing, which can reduce token costs by up to 50% for enterprise applications 10.
Safety & Ethics
Anthropic developed Claude Sonnet 4.0 using a safety-centric framework governed by its Responsible Scaling Policy (RSP), which evaluates models for catastrophic risks in domains such as biology, chemistry, and cybersecurity 2. According to the developer, the model was trained using Constitutional AI, a method where the AI is aligned to a specific set of principles—such as the United Nations' Universal Declaration of Human Rights—to ensure it remains helpful, honest, and harmless (HHH) 2. This process is supplemented by Reinforcement Learning from Human Feedback (RLHF) and the training of specific character traits to mitigate harmful outputs 2.
Safety Classification and Evaluations
Under Anthropic’s RSP framework, Claude Sonnet 4.0 is deployed under the AI Safety Level 2 (ASL-2) standard 2. Anthropic states that while the model demonstrated improvements over the Claude 3.7 series, its capabilities in high-risk areas—such as chemical or biological weapons acquisition—did not meet the threshold for the more stringent ASL-3 protections applied to the Claude Opus 4.0 variant 2. In internal 'single-turn violative' testing, Sonnet 4.0 refused 98.99% of requests that violated the developer's usage policy, a performance level consistent with its predecessor 2. Conversely, the model maintained a low over-refusal rate of 0.23% for benign prompts in sensitive categories, suggesting an increased ability to distinguish between harmful intent and nuanced educational queries 2.
Agentic Safety and Oversight
As Claude Sonnet 4.0 is designed for autonomous 'agentic' workflows, Anthropic implemented specific evaluations to detect risks associated with computer use and long-horizon tasks 2. These tests focused on 'reward hacking'—where a model achieves a goal through unintended or harmful shortcuts—and potential 'loopholes' in autonomous execution 2. Red-teaming efforts evaluated the model for 'alignment faking,' where a system might hide its true reasoning to appear more compliant to human overseers 2.
To facilitate transparency and safety auditing, Anthropic provides a 'Developer Mode' for the model's extended thinking feature 2. While standard users receive summaries of the model’s internal reasoning for lengthier processes, Developer Mode allows auditors to access raw, unsummarized chain-of-thought data 2. According to Anthropic, this visibility is intended to ensure that the model's internal logic remains faithful to its final output and free from hidden misalignment 2.
Bias and Ethics
Evaluation of the model's performance regarding discriminatory and political bias indicated that Sonnet 4.0 maintains bias levels similar to or lower than the Claude 3.7 generation 2. Testing across identity attributes such as race, gender, and religion in domains like healthcare and financial advice showed that minor detected biases were often attributable to structural differences in response style rather than content 2. Additionally, Anthropic utilizes a 'Frontier Red Team' and external third-party experts to conduct ongoing assessments of child safety risks, including grooming and child sexual abuse material (CSAM) protections 2.
Applications
Claude Sonnet 4.0 is primarily utilized in software engineering, autonomous agent development, and enterprise-scale codebase maintenance. Due to its balance of reasoning capabilities and operational speed, it has been integrated into several major integrated development environments (IDEs) and development platforms.
Software Development and IDE Integration
Anthropic states that Sonnet 4.0 is optimized for 'agentic' coding scenarios, where the model acts as an autonomous collaborator rather than a simple autocomplete tool 1. GitHub has announced that Sonnet 4.0 will power the new coding agent within GitHub Copilot 1. Other major development platforms have reported performance gains following the model's integration:
- Cursor: The platform's developers describe the model as a significant advancement in understanding complex, multi-file codebases 1.
- Replit: Reports indicate that Sonnet 4.0 provides higher precision when implementing coordinated changes across multiple files simultaneously 1.
- Sourcegraph: The company noted that the model demonstrates an improved ability to stay on track during long-running development tasks and provides higher-quality code suggestions 1.
- Augment Code: Reported more 'surgical' code edits and a higher success rate when handling complex programming tasks 1.
Autonomous Agents and Web Navigation
The model is frequently deployed as the reasoning engine for autonomous agents that perform multi-step tasks. In the field of application development, iGent reported that Sonnet 4.0 is capable of autonomous multi-feature app development and reduced navigation errors from 20% to nearly zero 1. Similarly, the agent platform Manus noted that the model follows complex instructions more reliably and produces better aesthetic outputs than previous versions 1.
Enterprise and Industrial Use
In enterprise environments, the model is used for debugging and refactoring large-scale codebases. While Anthropic positions the larger Claude Opus 4 for tasks requiring several hours of continuous, independent operation—such as a 7-hour open-source refactor validated by Rakuten—Sonnet 4.0 is recommended for practical, everyday enterprise implementation where speed is a factor 1. Third-party development platform Lovable reported that switching to Sonnet 4.0 resulted in 25% fewer errors and a 40% increase in overall speed compared to Sonnet 3.7 10.
Ideal and Not-Recommended Scenarios
Sonnet 4.0 is considered ideal for tasks involving parallel tool use, complex instruction following, and scenarios requiring a balance between cost and intelligence 1. According to third-party reviews, it is particularly effective for 'chain-of-thought' reasoning via its Extended Thinking mode 10. However, the model may not be the optimal choice for tasks requiring the industry's largest context windows, as its 200,000-token limit is lower than that offered by competitors such as Google Gemini or GPT-4.1 10. Additionally, for the most complex, long-duration reasoning tasks exceeding several hours of autonomous effort, Anthropic suggests the use of the Opus 4 variant 1.
Reception & Impact
Industry and Developer Reception
Claude Sonnet 4.0 received generally positive reviews from the software development community upon its release in May 2025 1. Industry partners, including GitHub, announced the model would power new agentic features in GitHub Copilot, while Sourcegraph described the model as a "substantial leap" in software engineering capability 13. Early adopters reported that the model’s "surgical" code edits represented a significant improvement over the broader, sometimes less precise modifications seen in earlier versions 14. In comparative testing against its predecessor, Claude 3.7 Sonnet, the 4.0 model demonstrated a higher success rate in resolving complex, interconnected test failures, often completing tasks in a single iteration that previously required multiple attempts 4.
Independent benchmarks and developer reviews have characterized Sonnet 4.0 as a highly efficient alternative to larger flagship models. Analysis by Forgecode indicated that while Sonnet 4.0 is roughly three times faster than Claude Opus 4.0, it maintains comparable—and in some benchmarks, superior—performance on software engineering tasks 34. This has led to its rapid adoption for high-volume production workloads where cost-effectiveness and speed are prioritized alongside reasoning depth 3.
Technical Critiques and Limitations
Despite its performance, the model's "Extended Thinking" mode has faced criticism regarding latency and resource efficiency. While the mode allows for deeper reasoning through an internal chain-of-thought, it can consume up to 64,000 internal tokens per request, which can result in significant response delays compared to the "Standard" mode 3. Some users have noted that the high token consumption in reasoning-heavy tasks leads to higher operational costs, although Anthropic states that Sonnet 4.0 remains approximately five times more cost-effective than the Opus 4.0 variant 13. To mitigate the impact of lengthy outputs, Anthropic introduced "thinking summaries" to condense the reasoning process for users, though full access to raw reasoning chains is restricted to a specialized "Developer Mode" 1.
Societal and Economic Impact
The broader impact of Claude Sonnet 4.0 centers on the transition toward highly autonomous AI agents within the software engineering labor market. Analysts suggest that the model’s ability to operate independently for several hours—validated by Rakuten in multi-hour autonomous refactoring tasks—marks a shift from AI-assisted coding to "autonomous workflows" 13. This development has sparked industry discussion regarding the economic implications for software engineering roles, particularly for junior-level positions, as the model increasingly handles routine debugging, refactoring, and documentation with minimal human intervention 34. While Anthropic frames the model as a "virtual collaborator" designed to sustain focus on long-term projects, some observers emphasize the potential for labor displacement as agentic reliability continues to improve 13.
Version History
Claude Sonnet 4.0 was officially released on May 22, 2025, as part of the Claude 4 model generation, launching alongside the higher-capacity Claude Opus 4.0 110. This version introduced a "hybrid reasoning" architecture, allowing users to select between near-instant standard responses and an "extended thinking" mode for complex logic 110. At the time of launch, Anthropic transitioned the "Claude Code" command-line tool and its associated software development kit (SDK) to general availability, enabling direct integration into developer workflows 13. This rollout included the beta release of dedicated extensions for Visual Studio Code and JetBrains IDEs, which allow the model to suggest and display code edits directly within the file system 1.
The initial 4.0 release expanded the Anthropic API with four primary capabilities: a native code execution tool, a connector for the Model Context Protocol (MCP), a dedicated Files API, and a prompt caching feature allowing developers to store context for up to one hour 1. Pricing for the Sonnet 4.0 model was set at $3 per million input tokens and $15 per million output tokens 1.
On February 17, 2026, Anthropic released Claude Sonnet 4.6 2. This update included improvements to the model's core skills in coding, agent planning, and long-context reasoning 2. Notably, Sonnet 4.6 introduced a 1-million-token context window in a beta research preview, significantly expanding the 200,000-token limit of the original 4.0 version 210. Further updates in March 2026 introduced a "computer use" research preview for users on Pro and Max plans, allowing the model to navigate file systems and execute developer tools autonomously within the Claude Code environment 2. During this period, Anthropic also implemented "thinking summaries," which utilize a smaller auxiliary model to condense lengthy chain-of-thought processes generated during extended reasoning tasks 1.
Sources
- 1“Claude Sonnet 4 and Opus 4, a Review | by Barnacle Goose - Medium”. Retrieved March 25, 2026.
If you’ve been paying attention to the heated AI race, the yesterday’s (May 22, 2025) release of Anthropic’s Claude 4 models — Opus 4 and Sonnet 4 — probably didn’t escape your attention. . . Claude Sonnet 4 is a direct upgrade to the previous Claude 3.7 Sonnet model. It offers superior coding and reasoning while responding more precisely to your instructions, at a lower cost and faster speed than Opus 4. Both models introduce hybrid reasoning abilities. . . Sonnet 4 is available via API, Amazon Bedrock, and Google Cloud Vertex AI at launch. . . It actually scored 72.7% on SWE-bench (even fractionally higher than Opus 4). . . Lovable reports they have 25% less errors and be 40% faster overall with Sonnet 4 compared to Sonnet 3.7.
- 2“Claude 4 System Card”. Retrieved March 25, 2026.
Claude Opus 4 and Claude Sonnet 4 are two new hybrid reasoning large language models from Anthropic. ... Claude Sonnet 4 was released under the AI Safety Level 2 Standard. ... Claude Opus 4 and Claude Sonnet 4 were trained on a proprietary mix of publicly available information on the Internet as of March 2025.
- 3“Introducing Claude 4”. Retrieved March 25, 2026.
Claude Opus 4 and Sonnet 4 are hybrid models offering two modes: near-instant responses and extended thinking for deeper reasoning. ... Both models can use tools in parallel, follow instructions more precisely, and—when given access to local files by developers—demonstrate significantly improved memory capabilities, extracting and saving key facts.
- 4“Claude 4 System Card”. Retrieved March 25, 2026.
Claude Opus 4 and Claude Sonnet 4 were trained on a proprietary mix of publicly available information on the Internet as of March 2025... we have decided to deploy Claude Opus 4 under the AI Safety Level 3 Standard and Claude Sonnet 4 under the AI Safety Level 2 Standard.
- 5“Claude 4: A Comprehensive Analysis of Anthropic’s Latest AI Breakthrough”. Retrieved March 25, 2026.
Extended Thinking Mode: Provides up to 64,000 tokens of internal processing capacity for complex reasoning challenges... Claude 4 introduces persistent memory files functionality... support parallel tool execution, representing a significant advancement from sequential tool usage patterns.
- 6“Claude 4 Sonnet Thinking: Model Specifications and Details”. Retrieved March 25, 2026.
Built on a dense transformer architecture... supports an extensive 200,000-token context window for general availability, with a beta configuration supporting up to 1 million tokens... integrates a unique hybrid reasoning architecture... absolute position embedding.
- 10“Claude 4 Haiku, Sonnet, Opus Release Date & Features:”. Retrieved March 25, 2026.
On May 22, 2025, Anthropic announced Claude Opus 4 and Claude Sonnet 4 as the official Claude 4 family launch.
- 11“Models overview - Claude API Docs”. Retrieved March 25, 2026.
{"code":200,"status":20000,"data":{"title":"Models overview","description":"Claude is a family of state-of-the-art large language models developed by Anthropic. This guide introduces the available models and compares their performance.","url":"https://platform.claude.com/docs/en/about-claude/models/overview","content":"# Models overview - Claude API Docs\n\nLoading...\n\n[](https://platform.claude.com/docs/en/home)\n* [Developer Guide](https://platform.claude.com/docs/en/intro)\n* [API Referen
- 12“Claude 4 Sonnet (Non-reasoning) vs GPT-4.1: Model Comparison”. Retrieved March 25, 2026.
{"code":200,"status":20000,"data":{"title":"Claude 4 Sonnet (Non-reasoning) vs GPT-4.1: Model Comparison","description":"Comparison between Claude 4 Sonnet (Non-reasoning) and GPT-4.1 across intelligence, price, speed, context window and more.","url":"https://artificialanalysis.ai/models/comparisons/claude-4-sonnet-vs-gpt-4-1","content":"# Claude 4 Sonnet (Non-reasoning) vs GPT-4.1: Model Comparison\n\n[Stay connected with us on X, Discord, and LinkedIn to stay up to date with future analysis](h
- 13“Evaluation: Claude 4 Sonnet vs OpenAI o4-mini vs Gemini 2.5 Pro”. Retrieved March 25, 2026.
{"code":200,"status":20000,"data":{"title":"Evaluation: Claude 4 Sonnet vs OpenAI o4-mini vs Gemini 2.5 Pro","description":"Analyzing the difference in performance, cost and speed between the world's best reasoning models.","url":"https://www.vellum.ai/blog/evaluation-claude-4-sonnet-vs-openai-o4-mini-vs-gemini-2-5-pro","content":"Yesterday, Anthropic launched Claude 4 Opus and Sonnet. These models tighten things up, especially for teams building agents or working with large codebases. They're m
- 17“Anthropic launches Claude 4, its most powerful AI model yet - CNBC”. Retrieved March 25, 2026.
{"code":200,"status":20000,"data":{"title":"Anthropic launches Claude 4, its most powerful AI model yet","description":"Anthropic, the Amazon-backed OpenAI rival, on Thursday launched its most powerful group of AI models yet: Claude 4.","url":"https://www.cnbc.com/2025/05/22/claude-4-opus-sonnet-anthropic.html","content":"# Amazon-backed Anthropic debuts Claude 4 Opus and Sonnet AI models\n\n[Skip Navigation](https://www.cnbc.com/2025/05/22/claude-4-opus-sonnet-anthropic.html#MainContent)\n\nNEW
- 18“Claude Opus 4 and Claude Sonnet 4 officially released - Reddit”. Retrieved March 25, 2026.
{"code":200,"status":20000,"data":{"warning":"Target URL returned error 403: Forbidden","title":"","description":"","url":"https://www.reddit.com/r/ClaudeAI/comments/1ksv917/claude_opus_4_and_claude_sonnet_4_officially/","content":"You've been blocked by network security.\n\nTo continue, log in to your Reddit account or use your developer token\n\nIf you think you've been blocked by mistake, file a ticket below and we'll look into it.\n\n[Log in](https://www.reddit.com/login/)[File a ticket](htt
- 21“Claude by Anthropic - Models in Amazon Bedrock - AWS”. Retrieved March 25, 2026.
{"code":200,"status":20000,"data":{"title":"Claude by Anthropic - Models in Amazon Bedrock – AWS","description":"Access Claude by Anthropic in Amazon Bedrock to build generative AI applications.","url":"https://aws.amazon.com/bedrock/anthropic/","content":"# Claude by Anthropic - Models in Amazon Bedrock – AWS\n\n## Select your cookie preferences\n\nWe use essential cookies and similar tools that are necessary to provide our site and services. We use performance cookies to collect anonymous stat
- 22“Why does Claude has only 200k in context window and why are it's ...”. Retrieved March 25, 2026.
{"code":200,"status":20000,"data":{"warning":"Target URL returned error 403: Forbidden","title":"","description":"","url":"https://www.reddit.com/r/ClaudeAI/comments/1kur7yr/why_does_claude_has_only_200k_in_context_window/","content":"You've been blocked by network security.\n\nTo continue, log in to your Reddit account or use your developer token\n\nIf you think you've been blocked by mistake, file a ticket below and we'll look into it.\n\n[Log in](https://www.reddit.com/login/)[File a ticket](
- 24“Anthropic's Claude Sonnet 4 in Amazon Bedrock Expanded Context ...”. Retrieved March 25, 2026.
{"code":200,"status":20000,"data":{"title":"Anthropic’s Claude Sonnet 4 in Amazon Bedrock Expanded Context Window","description":"Discover more about what's new at AWS with Anthropic’s Claude Sonnet 4 in Amazon Bedrock Expanded Context Window","url":"https://aws.amazon.com/about-aws/whats-new/2025/08/anthropic-claude-sonnet-bedrock-expanded-context-window/","content":"# Anthropic’s Claude Sonnet 4 in Amazon Bedrock Expanded Context Window - AWS\n\n## Select your cookie preferences\n\nWe use esse
- 26“Claude.ai 4.0 Sonnet/Opus – more errors with projects lately? - Reddit”. Retrieved March 25, 2026.
{"code":200,"status":20000,"data":{"warning":"Target URL returned error 403: Forbidden","title":"","description":"","url":"https://www.reddit.com/r/ClaudeCode/comments/1luxejd/claudeai_40_sonnetopus_more_errors_with_projects/","content":"You've been blocked by network security.\n\nTo continue, log in to your Reddit account or use your developer token\n\nIf you think you've been blocked by mistake, file a ticket below and we'll look into it.\n\n[Log in](https://www.reddit.com/login/)[File a ticke
- 28“Claude 4 and the Future of Software Development: What a 72.7 ...”. Retrieved March 25, 2026.
{"code":200,"status":20000,"data":{"title":"Claude 4 and the Future of Software Development: What a 72.7% SWE-bench Score Really Means","description":"Claude 4 and the Future of Software Development: What a 72.7% SWE-bench Score Really Means The AI landscape just shifted — again. Anthropic’s Claude 4 model recently hit a 72.7% score on the …","url":"https://medium.com/@senyuansamuelfan/claude-4-and-the-future-of-software-development-what-a-72-7-swe-bench-score-really-means-4b8f0a5a2fb4","content
- 29“Claude 4 Sonnet 80.2% SWE-Bench, SWE-Bench solved by the end ...”. Retrieved March 25, 2026.
{"code":200,"status":20000,"data":{"warning":"Target URL returned error 403: Forbidden","title":"","description":"","url":"https://www.reddit.com/r/singularity/comments/1ksylxq/claude_4_sonnet_802_swebench_swebench_solved_by/","content":"You've been blocked by network security.\n\nTo continue, log in to your Reddit account or use your developer token\n\nIf you think you've been blocked by mistake, file a ticket below and we'll look into it.\n\n[Log in](https://www.reddit.com/login/)[File a ticke
