Claude Sonnet 3.5
Claude 3.5 Sonnet

Claude 3.5 Sonnet is a large language model developed by Anthropic, released as the first and mid-tier entry in the Claude 3.5 model family. Initially launched in June 2024, the model was designed to offer a balance between high-level reasoning and operational speed, functioning at twice the speed of the previous top-tier model, Claude 3 Opus 5. Despite its mid-tier positioning, the model demonstrated performance improvements over earlier flagship versions in several key domains, including coding, graduate-level reasoning, and visual processing 2, 5. In October 2024, Anthropic released an upgraded version of Claude 3.5 Sonnet alongside the introduction of Claude 3.5 Haiku, further enhancing its performance in software engineering and agentic workflows 2.
The model is characterized by significant advancements in technical capabilities, particularly in automated programming and complex reasoning. According to Anthropic, the October 2024 update improved the model's performance on the SWE-bench Verified benchmark—a metric for resolving real-world software issues—from 33.4% to 49.0% 2. This score was reported as higher than all other publicly available models at the time, including specialized reasoning systems 2, 5. In addition to text-based tasks, the model features multimodal capabilities that allow it to interpret and extract data from visual inputs such as charts, diagrams, and screenshots 5. These visual reasoning skills are utilized in the model's 'Artifacts' UI feature, which provides a dedicated window for users to view, edit, and iterate on code, documents, and website designs in real-time alongside the chat interface.
A central feature introduced with the October 2024 iteration is 'computer use,' a capability currently in public beta that allows the model to interact with standard computer interfaces 2. Rather than relying on specific tool-based APIs, the model is trained to perceive a computer screen and execute actions such as moving a cursor, clicking buttons, and typing text 2, 5. Developers can integrate this capability via the Anthropic API to automate multi-step workflows. In OSWorld benchmarks, which evaluate an AI's ability to use computers like a human, Claude 3.5 Sonnet scored 14.9%, which Anthropic asserts is notably higher than the next-best AI system in the same category 2. However, the developer notes that this capability remains experimental and can be prone to errors in complex actions like scrolling or zooming 2, 5.
Claude 3.5 Sonnet is deployed through various platforms, including Anthropic's web interface, its first-party API, Amazon Bedrock, and Google Cloud’s Vertex AI 2, 5. Regarding safety and governance, Anthropic states that the model underwent joint pre-deployment testing by the United States and United Kingdom AI Safety Institutes 2. The model is categorized under the ASL-2 (AI Safety Level 2) standard as defined by Anthropic’s Responsible Scaling Policy, indicating that it does not currently present catastrophic risk potential according to the developer's internal evaluations 2. External partners such as GitLab, Replit, and The Browser Company have integrated the model to power autonomous coding agents and web-based workflow automation 2.
Background
Claude 3.5 Sonnet was developed as the first release in Anthropic's 3.5 model generation, following the Claude 3 family introduced in early 2024 5. The Claude 3 line established a three-tier model hierarchy: Haiku for high-speed tasks, Sonnet for a balance of speed and intelligence, and Opus as the high-intelligence flagship 5. In this structure, the Sonnet tier was designed to provide mid-range performance suitable for enterprise applications that required more reasoning capability than the entry-level model but lower latency than the flagship 5, 6.
Anthropic released Claude 3.5 Sonnet in June 2024, aiming to shift the performance-to-cost ratio of the mid-tier category 5. Upon its launch, the developer stated that the model operated at twice the speed of Claude 3 Opus while surpassing it on several industry benchmarks 5. According to third-party reports from Google Cloud, the model established new benchmarks for reasoning, undergraduate-level knowledge, and coding proficiency at the time of its release 6. This development occurred during a period of high competition in the AI sector, where the model faced market pressure from OpenAI's GPT-4o and Google's Gemini 1.5 Pro 6. Enterprises were increasingly prioritizing models that offered sophisticated reasoning alongside visual processing and multi-step workflow orchestration 6.
In October 2024, Anthropic released an updated version of the model, referred to as the upgraded Claude 3.5 Sonnet 5. This iteration maintained the same pricing and speed as the original 3.5 version but improved performance in software engineering tasks, as measured by benchmarks such as SWE-bench Verified 5. The update also introduced "computer use," a public beta capability allowing the model to interact with computer interfaces by perceiving screen content and executing mouse and keyboard actions 5. This feature was aimed at enabling agentic workflows, such as automated software testing and back-office task execution, where the model must navigate standard tools and software programs like a human user 5.
Architecture
Claude 3.5 Sonnet is built upon a transformer-based architecture, following the standard design principles of the Claude 3 model family 9. As an evolutionary update rather than a complete architectural departure, the model is designed to optimize the trade-off between computational efficiency and cognitive performance 8. While its developer, Anthropic, has not officially disclosed the exact parameter count, third-party analysis by Microsoft researchers has estimated that Claude 3.5 Sonnet operates with approximately 175 billion parameters 7. The model is characterized by its mid-tier positioning within the 3.5 family, intended to provide greater intelligence than the high-speed Haiku tier while maintaining faster operational speeds than the high-intelligence Opus tier 1, 5.
Context and Processing Capabilities
The model features a 200,000-token context window, allowing it to process and maintain information from extensive documents, large code repositories, or complex multi-turn conversations 1. This capacity is supported by an architecture optimized for speed; Anthropic states that the model operates at twice the speed of the previous top-tier model, Claude 3 Opus 1. This performance gain is intended to facilitate complex, multi-step workflows such as context-sensitive customer support and agentic coding 6. In internal evaluations of agentic coding, the model solved 64% of problems, which the developer attributes to improved reasoning and troubleshooting capabilities within its software engineering loop 1, 8.
Multimodal Architecture
Claude 3.5 Sonnet integrates vision and language processing within a single unified model 1. Unlike earlier systems that relied on separate modules for different data types, this native multimodality allows the model to interpret visual data—such as charts, graphs, and technical illustrations—directly alongside text-based prompts 9. The model supports image uploads in formats including JPEG, PNG, GIF, and WebP, with a maximum file size of 10MB per image 9. Third-party cloud providers, such as Google Cloud, have noted that this architecture is particularly effective for transcribing text from imperfect images and deriving insights from unstructured visual data in sectors like retail and logistics 6.
Training and Alignment Methodology
The training of Claude 3.5 Sonnet involved a combination of unsupervised learning on large-scale datasets and specialized alignment techniques 9. The knowledge cutoff for the model family is August 2023 9. Anthropic utilizes a proprietary methodology known as Constitutional AI, which aligns the model's outputs with a specific set of principles designed to ensure the system is helpful, honest, and harmless 9. This process involves both Reinforcement Learning from Human Feedback (RLHF) and Reinforcement Learning from AI Feedback (RLAIF), where a supervised model helps train the primary model to adhere to the established "constitution" 9.
Safety is integrated into the architecture through rigorous red-teaming and testing 1. The model has been classified as attaining an AI Safety Level 2 (ASL-2), indicating it does not present a catastrophic risk at its current scale 1. Before its wide release, the model's safety mechanisms were evaluated by external bodies, including the UK’s Artificial Intelligence Safety Institute (UK AISI) 1.
Technical Infrastructure
The model was developed using high-performance computing hardware provided by Amazon Web Services (AWS) and Google Cloud Platform (GCP) 9. The core software frameworks utilized in its training and deployment include PyTorch, JAX, and Triton 9. To enhance reliability for enterprise users, the model is deployed across various cloud environments, utilizing cross-region inference to minimize latency and ensure availability 4.
Capabilities & Limitations
Claude 3.5 Sonnet is a multimodal model designed for complex reasoning, mathematical problem-solving, and professional-level coding tasks 6. Anthropic states that the model demonstrates performance improvements over its predecessor, Claude 3 Opus, particularly in its ability to follow instructions and execute multi-step workflows 6.
Coding and Technical Tasks
The model is characterized by its capability to write, edit, and troubleshoot code independently when provided with appropriate tool access 6. In software engineering benchmarks, an upgraded version of Claude 3.5 Sonnet released in October 2024 showed an increase in performance on the SWE-bench Verified task, rising from 33% to 49% 5. According to Anthropic, this level of performance exceeds that of other publicly available models at the time of testing 5. Practical applications for these capabilities include the migration of legacy codebases to modern frameworks and the generation of functional software from high-level specifications 6. The model also showed improvements in agentic tool-use tasks, such as those measured by the TAU-bench, where it achieved 69.2% in the retail domain and 46.0% in the airline domain 5.
Visual Reasoning and Data Analysis
Claude 3.5 Sonnet supports visual input, allowing it to interpret and process data from charts, graphs, and technical illustrations 6. In enterprise settings, the model is used to extract structured data from visual sources, such as transcribing information from a PNG chart into JSON format 5. Anthropic asserts that the model can navigate unstructured data to generate statistical visualizations and actionable predictions, which has been applied in healthcare and education for the production of research summaries 6. The model is also capable of interpreting "imperfect images," such as those found in retail and logistics environments where visual data may be degraded 6.
Computer Use API
A significant feature introduced in public beta for the upgraded Claude 3.5 Sonnet is "computer use," which allows the model to interact with standard desktop interfaces 5. Rather than being limited to specific APIs, the model is trained to perceive and operate computer environments as a human user would 5. This is achieved through three primary tools:
- Computer tool: Receives screenshots and goals to return mouse movements, clicks, and keyboard actions 5.
- Text editor tool: Allows the model to view, create, and edit files, as well as undo changes 5.
- Bash tool: Enables the model to execute shell commands in a terminal for low-level system interaction 5.
In the OSWorld benchmark for multimodal agents, the model achieved a score of 14.9%, which Anthropic notes is higher than the 7.7% achieved by the next-best model in the same category, though still significantly below the human-level performance range of 70–75% 5.
Limitations and Constraints
Despite its technical capabilities, Claude 3.5 Sonnet has several known limitations and failure modes. In the context of computer use, the model frequently struggles with specific UI interactions such as scrolling, dragging, and zooming 5. Because the technology is in its early stages, developers are encouraged by Anthropic to test the model in sandbox environments and apply it only to lower-risk tasks 5.
General limitations common to large language models remain, including the potential for hallucinations where the model may generate factually incorrect information while appearing confident. While the model can process and analyze complex visual data, it may encounter difficulties with precise spatial reasoning or extremely fine-grained visual details. Furthermore, the model does not possess native real-time web browsing capabilities unless integrated through external search tools or APIs 6.
Performance
Claude 3.5 Sonnet is characterized by performance levels that exceed those of its predecessor, Claude 3 Opus, across several standardized benchmarks despite being positioned as a mid-tier model 1. Upon its initial release in June 2024, Anthropic reported that the model surpassed Claude 3 Opus and competitor models like GPT-4o in areas such as graduate-level reasoning (GPQA), undergraduate-level knowledge (MMLU), and coding proficiency (HumanEval) 1. In internal agentic coding evaluations, the model successfully solved 64% of problems, compared to 38% for Claude 3 Opus, demonstrating a higher capacity for fixing bugs and adding functionality to codebases 1.
In October 2024, Anthropic released an upgraded version of Claude 3.5 Sonnet that further increased its performance on technical tasks 2. On the SWE-bench Verified evaluation, which measures the ability to resolve real-world software issues, the upgraded model's performance improved from 33.4% to 49.0% 2. Anthropic stated that this score was higher than all publicly available models at the time of the update, including specialized reasoning models like OpenAI o1-preview 2. Additionally, the model showed improvements on the TAU-bench for agentic tool use, rising from 62.6% to 69.2% in the retail domain and from 36.0% to 46.0% in the airline domain 2. On the OSWorld benchmark, which evaluates computer navigation capabilities, the model achieved a score of 14.9% in the screenshot-only category, which Anthropic noted was nearly double the score of the next-best system, which scored 7.8% 2.
Regarding operational efficiency, Claude 3.5 Sonnet operates at twice the speed of Claude 3 Opus 1. Third-party testing by GitLab indicated that the October 2024 update delivered up to 10% stronger reasoning across various use cases without increasing latency 2. The model maintains a 200,000-token context window and is priced at $3 per million input tokens and $15 per million output tokens 1. This pricing structure remained consistent after the October 2024 upgrade, aiming to offer higher intelligence at the same cost as the initial June release 25. In independent assessments, such as the LMSYS Chatbot Arena, the model has frequently held top rankings in Elo ratings, competing with other frontier models for the highest score 12.
Safety & Ethics
Anthropic classifies Claude 3.5 Sonnet as maintaining an AI Safety Level 2 (ASL-2) rating under its Responsible Scaling Policy 1. According to the developer, this classification indicates that the model, despite its increased reasoning capabilities, does not currently meet the threshold for high-level catastrophic risks—such as the autonomous creation of biological weapons—which would require an ASL-3 designation 12.
External Safety Evaluations
Prior to its public deployment, Claude 3.5 Sonnet underwent testing by national security and safety organizations. The model was provided to the United Kingdom’s Artificial Intelligence Safety Institute (UK AISI) for pre-deployment evaluation 1. Results from these assessments were shared with the United States AI Safety Institute (US AISI) as part of a formal partnership between the two nations aimed at standardized safety testing for frontier models 12. Anthropic states that these external engagements are designed to verify the effectiveness of internal safety guardrails against various types of misuse 1.
Alignment and Content Filtering
The model is developed using Constitutional AI, a training methodology where the AI is aligned to a set of principles designed to ensure outputs are helpful and harmless 1. As part of this process, Anthropic integrated feedback from external subject matter experts to refine its safety classifiers. This included collaboration with child safety organization Thorn to update classifiers and fine-tune the model against the generation of child sexual abuse material (CSAM) and other child safety risks 1. Additionally, the developer asserts that it follows a policy of not training its generative models on user-submitted data without explicit permission, framing privacy as a core constitutional principle 1.
Computer Use Risks and Mitigation
The October 2024 update to Claude 3.5 Sonnet introduced "computer use," a capability allowing the model to interact with user interfaces by moving cursors and executing keystrokes 25. Anthropic identified that this feature could provide new vectors for harm, specifically in the areas of fraud, spam, and misinformation 2. To mitigate these risks, the developer implemented specialized classifiers to detect and intercept harmful actions during computer-mediated tasks 2. For initial deployment, the developer has advised users to employ the feature in sandbox environments and for low-risk tasks due to its experimental and occasionally error-prone nature 25.
Applications
Claude 3.5 Sonnet is applied across diverse professional sectors, primarily for tasks requiring a balance of high-level reasoning and operational speed. According to Anthropic, the model’s performance profile makes it suitable for orchestrating multi-step workflows and managing context-sensitive customer support automation 1.
Software Development and Agentic Tools
The model is utilized as a core reasoning engine in agentic coding platforms, including the Replit Agent and Cognition’s Devin 5. Anthropic states that the model can independently write, edit, and execute code when provided with appropriate tools, which facilitates the modernization of legacy applications and the migration of codebases 1. In general software development lifecycles, the upgraded version of the model is used for tasks ranging from initial system design to bug fixes, maintenance, and performance optimizations 5. Case studies in the fintech industry have reported significant improvements in engineering velocity; for instance, the firm CRED indicated that the deployment of the model contributed to a two-fold increase in developer output 2.
Enterprise Automation and Data Analysis
In enterprise environments, the model is used to automate data-intensive operations and complex back-office tasks. The telecommunications provider TELUS integrated the model into its "Fuel iX" platform, which reportedly enabled the creation of over 13,000 internal AI tools and reduced software release times by 30% 2. In the financial sector, the firm Brex utilized the model via Amazon Bedrock to automate expense compliance, with the company reporting that 75% of transactions were auto-processed through the system 2.
The model’s vision capabilities are frequently applied to extract quantitative data from charts, diagrams, and financial documents 5. S&P Global’s Kensho division ranked the model at the top of its business and finance benchmarks, noting its proficiency in quantity extraction and domain-specific reasoning 4. Furthermore, the model has been integrated into Snowflake’s Cortex AI to support text-to-SQL queries, achieving reported accuracy rates exceeding 90% for enterprise data analytics 2.
Computer Interaction and Collaborative Workspaces
A public beta feature known as "computer use" allows developers to direct the model to interact with standard computer interfaces. This capability enables the model to perceive screen content, move cursors, click buttons, and type text, which is intended for automating software testing and repetitive desktop workflows 5. For collaborative content creation, the "Artifacts" workspace on Claude.ai provides a side-by-side environment where users can generate and edit code snippets, website designs, and text documents in real-time, functioning as a shared space for team-based projects 1.
Reception & Impact
Claude 3.5 Sonnet received significant industry attention for its ability to outperform existing flagship models despite its mid-tier positioning 13. Anthropic states that the model demonstrates an exceptional capacity for nuance, humor, and complex instruction following, producing high-quality content with a natural, relatable tone 1. This writing style, combined with high-level reasoning, led to widespread adoption in creative and professional workflows 57. The model's "Sonnet-first" release strategy—launching the mid-tier entry before the 3.5 generation's flagship—was described as a tactical shift that elevated industry expectations for performance and cost-efficiency in non-flagship models 16.
Developer preference reportedly shifted toward Anthropic following the model's release, particularly for technical and coding tasks 6. According to internal and third-party evaluations, Claude 3.5 Sonnet achieved high results on coding benchmarks like HumanEval and SWE-bench Verified 14. Anthropic reported that an upgraded version of the model solved 49% of problems on SWE-bench Verified, outperforming competitor models including OpenAI's o1-preview 2. This technical proficiency led to its integration into software engineering platforms such as GitLab, Cognition, and Replit 2. Industry analysts estimated that Anthropic's enterprise market share doubled in 2024, a growth largely attributed to the popularity of the 3.5 Sonnet model among AI engineers 6.
The introduction of the "Artifacts" feature was noted for transforming the user experience from a simple chat interface into a collaborative workspace 17. This capability enables users to generate, view, and iterate upon code, documents, and website designs in real-time within the application 1. Subsequent media coverage highlighted the model's ability to rapidly create functional web applications and interactive 3D visualizations through this interface 7.
In October 2024, the release of the "computer use" capability in public beta marked a transition toward agentic AI applications 25. This feature allows the model to interact with computer interfaces by moving cursors, clicking buttons, and typing text as a human would 2. While Anthropic described the capability as a way to automate complex multi-step workflows, they also acknowledged it as "experimental" and at times "cumbersome" 2. Media and safety experts focused on both the potential for automation and the associated risks of the feature, such as vulnerability to prompt injection or fraud 2. To address these concerns, the model underwent pre-deployment testing by the US and UK AI Safety Institutes 12.
Version History
Claude 3.5 Sonnet was initially released on June 20, 2024, as the first model in Anthropic's 3.5 generation 5. At launch, the model was positioned as a mid-tier offering that outperformed the previous flagship, Claude 3 Opus, in reasoning and speed while maintaining a lower price point 5.
On October 22, 2024, Anthropic released an updated version of the model, characterized as the "upgraded" Claude 3.5 Sonnet 2. This version introduced a "computer use" capability in public beta, allowing developers to direct the model to interact with computer interfaces by perceiving screens and executing commands such as mouse clicks and typing 25. Anthropic reported that the upgraded model showed significant improvement in coding tasks, with its performance on the SWE-bench Verified benchmark increasing from 33.4% to 49.0% 2. On the OSWorld benchmark for computer navigation, the model achieved a score of 14.9%, which the developer stated was notably higher than the next-best system's score of 7.8% at the time 25.
The October 2024 update was part of a broader rollout of the Claude 3.5 family, which included the announcement of Claude 3.5 Haiku 2. The Haiku model, designed for high-speed tasks, became available on platforms such as Amazon Bedrock on November 4, 2024 5. Anthropic subsequently revised the pricing for Claude 3.5 Haiku in early December 2024 2. Following these releases, the high-intelligence Claude 3.5 Opus was designated as a future addition to the model line 2.
Sources
- 1“Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku”. Retrieved March 25, 2026.
Today, we’re announcing an upgraded Claude 3.5 Sonnet, and a new model, Claude 3.5 Haiku. The upgraded Claude 3.5 Sonnet delivers across-the-board improvements over its predecessor, with particularly significant gains in coding—an area where it already led the field. ... We’re also introducing a groundbreaking new capability in public beta: computer use.
- 2“Announcing three new capabilities for the Claude 3.5 model family in Amazon Bedrock | Amazon Web Services”. Retrieved March 25, 2026.
Four months ago, we introduced Anthropic’s Claude 3.5 in Amazon Bedrock, raising the industry bar for AI model intelligence while maintaining the speed and cost of Claude 3 Sonnet. ... Claude 3.5 Sonnet now offers computer use capabilities in Amazon Bedrock in public beta, allowing Claude to perceive and interact with computer interfaces.
- 3“Announcing Anthropic’s Claude 3.5 Sonnet on Vertex AI, providing more choice for enterprises”. Retrieved March 25, 2026.
Claude 3.5 Sonnet outperforms Anthropic’s previous most intelligent model, Claude 3 Opus, on a wide range of Anthropic’s evaluations. ... In Anthropic’s evaluations, Claude 3.5 Sonnet set new benchmarks for reasoning, undergraduate-level knowledge, math, and coding.
- 4“Introducing Claude 3.5 Sonnet”. Retrieved March 25, 2026.
The model costs $3 per million input tokens and $15 per million output tokens, with a 200K token context window. ... Claude 3.5 Sonnet operates at twice the speed of Claude 3 Opus. ... Claude 3.5 Sonnet is our strongest vision model yet.
- 5“Claude 3.5 Sonnet on GitHub Copilot”. Retrieved March 25, 2026.
With context about your entire codebase, you can use Claude 3.5 Sonnet on GitHub Copilot... Claude 3.5 Sonnet runs on GitHub Copilot via Amazon Bedrock, leveraging Bedrock’s cross-region inference.
- 6“The Claude 3 Model Family: Opus, Sonnet, Haiku”. Retrieved March 25, 2026.
Claude 3 models employ various training methods, such as unsupervised learning and Constitutional AI... trained using hardware from Amazon Web Services (AWS) and Google Cloud Platform (GCP)... core frameworks including PyTorch, JAX, and Triton... knowledge cutoff for the Claude 3 models is August 2023.
- 7“The Number of Parameters of GPT-4o and Claude 3.5 Sonnet”. Retrieved March 25, 2026.
According to Figure 2, we can see that... Claude 3.5 Sonnet operates with roughly 175 billion parameters.
- 8“Claude 3.5 Sonnet Model Card Addendum”. Retrieved March 25, 2026.
This addendum to our Claude 3 Model Card describes Claude 3.5 Sonnet... Since it is an evolution of the Claude 3 model family, we are providing an addendum rather than a new model card.
- 9“Claude in enterprise: case studies of successful AI deployments”. Retrieved March 25, 2026.
TELUS integrated into “Fuel iX” platform... 13,000+ AI tools built internally, 500,000+ hours saved, 30% faster software releases. ... Brex: 75% of transactions auto-processed. ... Snowflake: >90% accuracy on text-to-SQL queries. ... CRED: 2× developer velocity.
