Claude Haiku 3.5
Claude 3.5 Haiku is a large language model (LLM) developed by Anthropic and released on November 4, 2024, following an initial announcement on October 22, 2024 12. As the entry-level model in the Claude 3.5 family, it is designed for maximum speed and efficiency, catering to applications that require low latency and high interactivity 1. The model features a context window of 200,000 tokens and a maximum output limit of 8,192 tokens per request 1. Its knowledge cutoff is dated July 31, 2024 1. Unlike the larger models in the Claude 3.5 series, Claude 3.5 Haiku is limited to text-based processing and does not support image inputs 2.
Anthropic positions the model as a significant improvement over its predecessor, Claude 3 Haiku, stating that it surpasses the performance of Claude 3 Opus—the largest model from the previous generation—on numerous industry benchmarks 2. The model is engineered for real-time tasks such as customer service chatbots, data management, and providing immediate coding suggestions 1. Pricing for the model via the Anthropic API is set at $0.80 per million input tokens and $4.00 per million output tokens, which reflects a cost-efficiency strategy intended for scaled enterprise deployments and high-volume tasks like content moderation or data extraction 12.
In independent evaluations and benchmark testing, the model has demonstrated varying degrees of capability across different domains. According to data from Artificial Analysis, Claude 3.5 Haiku achieved a score of 40.8% on the GPQA Diamond benchmark, which evaluates graduate-level scientific reasoning 1. On the IFBench instruction-following benchmark, the model recorded a score of 42.8% 1. Its coding capabilities were measured at 27.4% on the SciCode benchmark for scientific computing, though it scored lower on more complex agentic tasks, such as the Terminal-Bench Hard, where it achieved a 2.3% 1. Overall, the model is characterized by an intelligence index that ranks it better than approximately 41% of models compared by Artificial Analysis at the time of its release 1.
Claude 3.5 Haiku is accessible through the Claude API, as well as major cloud infrastructure providers including Amazon Bedrock and Google Cloud's Vertex AI 12. Third-party performance monitoring on Amazon Bedrock has indicated an average throughput of 48 tokens per second with an average latency of approximately 0.66 seconds 1. While it initially represented the fastest offering in Anthropic's lineup, the company announced the subsequent Claude Haiku 4.5 in late 2025, which was designed to match the performance of the mid-tier Sonnet 4 model while maintaining the Haiku series' emphasis on speed and affordability 2.
Background
Claude 3.5 Haiku was developed as part of a staggered update to Anthropic's Claude 3.5 model family, succeeding the initial Claude 3 series introduced in March 2024 910. The predecessor family, Claude 3, established a three-tier model structure categorized by capability and resource requirements: Haiku for speed and cost-efficiency, Sonnet for balanced performance, and Opus for high-complexity reasoning 9. Within this framework, the original Claude 3 Haiku was designed for high-volume, low-latency tasks such as content moderation and rapid data extraction, capable of processing information-dense documents in under three seconds 9.
By late 2024, the artificial intelligence sector experienced a trend where developers sought to bridge the gap between compact, efficient models and the reasoning capabilities of larger "frontier" models. Anthropic’s development of the 3.5 cycle reflected this shift, aiming to provide significantly higher intelligence at the same speed and cost as previous iterations 10. Claude 3.5 Haiku was specifically designed to replace the original Claude 3 Haiku by offering improved reasoning and coding performance while maintaining the compact architecture required for near-instant responses 10.
On October 22, 2024, Anthropic announced Claude 3.5 Haiku alongside an upgraded version of Claude 3.5 Sonnet 10. The motivation for the 3.5 Haiku update was to provide a model that could match the performance of the previous generation's most capable model, Claude 3 Opus, across several benchmarks 10. Anthropic stated that this leap in capability was intended to support more advanced software engineering tasks, such as fast code suggestions and the categorization of large datasets in fields like finance and healthcare 10.
The model's development also focused on refining the "responsible design" principles established during the Claude 3 era, which included maintaining a 200,000-token context window and improving the model's ability to follow complex, multi-step instructions without the frequent refusals seen in earlier generations 910. Following its initial announcement, Claude 3.5 Haiku became generally available on November 4, 2024, through the Anthropic API and cloud provider platforms like Amazon Bedrock 10.
Architecture
Claude 3.5 Haiku is a large language model (LLM) architected as the high-speed, compact entry in the Claude 3.5 series 1. Like other models in the Anthropic lineup, its internal parameter count and specific structural configuration remain undisclosed by the developer 4. The model is engineered primarily for low-latency performance and high throughput, positioning it for tasks that require near-instantaneous responsiveness, such as real-time customer service interactions and live coding suggestions 1.
Context and Token Management
The model features a context window capacity of 200,000 tokens, allowing it to process and maintain information from extensive datasets, long-form documents, or large repositories of source code in a single prompt 1. This context capacity matches the larger models in the Claude 3.5 and Claude 3 families 1. For generation, Claude 3.5 Haiku is limited to a maximum output of 8,192 tokens per individual request 1.
In terms of data currency, the model’s knowledge cutoff is dated July 31, 2024 1. This represents a significant advancement over its predecessor, Claude 3 Haiku, providing a more recent baseline for factual queries and technical documentation 1.
Technical Constraints and Input Modalities
Unlike the more capable Claude 3.5 Sonnet or the previous Claude 3 Haiku, Claude 3.5 Haiku does not support multimodal image inputs at launch 1. It is strictly a text-to-text model, optimized for processing and generating linguistic and symbolic data 1. Anthropic states that this focus allows the model to maintain higher speeds and lower operational costs while excelling in specialized text-based tasks 1.
Training Focus and Optimization
Anthropic states that the training methodology for Claude 3.5 Haiku prioritized improvements in reasoning, coding accuracy, and tool-use proficiency 1. According to the developer, these optimizations allow the model to outperform the previous generation's flagship model, Claude 3 Opus, on several industry benchmarks while maintaining the latency profile of a much smaller model 1.
The model’s architecture is specifically tuned for "agentic" workflows, where the LLM must interact with external software tools or APIs 1. Benchmarks sourced from Artificial Analysis indicate an agentic capability score of 17.7, which is characterized as performing better than 44% of compared models in its class 1. Furthermore, its coding capability score of 10.7 reflects an emphasis on Python programming and scientific computing tasks 1.
Inference and Performance Metrics
Architectural efficiency in Claude 3.5 Haiku is reflected in its inference speeds. Third-party testing on platforms such as Amazon Bedrock has recorded an average throughput of approximately 48 tokens per second, with a time-to-first-token (TTFT) latency of roughly 0.66 seconds 1. This performance profile is intended to support high-volume data management systems and automated content moderation where cost-per-token and speed are primary constraints 1.
Capabilities & Limitations
Claude 3.5 Haiku is characterized by its developer as an entry-level model optimized for low-latency performance and high-speed execution 1. Anthropic states that the model delivers significant improvements in reasoning, coding accuracy, and tool use compared to the previous generation Claude 3 Haiku 14.
Coding and Technical Performance
In technical evaluations, the model has demonstrated proficiency in software development and scientific reasoning. According to developer-reported benchmarks and third-party tracking, Claude 3.5 Haiku achieved a score of 40.60% on the SWE-bench Verified benchmark, which measures the ability of a model to resolve real-world software issues 4. On HumanEval, a benchmark for synthesizing programs from docstrings, the model recorded a score of 0.88 7. In domain-specific assessments, the model achieved an 84% overall accuracy on a set of 570 multiple-choice questions regarding Magnetic Resonance Imaging (MRI) physics 6. This performance matched GPT-4 Turbo in aggregate, though the model exhibited variance across sub-topics, scoring 94% in basic principles but declining to 78% in image weighting and contrast 6.
Instruction Following and Agentic Workflows
The model is engineered for agentic tasks, where a language model acts as an autonomous or semi-autonomous intermediary using external tools 1. Artificial Analysis reported a composite agentic capability score for the model that ranks it higher than 44% of other compared models 1. In terms of specific instruction adherence, it recorded a score of 42.8% on the IFBench benchmark 1. These capabilities are intended to support real-time applications such as interactive chatbots, automated data extraction, and immediate coding suggestions 14. The model also achieved a 51% success rate on the TAU-bench Retail benchmark, which evaluates a model's ability to handle multi-turn conversations and policy guidelines in a retail environment 7.
Modalities and Known Limitations
Unlike the more capable Claude 3.5 Sonnet, the initial release of Claude 3.5 Haiku is restricted in its input modalities. While its predecessor and larger family members support vision-based tasks, technical documentation indicates that Claude 3.5 Haiku does not support image inputs 1. The model is strictly text-and-code-based, though some third-party platforms have reported varying levels of support for processing code snippets and screenshots via integrated wrappers 4.
Reliability remains a notable limitation for the model. Evaluations of its factual accuracy, known as "Omniscience Accuracy," placed the model at 13.0%, meaning it correctly answered a small proportion of high-difficulty questions 1. Among the questions it did not answer correctly, the model recorded a hallucination rate of 58.5% 1. On the "Humanity's Last Exam" (HLE) benchmark, which tests for graduate-level reasoning and expert knowledge, the model scored 3.5% 1. Additionally, while the model maintains a large 200,000-token context window for inputs, its output is capped at 8,192 tokens per request, limiting its use for generating very long-form documents in a single pass 14.
Performance
Standardized Benchmarks
In standardized evaluations, Claude 3.5 Haiku demonstrates improvements in reasoning and technical proficiency compared to its predecessor. On the GPQA Diamond benchmark, which measures graduate-level scientific reasoning, the model achieved a score of 40.8% 1. In the SciCode benchmark, which evaluates Python programming for scientific computing, it recorded a score of 27.4% 1. Other performance metrics include a 42.8% score on the IFBench instruction-following benchmark and 24.6% on the τ²-Bench, which tests conversational AI agents in dual-control scenarios 1.
Third-party analysis by Artificial Analysis assigned the model an overall Intelligence Index of 18.7, placing it higher than 41% of models in its performance class 14. The model also recorded an Agentic Index of 17.7 and a Coding Index of 10.7 1. In knowledge-based testing, the model reached a 13.0% accuracy rate on the AA-Omniscience benchmark, though it exhibited a hallucination rate of 58.5% among incorrect responses in that specific evaluation 1.
Latency and Throughput
Claude 3.5 Haiku is engineered for low-latency tasks, and its real-world performance varies by infrastructure provider. According to metrics from OpenRouter, the model achieves an average throughput of 43 to 48 tokens per second when accessed via Amazon Bedrock (US-WEST) 1. Standard latency for this provider averages 0.72 seconds, with an end-to-end (E2E) latency of approximately 3.62 seconds 1. Other provider configurations show a slightly lower average throughput of 38 tokens per second and a higher average latency of 0.89 seconds 1. The model's reliability in tool use has been measured with a tool call error rate of approximately 11.52% on optimized endpoints, though this rate increases to 34.91% on standard configurations 1.
Cost Efficiency and Comparative Standing
Anthropic has positioned Claude 3.5 Haiku as a high-speed, cost-efficient model for large-scale deployments. The standard API pricing is set at $0.80 per million input tokens and $4.00 per million output tokens 14. To further reduce operational costs, the model supports prompt caching, which allows for cache read operations at $0.08 per million tokens and cache write operations at $1.00 per million tokens 1. Due to these caching features, the effective weighted average input price has been observed as low as $0.627 per million tokens in some production environments 1.
In the broader landscape of AI models, Artificial Analysis ranks Claude 3.5 Haiku 12th out of 63 models in its class for input affordability and 19th for output affordability 4. While it sits in the middle of the intelligence rankings for its tier—ranking 35th out of 63—it is frequently compared to other small-format, high-speed models such as GPT-4o mini and Gemini 1.5 Flash based on its pricing structure and intended use in real-time applications like customer service bots and data management systems 14.
Safety & Ethics
Claude 3.5 Haiku is developed using Anthropic's proprietary safety framework, which relies primarily on Constitutional AI 12. This method involves providing the model with a set of written principles—a "constitution"—to guide its behavior, supplemented by Reinforcement Learning from Human Feedback (RLHF) to align its outputs with human values and safety standards 12. Anthropic states that this dual approach allows the model to follow instructions while minimizing the generation of harmful, biased, or sexually explicit content 412.
Under Anthropic’s Responsible Scaling Policy, Claude 3.5 Haiku is categorized as AI Safety Level 2 (ASL-2) 12. This classification indicates that while the model demonstrates high reasoning capabilities, it does not currently present a foundational risk for catastrophic misuse, such as aiding in the creation of biological weapons or conducting autonomous cyberattacks 12. Independent pre-deployment assessments of the Claude 3.5 family were conducted by the Model Evaluation and Threat Research (METR) organization to verify these safety thresholds 12.
A primary security concern identified with the Claude 3.5 generation is vulnerability to prompt injection attacks 12. In these scenarios, malicious instructions embedded in third-party content can override the model's original system prompts, potentially leading it to perform unintended actions 12. Third-party red-teaming demonstrated that when models in this family are integrated into agentic workflows, prompt injections can be used to execute unauthorized system commands or compromise sensitive data, such as SSH keys 12. Anthropic acknowledges that as models gain more "general computer skills," the severity of prompt injection risks increases, requiring developers to implement external guardrails at the application level 12.
Ethical evaluations by independent researchers have noted shifts in the model's behavioral profile compared to the Claude 3 series. Observations from the AI research community indicate that Claude 3.5 models may be less prone to over-refusal—the tendency to decline benign requests due to overly sensitive safety filters—than their predecessors 12. However, some critics and testers have characterized the 3.5 series as appearing more "neurotypical" or "flatter" in personality, suggesting that the refinement process may have ablated certain unique traits found in the original Claude 3 Opus or Sonnet models 12. Additionally, despite safety training, the model remains susceptible to "mode collapse," where it may adopt a repetitive or less sophisticated reasoning style under specific prompting conditions 12.
Applications
Claude 3.5 Haiku is primarily utilized in environments requiring a balance of speed, cost-efficiency, and tool-use capabilities. Anthropic positions the model as a solution for real-time applications and high-volume tasks where low latency is a primary operational requirement 12.
Autonomous Agents and Multi-Agent Systems
The model is integrated into the development and execution of autonomous AI agents. Third-party applications including OpenClaw and Agent Zero leverage the model's tool-calling efficiency to perform automated tasks 1. In these contexts, the model often functions as a sub-agent within larger multi-agent systems, handling specific components of complex workflows such as software refactors or feature builds 2. Its 200,000-token context window enables it to process substantial datasets during these agentic cycles, which Anthropic asserts allows it to maintain quality and speed in scaled deployments 12.
Customer Service and Interactive Chat
Due to its low-latency design, the model is frequently used for user-facing applications such as customer service bots and interactive chatbots 1. Anthropic states that the model's responsiveness makes it suitable for latency-sensitive experiences where immediate feedback is necessary 2. It is also employed in high-volume communication platforms, including the Pitchprfct SMS platform and the Lessie cold email generator, where rapid text generation is required to maintain high throughput 1.
Software Development and Data Management
In software engineering, Claude 3.5 Haiku provides real-time coding suggestions and assists in on-the-fly code completion 1. According to Anthropic, the model's improved coding accuracy compared to previous generations allows for more reliable integration into development environments requiring quick iterations 1. For data management, the model is used for large-scale document summarization, data extraction, and real-time content moderation 1. Its pricing structure—set at $0.80 per million input tokens—is intended to make the model practical for powering the free tiers of AI products or other budget-conscious applications 12.
Ideal Scenarios and Limitations
The model is most effective in scenarios involving high interactivity and high-throughput data processing tasks 1. While Anthropic reports that Claude 3.5 Haiku surpasses the previous generation's flagship model, Claude 3 Opus, on certain intelligence benchmarks, it is generally positioned as an entry-level model 2. It is not recommended for the most complex reasoning tasks where larger models in the Claude 3.5 family, such as Sonnet or Opus, would provide greater depth 2. Additionally, the model does not support image inputs, which precludes its use in multimodal or vision-based applications 1.
Reception & Impact
The reception of Claude 3.5 Haiku has been characterized by a dual focus on its performance improvements and a significant shift in its economic positioning within the small-model market. Upon its release, industry analysts noted that the model represented a departure from the ultra-low-cost strategy established by its predecessor, Claude 3 Haiku 19.
Pricing and Market Positioning
A primary point of discussion among developers and industry observers has been the model's pricing structure. Claude 3.5 Haiku was launched with a price of $0.80 per million input tokens and $4.00 per million output tokens 1. This represents an approximate 220% increase in input costs and a 220% increase in output costs compared to the original Claude 3 Haiku, which was priced at $0.25 per million input and $1.25 per million output tokens 9.
Anthropic has framed this price increase by asserting that the model delivers performance parity with Claude 3 Opus, their previous flagship model, while maintaining significantly higher speeds and lower latency 19. However, technical commentators have observed that this pricing places the model in a distinct category compared to other "mini" models, such as GPT-4o-mini or Gemini 1.5 Flash, which generally maintain lower price points to capture high-volume, low-margin workloads 1.
Developer Sentiment and Adoption
Initial developer sentiment has been mixed, balancing the model's enhanced capabilities against its higher operational costs. For applications requiring complex tool use and high-accuracy coding suggestions in real-time, the model has seen adoption as a high-speed alternative to larger frontier models 1. According to data from model aggregators, the model has been integrated into various autonomous agent frameworks and multi-agent systems, including OpenClaw and Agent Zero, which prioritize the model's ability to handle long-context reasoning and structured JSON output 1.
In the context of the broader AI ecosystem, Claude 3.5 Haiku is frequently evaluated for its performance-to-latency ratio. While its predecessor was primarily valued for cost-efficiency, the 3.5 version is increasingly characterized as a "compact powerhouse" suited for tasks where intelligence requirements exceed the capabilities of standard small models but where the latency of a full-scale flagship model is unacceptable 1.
Impact on the Small-Model Ecosystem
The release of Claude 3.5 Haiku has influenced the market for low-latency large language models (LLMs) by challenging the assumption that the "Haiku" or entry-level tier must remain the cheapest option 9. By prioritizing increased reasoning and coding accuracy over price-matching competitors, Anthropic has attempted to create a mid-tier niche for the model 1. Media coverage has noted that this strategy reflects an industry-wide trend where developers must choose between "commodity" models optimized for cost and "specialized" small models optimized for technical reliability and instruction-following 1.
Version History
Claude 3.5 Haiku is the successor to the original Claude 3 Haiku model, which was introduced in March 2024 as the entry-level, high-speed tier of the Claude 3 model family 13. Anthropic officially announced the 3.5 iteration on October 22, 2024, as part of a broader release that included an upgraded version of Claude 3.5 Sonnet 1215. The initial build of the model is identified in the Anthropic API by the version string claude-3-5-haiku-20241022 1.
Following its initial announcement, the model reached general availability on November 4, 2024 115. On that date, it was integrated into Amazon Bedrock and the Anthropic API, with subsequent availability established on Google Cloud’s Vertex AI platform 115. This version history marks a significant shift in the model's economic positioning; while the original Claude 3 Haiku was noted for its ultra-low cost, the 3.5 version launched with a higher price point of $0.80 per million input tokens and $4.00 per million output tokens 115.
Compared to its predecessor, Claude 3.5 Haiku features an updated knowledge cutoff of July 2024 116. While the model maintains the 200,000-token context window standard to the Claude 3 family, the 3.5 iteration increased the maximum output limit to 8,192 tokens per request 1. Anthropic states that the 2024-10-22 version provides improvements in reasoning, coding accuracy, and instruction-following, claiming it matches the performance of the previous generation's largest model, Claude 3 Opus, on several industry benchmarks 1516. Unlike other models in the Claude 3.5 series, the initial release of Claude 3.5 Haiku does not support multimodal image processing, functioning exclusively as a text-to-text model 1.
Sources
- 1“Claude 3.5 Haiku - API Pricing & Providers”. OpenRouter. Retrieved April 1, 2026.
Released Nov 4, 2024 Knowledge cutoff Jul 31, 2024 200,000 context... Claude 3.5 Haiku features offers enhanced capabilities in speed, coding accuracy, and tool use. $0.80 per million input tokens, $4 per million output tokens. 200,000 token context window, maximum output of 8,192 tokens.
- 2“Claude Haiku 4.5”. Anthropic. Retrieved April 1, 2026.
Claude 3.5 Haiku Oct 22, 2024 For a similar speed to Haiku 3, Haiku 3.5 improved across every skill set and surpassed Opus 3, the largest model in our previous generation, on many intelligence benchmarks... It does not support image inputs.
- 4“Announcing three new capabilities for the Claude 3.5 model family in Amazon Bedrock | Amazon Web Services”. Amazon Web Services. Retrieved April 1, 2026.
November 4, 2024: Anthropic’s Claude 3.5 Haiku was announced as “coming soon” when this article originally published. It became available in Amazon Bedrock on Nov. 4. Claude 3.5 Haiku improves on its predecessor and matches the performance of Claude 3 Opus (previously Claude’s largest model).
- 6“Claude 3.5 Haiku: Faster, More Accurate Coding and Tool Use”. EssayDone. Retrieved April 1, 2026.
Coding Performance (on SWE-bench Verified): 40.60%. Context Window - 200,000 tokens. Maximum Output Length - 8,192 tokens.
- 7McMillan. (January 5, 2026). “Claude 3.5 Haiku: Foundations & Benchmarks”. EmergentMind. Retrieved April 1, 2026.
Claude 3.5 Haiku achieved an overall accuracy of 84%, with strongest performance in 'Basic Principles' (94%) ... but lagged in 'Image Weighting & Contrast' (~78%).
- 9“Claude Sonnet 3.5.1 and Haiku 3.5 — LessWrong”. LessWrong. Retrieved April 1, 2026.
Anthropic did note that this advance ‘brings with it safety challenges.’ They focused their attentions on present-day potential harms... underlying model, which remains ASL-2. ... The biggest concern in the near-term is the one they focus on: Prompt injection. ... We conducted an independent pre-deployment assessment... and will share our report soon.
- 10“Claude 3.5 Sonnet and the New Claude 3.5 Haiku”. Anthropic. Retrieved April 1, 2026.
Claude 3.5 Haiku is a major step forward for smaller models... delivering significant improvements in reasoning, coding accuracy, and tool use.
- 12“Pricing - Claude API Docs”. Retrieved April 1, 2026.
{"code":200,"status":20000,"data":{"title":"Pricing","description":"Learn about Anthropic's pricing structure for models and features","url":"https://platform.claude.com/docs/en/about-claude/pricing","content":"# Pricing - Claude API Docs\n\nLoading...\n\n[](https://platform.claude.com/docs/en/home)\n* [Developer Guide](https://platform.claude.com/docs/en/intro)\n* [API Reference](https://platform.claude.com/docs/en/api/overview)\n* [MCP](https://modelcontextprotocol.io/)\n* [Resources](http
- 13“Haiku 3.5 released! : r/ClaudeAI - Reddit”. Retrieved April 1, 2026.
{"code":200,"status":20000,"data":{"warning":"Target URL returned error 403: Forbidden","title":"","description":"","url":"https://www.reddit.com/r/ClaudeAI/comments/1gjl6yl/haiku_35_released/","content":"You've been blocked by network security.\n\nTo continue, log in to your Reddit account or use your developer token\n\nIf you think you've been blocked by mistake, file a ticket below and we'll look into it.\n\n[Log in](https://www.reddit.com/login/)[File a ticket](https://support.reddithelp.com
- 15“Introducing Claude 3.5 Sonnet - Anthropic”. Retrieved April 1, 2026.
{"code":200,"status":20000,"data":{"title":"Introducing Claude 3.5 Sonnet","description":"","url":"https://www.anthropic.com/news/claude-3-5-sonnet","content":"\n\n* Update\n\nConsumer Terms and Privacy Policy\n\nAug 28, 2025 \n\nToday, we’re launching Claude 3.5 Sonnet—our first rele
- 16“Anthropic's fastest model, Claude 3.5 Haiku, now generally available”. Retrieved April 1, 2026.
{"code":200,"status":20000,"data":{"title":"Anthropic's fastest model, Claude 3.5 Haiku, now generally available","description":"With features like image and file analysis, robust coding capabilities, and integration with Artifacts, Claude 3.5 Haiku is a powerful tool.","url":"https://venturebeat.com/ai/claude-3-5-haiku-chatbot-now-generally-available","content":"Anthropic has officially rolled out its Claude 3.5 Haiku model to all users through the Claude chatbot on the web and mobile apps, as

