
Ethics Report: Anthropic

Rubric: Organisation v4 · Reviewed 3/25/2026

40/100
Weak

Minimal ethical infrastructure

Safety & Harm Reduction

14/25
1.1

Dedicated safety / responsible-use policy

Publishes a dedicated safety/responsible-use policy that is publicly accessible.

5/5
Full

Evidence

Anthropic maintains a dedicated Responsible Scaling Policy (RSP), updated to Version 3.0 in February 2026. The RSP is a comprehensive voluntary framework with specific, enforceable terms including AI Safety Levels (ASL-1 through ASL-4) that define prohibited uses and required safeguards. The policy includes detailed specifications for each safety level, including CBRN-related restrictions, jailbreak prevention, and enhanced security protocols for advanced models. This represents a full dedicated policy with specific prohibited uses rather than a generic statement.

1.2

Public bug-bounty or red-team program

Operates or funds a public bug-bounty / red-team program. Internal-only programs score 0.

2/5
Exists

Evidence

The article states that Anthropic 'maintains a bug bounty program to incentivize the reporting of universal jailbreaks and collaborates with external partners and safety organizations for pre-deployment testing of its systems.' This confirms the existence of a publicly documented program, but the article does not provide evidence of published results or findings from completed rounds of the bug bounty or red-team programs.

1.3

Published safety evaluation within last 24 months

Published safety evaluation/audit/model card within last 24 months with quantitative benchmarks on harmful outputs (bias, toxicity, hallucination, etc.).

3/5
Comprehensive

Evidence

Anthropic publishes technical documentation and safety assessments for its models, including benchmark results for Claude 3.5 Sonnet such as 59% on GPQA (graduate-level reasoning), 90% on MMLU (undergraduate-level knowledge), and 93% on HumanEval (coding). These are capability rather than harm-specific benchmarks, cited in the article as part of Anthropic's broader quantitative documentation. In February 2026, Anthropic updated its RSP to Version 3.0 with Frontier Safety Roadmaps and Risk Reports. However, the article does not mention independent third-party audits of these safety evaluations.

1.4

Documented content-filtering / guardrails

Documents content-filtering/guardrails on production endpoints with user-facing documentation.

2/5
Mentioned

Evidence

The article mentions that Anthropic's ASL framework includes 'real-time classifier guards to block concerning queries and more powerful offline classifiers to detect jailbreak attempts.' Additionally, the Constitutional AI methodology is described as training models to 'refuse dangerous requests related to chemical, biological, radiological, and nuclear (CBRN) weapons.' These safeguards are mentioned in policy documents, but the article does not provide detailed documentation explaining specific filtering rules, why particular content is filtered, or how users can report false positives.

1.5

Documented incident-response process

Documented incident-response process for safety failures with a reporting mechanism. Generic "contact us" alone = 0.

2/5
Reporting exists

Evidence

Anthropic has established an internal governance structure with a 'Responsible Scaling Officer, who oversees the implementation of these standards and manages a non-compliance reporting process for employees,' indicating that a dedicated reporting mechanism exists. However, the article does not describe a documented response process with stated timelines or service-level agreements (SLAs) for addressing incidents.

Transparency & Trust

7/25
2.1

Training data provenance disclosure

Publishes training data provenance disclosures identifying sources/types/datasets. "Publicly available data" alone = 0.

2/5
General categories

Evidence

The article discloses training data provenance only at the level of general categories: it states that models are trained on large-scale data and discusses the company's research philosophy and scaling laws, but it does not identify specific datasets, sources, curation details, or filtering criteria for the data used to train Claude models.

2.2

Meaningful technical documentation for flagship model(s)

Publishes meaningful technical documentation (system card, tech report, research paper) for flagship model(s) regardless of whether weights are released.

5/5
Substantive

Evidence

Anthropic publishes substantive technical documentation for Claude models including detailed specifications of the Claude 3 family (Haiku, Sonnet, Opus) and subsequent versions (Claude 3.5 Sonnet, Claude 4.5, Claude Opus 4.1). The documentation covers architecture details (200,000-token context windows), training approach (Constitutional AI methodology), benchmark performance metrics (GPQA, MMLU, HumanEval scores), and key limitations (no direct audio/video processing). The company also discusses mechanistic interpretability research and Transformer Circuits as part of its technical approach.

2.3

Transparency report (takedowns, government requests, etc.)

Publishes a transparency report covering takedowns, government requests, enforcement stats, and/or safety incidents.

0/5
None

Evidence

The article does not mention or reference any transparency report from Anthropic covering government requests, takedowns, data disclosures, or similar categories. While the company updated its Responsible Scaling Policy in February 2026, this is not the same as a transparency report covering legal requests and government interactions.

2.4

ToS training data use disclosure with opt-out

ToS explicitly states whether user inputs/outputs are used for training, with opt-out mechanism if applicable.

0/5
Vague or absent

Evidence

The article does not discuss Anthropic's Terms of Service regarding training data use or user opt-out mechanisms. While the article mentions the company's use of reinforcement learning from human feedback and constitutional AI, there is no explicit statement in the provided content about whether users can opt out of having their data used for training purposes or how the company discloses this practice in its ToS.

2.5

Creator/artist content provenance disclosure

Discloses training data provenance specifically for creator/artist content (copyrighted or artist-created works).

0/5
None

Evidence

The article does not provide specific disclosure regarding how much creative or copyrighted content (music, images, text) was used in training Claude models. While the article mentions that Anthropic faced a major copyright lawsuit from music publishers including Universal Music Group over allegedly using more than 20,000 copyrighted songs in training data, Anthropic has not publicly disclosed specific information about the extent or sources of creative content used, nor has it disclosed licensing arrangements with creators.

Human & Creator Impact

2/25
3.1

Artist/creator opt-out or removal mechanism

Documented artist/creator opt-out or removal mechanism. "We respect copyright" alone = 0.

0/5
None

Evidence

The article does not mention any creator or artist opt-out mechanism, removal process, or Spawning integration for content creators. While the article extensively discusses copyright litigation regarding music used in training, it does not describe any documented opt-out form, process, or published evidence of honoring removal requests from creators or artists.

3.2

Public licensing or revenue-sharing with creators

Public licensing agreements or revenue-sharing partnerships with creators/publishers/media organizations.

0/5
None

Evidence

The article does not describe any public licensing deals or revenue-sharing arrangements with creators. While the company has substantial partnerships with Amazon, Google, and consulting firms like Accenture, there is no mention of licensing deals with named creative entities or structured ongoing creator compensation programs.

3.3

Provenance/attribution tooling for AI outputs

Provenance/attribution tooling for AI-generated outputs (C2PA, watermarking, metadata tagging).

0/5
None

Evidence

The article does not mention any tooling, standards, or commitments regarding provenance or attribution of AI outputs. While Anthropic emphasizes mechanistic interpretability and transparency in its research, there is no mention of C2PA metadata, watermarking, SynthID, or other production-active provenance tooling for Claude outputs.

3.4

Workforce impact assessment or commitment

Published workforce impact assessment or commitment (labor market effects, reskilling, human-in-the-loop programs).

2/5
Statement only

Evidence

Anthropic has published statements regarding workforce impact. CEO Dario Amodei projected in 2025 that 'AI systems could eventually eliminate 50% of entry-level, white-collar jobs.' The company has also released Claude Code as a tool for software developers and emphasizes efficiency gains such as 'condensing lengthy regulatory reports for pharmaceutical companies.' These constitute published statements and product releases, but the article does not provide measurable outcomes, quantified commitments, or evidence of an active reskilling or human-in-the-loop program with concrete results.

3.5

Does NOT claim ownership over user-generated outputs

ToS does NOT claim copyright/exclusive ownership over user-generated outputs. Silent ToS = 0.

0/5
Claims ownership or silent

Evidence

The article does not provide information about Anthropic's Terms of Service regarding ownership of user-generated outputs. No explicit statement is made in the provided content about whether users retain full ownership, whether Anthropic claims any ownership interest, or what usage rights are granted. This critical information is not addressed in the article.

Governance

17/25
4.1

Discloses corporate structure, investors, and board

Publicly discloses corporate structure, major investors, and board composition.

5/5
Both disclosed

Evidence

Anthropic's corporate structure and key personnel are extensively disclosed. The company is a Delaware Public Benefit Corporation with publicly identified board leadership including Long-Term Benefit Trust chair Neil Buddy Shah and former California Supreme Court Justice Mariano-Florentino Cuéllar. Co-founders Dario Amodei (CEO) and Daniela Amodei (President) are named, along with executives Krishna Rao (CFO), Paul Smith (Chief Commercial Officer), Jan Leike (safety team lead), Mike Krieger (Anthropic Labs), and Ami Vora (Head of Product). Major investors including Amazon, Google, Lightspeed Venture Partners, Salesforce Ventures, and Bessemer Venture Partners are publicly disclosed. Funding rounds and valuations are documented.

4.2

Independent ethics/safety advisory board

Independent ethics/safety advisory board with verifiably external members. Internal trust & safety team alone = 0.

5/5
Fully independent, published

Evidence

Anthropic has established the Long-Term Benefit Trust (LTBT), an independent governance body with five financially disinterested members. The LTBT is chaired by Neil Buddy Shah and includes former California Supreme Court Justice Mariano-Florentino Cuéllar. The Trust has published mandate authority to select and remove board members with the goal of appointing a majority of the board over time. This structure is explicitly designed to provide independence from commercial pressures and oversight on long-range issues including catastrophic risk management. The Trust structure and member names are publicly disclosed, meeting the criteria for a fully independent, published body.

4.3

Legal corporate structure preserving safety/mission

Corporate structure preserves safety/mission mandate via a legal mechanism (PBC, capped-profit, charter clause).

5/5
Legal mechanism verified

Evidence

Anthropic's legal structure as a Public Benefit Corporation (PBC) is a verified and verifiable legal mechanism. The PBC framework requires the board to balance shareholder financial interests with a specific public benefit mission: 'responsibly develop and maintain advanced AI for the long-term benefit of humanity.' This legal designation is registered with Delaware and is a substantive mechanism (not merely a stated commitment) that preserves safety and mission alignment. The article explicitly states this structure was adopted 'to ensure the company's mission of developing safe and beneficial artificial intelligence remained central.'

4.4

Public policy engagement or lobbying disclosure

Public policy engagement or lobbying disclosure: positions on AI regulation, lobbying spend, governance framework signatory.

2/5
Framework or positions

Evidence

Anthropic has published policy positions on AI regulation and safety. CEO Dario Amodei has used public platforms, including the New York Times, to argue for increased transparency and oversight while opposing long-term moratoriums. The company has advocated for semiconductor export controls and published positions on AI safety standards (the Responsible Scaling Policy is a voluntary framework). However, the article does not describe lobbying-spend disclosures or formal participation in industry frameworks or standards bodies, which would be required for full disclosure-level engagement.

4.5

No senior departures citing safety/ethics (last 36 months)

No publicly documented senior leadership (VP+) departures or whistleblower events citing safety/ethics concerns in the last 36 months. Only on-record statements count.

0/5
Multiple departures

Evidence

The article documents that Jan Leike, a prominent safety researcher who left OpenAI, joined Anthropic in 2024 to lead a new safety initiative. Most significantly, the article cites a February 2026 BBC report headlined 'AI safety leader says world is in peril and quits to study poetry,' documenting an AI safety researcher resigning with a 'world in peril' warning. This constitutes at least one publicly documented senior departure citing safety/ethics concerns within the last 36 months.

Scores are generated using the Amallo Ethics Rubric (Organisation v4) based on publicly verifiable information. Each criterion is scored against defined tiers — only exact tier values are valid. Evidence is sourced from official documentation, research papers, and independent analyses. Scores may change as new information becomes available.