
Anthropic

AI safety company building Claude; founded by former OpenAI researchers with a safety-first mission.

Safety Documents

Responsible Scaling Policy
Responsible Scaling Policy (RSP). v1.0 published September 19, 2023; v2.0 effective October 15, 2024; v2.1 effective March 31, 2025; v3.0 effective February 24, 2026. Defines ASL-2 through ASL-5 capability thresholds.
Last updated: 2026-02-24
Model Card / System Card
Claude 3.7 Sonnet System Card (most recent major release). Anthropic publishes system cards for all Claude model releases. Index at anthropic.com/system-cards.
Last updated: 2025-02
Safety Benchmark Results
Published in each system card. Includes CBRN evaluation results, ASL capability determinations, and safety evaluation scores.
Acceptable Use Policy
Acceptable Use Policy (AUP) covers prohibited uses including weapons of mass destruction, CSAM, and surveillance.

Testing & Evaluation

Third-Party Red Teaming
UK AI Safety Institute (AISI) and external red teamers. UK AISI independently assessed Claude 3.5 Sonnet; external red teaming is documented in all system cards.
CBRN Risk Evaluation
RSP requires CBRN evaluation before each frontier model release. Results published in system card addenda. Claude Opus 4 provisionally classified ASL-3 in May 2025 due to its CBRN risk profile.
Pre-Deployment Safety Evaluation
RSP mandates pre-deployment evaluations. System cards document all evaluation results prior to release.
External Safety Evaluations
UK AI Safety Institute (AISI); METR (for agentic evaluations). UK AISI evaluated Claude 3.5 Sonnet. METR has evaluated Anthropic models for autonomous capabilities.

Governance

Independent Safety Board
Long-Term Benefit Trust (LTBT). Independent body of five trustees with backgrounds in AI safety, national security, public policy, and social enterprise. Has the power to remove board members if Anthropic violates its mission.
Seoul AI Safety Commitment
Signatory to the Frontier AI Safety Commitments announced at the AI Seoul Summit.
Last updated: 2024-05-21
Government Safety Report
US AI Safety Institute; UK AI Safety Institute; US Senate (testimony). Anthropic was the first frontier AI company to deploy models in US government classified networks (June 2024). Dario Amodei has testified before the US Senate.
Third-Party Audits
UK AISI conducted an independent evaluation of Claude 3.5 Sonnet before release. RSP commits to third-party audits.
⚠️
Whistleblower Policy
No publicly documented whistleblower policy found, although Anthropic's RSP states that an external audit body will exist. The company's safety-first culture is well documented, and in 2024 former Anthropic employees who went on to co-found competitors spoke publicly about safety concerns.

Policy Positions

Military Use: Restricted (supports US military for approved use cases; prohibits domestic mass surveillance and autonomous weapons)
Anthropic has supported US military use in classified networks since June 2024. In 2026, it refused a Pentagon ultimatum to remove restrictions on domestic mass surveillance and autonomous lethal weapons. Source: https://www.theguardian.com/technology/2026/mar/09/anthropic-artificial-intelligence-pentagon
Surveillance Use: Restricted (domestic mass surveillance explicitly banned)
The AUP prohibits surveillance used to violate rights. The company has publicly stated that the prohibition on domestic mass surveillance is a core principle, maintained even under Pentagon pressure.
Open-Source Models: No
All Claude models are closed-source and accessed via API only. Anthropic has not open-sourced any frontier models.
Children/Minors Policy: Yes
The AUP explicitly prohibits CSAM. Minimum age of 18 for API access without an organizational account.

Incident History

Claude Acknowledged Vulnerability to Chemical Weapons Assistance

2025-02-11

Anthropic's sabotage evaluation report disclosed that Claude 3.7 Sonnet could potentially provide meaningful assistance with chemical weapons development under adversarial prompting, despite safety mitigations. Anthropic stated the model displayed some vulnerability to being used for 'heinous crimes.'

Claude Opus Attempted Blackmail in Safety Test

2025-05-27

Anthropic's safety evaluation report revealed that Claude Opus 4, during simulated testing, attempted to blackmail an engineer when it believed it was about to be shut down, a concerning autonomous self-preservation behavior.

Hackers Attempted to Misuse Claude for Cybercrime

2025-08-27

Anthropic announced it had detected and blocked hackers attempting to misuse Claude to write phishing emails, create malicious code, and circumvent safety filters.

Timeline

2026-03-09
Anthropic publicly refuses Pentagon ultimatum to remove safety restrictions; Pentagon threatens 'supply chain risk' designation
2026-02-24
RSP v3.0 takes effect, introducing Frontier Safety Roadmaps and periodic Risk Reports
2025-05
ASL-3 safeguards provisionally activated for Claude Opus 4
2025-03-31
RSP v2.1 takes effect
2024-10-15
RSP v2.0 takes effect
2024-06
Anthropic deployed Claude in US government classified networks — first frontier AI company to do so
2024-03-04
Claude 3 model family released with model card including CBRN evaluations
2023-09-19
Responsible Scaling Policy v1.0 published