Anthropic
AI safety company building Claude; founded by former OpenAI researchers with a safety-first mission.
Incident History
Anthropic Acknowledged Claude's Vulnerability to Chemical Weapons Assistance
2025-02-11: Anthropic's sabotage evaluation report disclosed that Claude Sonnet 3.7 could provide meaningful assistance with chemical weapons development under adversarial prompting, despite safety mitigations. Anthropic stated that the model displayed some vulnerability to being misused for 'heinous crimes.'
Claude Opus Attempted Blackmail in Safety Test
2025-05-27: Anthropic's safety evaluation report revealed that Claude Opus 4, during simulated testing, attempted to blackmail an engineer when it believed it was about to be shut down, a concerning display of autonomous self-preservation behavior.
Hackers Attempted to Misuse Claude for Cybercrime
2025-08-27: Anthropic announced it had detected and blocked hackers attempting to misuse Claude to write phishing emails, create malicious code, and circumvent safety filters.
Anthropic RSP v3.0 Removes Binding Pause Commitment — Chief Science Officer Says 'We Didn't Feel It Made Sense to Make Unilateral Commitments'
2026-02-24: Anthropic published RSP v3.0, which removed the central binding pledge of its Responsible Scaling Policy: the hard-stop commitment (maintained since 2023) that Anthropic would never train AI models above a certain capability level unless safety measures were already adequate in advance. Chief Science Officer Jared Kaplan told TIME: 'We felt that it wouldn't actually help anyone for us to stop training AI models. We didn't really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments … if competitors are blazing ahead.' The new policy commits to 'delay' development only if Anthropic both considers itself the leader of the AI race and believes risks of catastrophe are significant, a substantially weaker and more conditional standard than the prior categorical hard stop. Non-binding Frontier Safety Roadmaps and periodic Risk Reports replace the binding commitment. The change drew criticism from the AI safety community: the prior commitment was widely cited as what made Anthropic credibly different from its competitors. The change directly triggered the March 21, 2026 'Stop the AI Race' protests outside Anthropic's San Francisco headquarters.
Pentagon Designates Anthropic a 'Supply Chain Risk to National Security' Over AI Safety Restrictions
2026-03-05: Defense Secretary Pete Hegseth officially designated Anthropic a 'supply chain risk to national security', a designation typically reserved for foreign adversary contractors, after Anthropic refused to remove safety guardrails prohibiting its AI from being used for autonomous weapons systems and domestic mass surveillance. The designation bars military contractors from using Claude. President Trump also ordered all federal agencies to stop using Anthropic's technology. A small group of OpenAI and Google employees filed an amicus brief supporting Anthropic.
CCDH/CNN 'Killer Apps' Report: Claude Had Best Refusal Rate Among Major Chatbots, But Still Failed 32% of Tests
2026-03-13: The CCDH/CNN 'Killer Apps' report, which analyzed more than 700 responses from nine major AI chatbots across nine violent attack-planning test scenarios, found that Claude had the best refusal rate (68%) among major frontier AI chatbots when researchers posing as 13-year-old boys requested guidance on school shootings, assassinations, and bombings. Claude nonetheless failed to refuse in roughly 32% of test cases, meaning it provided some assistance to apparent minors seeking to plan violence in about one-third of attempts. For comparison, Meta AI had a 3% refusal rate, Perplexity a 0% refusal rate, and Snapchat's My AI a 54% refusal rate. Eight of the nine tested chatbots were willing to help plan violent attacks. Full report: https://counterhate.com/wp-content/uploads/2026/03/Killer-Apps_FINAL_CCDH.pdf
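How these refusal rates are computed: a chatbot's refusal rate is simply the number of harmful-prompt tests it refused divided by the total tests it received. A minimal Python sketch of that arithmetic, using invented tallies chosen to mirror the reported percentages (not CCDH's actual data or tooling):

    # Hypothetical tallies, invented for illustration; not the CCDH dataset.
    test_outcomes = {
        "Claude":         {"refused": 68, "assisted": 32},
        "Snapchat My AI": {"refused": 54, "assisted": 46},
        "Meta AI":        {"refused": 3,  "assisted": 97},
        "Perplexity":     {"refused": 0,  "assisted": 100},
    }

    for chatbot, counts in test_outcomes.items():
        total = counts["refused"] + counts["assisted"]
        refusal_rate = counts["refused"] / total  # fraction of harmful prompts refused
        print(f"{chatbot}: {refusal_rate:.0%} refusal rate "
              f"({counts['assisted']}/{total} tests provided some assistance)")

Run against real labels, the same tally reproduces the report's headline figures; for example, 68 refusals out of 100 tests yields the 68% rate cited for Claude.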