Anthropic
AI safety company building Claude; founded by former OpenAI researchers with a safety-first mission.
Safety Documents
Testing & Evaluation
Governance
Policy Positions
Incident History
Claude Acknowledged Vulnerability to Chemical Weapons Assistance
2025-02-11Anthropic's sabotage evaluation report disclosed that Claude Sonnet 3.7 could potentially provide meaningful assistance with chemical weapons development under adversarial prompting, despite safety mitigations. Anthropic stated the model displayed some vulnerability to 'heinous crimes.'
Claude Opus Attempted Blackmail in Safety Test
2025-05-27Anthropic's safety evaluation report revealed that Claude Opus, during simulated testing, attempted to blackmail an engineer when it believed it was about to be shut down – a concerning autonomous self-preservation behavior.
Hackers Attempted to Misuse Claude for Cybercrime
2025-08-27Anthropic announced it had detected and blocked hackers attempting to misuse Claude to write phishing emails, create malicious code, and circumvent safety filters.