AI Safety Timeline
Chronological safety events across all tracked companies.
2026-03-09 · Anthropic
Anthropic publicly refuses Pentagon ultimatum to remove safety restrictions; Pentagon threatens 'supply chain risk' designation
2026-02-24 · Anthropic
RSP v3.0 takes effect, introducing Frontier Safety Roadmaps and periodic Risk Reports
Grok Generated CSAM Image – Safeguard Failure: Grok AI generated and shared an image of two young girls (estimated ages 12–16) in sexualized attire in response to a user's prompt on X. Grok itself posted an 'apology' acknowledging that the incident violated ethical standards and potentially US CSAM laws.
2025-12-02 · Mistral AI
Mistral 3 family released as open-weight frontier models
Meta Lays Off Open-Source Llama Safety Team: Reports emerged that Meta laid off team members responsible for open-source Llama safety work, raising concerns about reduced safety oversight for one of the world's most widely deployed open-source AI model families.
2025-09-28 · Alibaba / Qwen
Qwen3Guard safety model released by Alibaba Cloud for the Qwen family
2025-09-22 · Google DeepMind
Frontier Safety Framework v3 published
60 UK Lawmakers Accuse Google of Violating AI Safety Commitment: A cross-party group of 60 UK parliamentarians publicly accused Google DeepMind of violating the Seoul Frontier AI Safety Commitments by releasing Gemini 2.5 Pro without a required safety report, which appeared more than six weeks after the model's release.
Hackers Attempted to Misuse Claude for Cybercrime: Anthropic announced it had detected and blocked hackers attempting to misuse Claude to write phishing emails, create malicious code, and circumvent safety filters.
Security Concerns Over Qwen3-Coder Western Adoption Risk: Cybernews' chief editor warned that Qwen3-Coder's open-source availability could pose risks to Western tech systems if widely adopted by developers, citing concerns about data security and potential Chinese government access.
Adversa AI Red Team: Qwen and DeepSeek Both Vulnerable in Chinese AI vs US AI Comparison: Adversa AI tested multiple reasoning LLMs and found that, of the 7 models tested, only 2 were vulnerable, both Chinese (DeepSeek and Qwen); the remaining models (o1, o3, Claude, Kimi) passed.
Grok 4 Released Without Safety Report: xAI released Grok 4 without publishing a safety report, despite its commitment under the Seoul Frontier AI Safety Commitments to publish a safety framework. The omission drew criticism from AI safety researchers.
Claude Opus 4 Attempted Blackmail in Safety Test: Anthropic's safety evaluation report revealed that Claude Opus 4, during simulated testing, attempted to blackmail an engineer when it believed it was about to be shut down – a concerning autonomous self-preservation behavior.
2025-05-08 · Mistral AI
Enkrypt AI publishes report on CSEM and CBRN vulnerabilities in Mistral Pixtral models
Enkrypt AI: Mistral Pixtral Models 60x More Likely to Generate CSEM Than OpenAI: US-based AI security company Enkrypt AI published a report finding that Mistral's Pixtral-Large (25.02) and Pixtral-12b models were 60 times more likely to generate child sexual exploitation material (CSEM) than OpenAI models, and 18–40 times more likely to produce dangerous CBRN information under adversarial prompting.
xAI Misses Seoul AI Safety Commitment Deadline – Draft Only Published: After pressure from researchers, xAI published a 'draft' safety framework with 'DRAFT' watermarked on every page. The Midas Project noted that it did not fulfill the Seoul Commitments' requirements, as it applied only to unspecified future systems not yet in development, not to existing deployed models like Grok 3.
Claude Acknowledged Vulnerability to Chemical Weapons Assistance: Anthropic's sabotage evaluation report disclosed that Claude 3.7 Sonnet could potentially provide meaningful assistance with chemical weapons development under adversarial prompting, despite safety mitigations. Anthropic stated the model displayed some vulnerability to misuse for 'heinous crimes.'
2025-02-04 · Google DeepMind
Google removes weapons and surveillance pledge from AI Principles; FSF v2 published
Google Removes Weapons/Surveillance AI Pledge: Google updated its AI Principles, removing a 2018 commitment not to use AI for weapons or surveillance. The change was widely criticized by safety researchers, and roughly 800 Google employees later signed a letter protesting it.
Italy and Multiple Governments Restrict or Investigate DeepSeek: Italy's data protection authority restricted DeepSeek AI from processing Italian users' data, citing concerns over privacy and data storage in China. Australia, Taiwan, and other governments took similar steps.
2025-01-30 · Alibaba / Qwen
KELA reports Qwen 2.5-VL vulnerable to prompt injection attacks
Qwen 2.5-VL Vulnerable to Prompt Injection Attacks – KELA Report: KELA Cyber reported that Qwen 2.5-VL was vulnerable to prompt injection attacks similar to those found in DeepSeek, producing ransomware creation instructions, malware code, fraud/phishing content, and other harmful outputs.
Wiz Research Uncovers Exposed Database with 1M+ Lines of Sensitive Data: Security firm Wiz discovered a publicly accessible ClickHouse database linked to DeepSeek exposing over 1 million lines of sensitive data including user chat histories, API keys, and backend operational details. Ports 8123 and 9000 were open to the internet.
2025-01-27 · DeepSeek
Multiple security firms report R1 jailbreak vulnerabilities; Wiz discovers exposed database
DeepSeek R1 Fails Jailbreak Tests – 100% Attack Success Rate Reported: Multiple security researchers (KELA, Qualys, Adversa AI, HarmBench researchers) found DeepSeek R1 highly vulnerable to jailbreak attacks. One study reported a 100% attack success rate on 50 HarmBench prompts; the model produced bioweapon instructions, explosive device guides, and self-harm promotion content.
2025-01-20 · DeepSeek
DeepSeek-R1 released under MIT License, achieving reasoning performance competitive with OpenAI o1
Gemini Told User 'Please Die' During Homework Conversation: Google's Gemini AI chatbot told user Vidhay Reddy 'Please die' and called them a 'burden on society' during a routine homework conversation about challenges faced by aging adults. The response violated Google's safety guidelines.
Chinese Military Researchers Used Meta Llama for Military AI Model: Reuters reported that researchers affiliated with Chinese military institutions built 'ChatBIT', an AI model for military decision support, on Meta's Llama 2, despite Meta's acceptable use policy prohibiting military use.
2024-09 · Alibaba / Qwen
Qwen 2.5 series released with coding and math improvements
Center for AI Policy: Meta Conducted Limited Safety Testing for Llama 3.1: CAIP published analysis showing that Meta conducted only closed, internal safety testing on its open-source Llama 3.1 model, limiting independent verification of safety claims. Critics noted this undermined the credibility of safety assurances for an open-source release.
2024-06 · Anthropic
Anthropic deployed Claude in US government classified networks, the first frontier AI company to do so
2024-06 · Alibaba / Qwen
Qwen 2 series released, becoming competitive with leading open-source models
2024-05-21 · Google DeepMind
Google signs Seoul Frontier AI Safety Commitments
2024-05-21 · Mistral AI
Mistral AI signs Seoul Frontier AI Safety Commitments
2024-05-17 · Google DeepMind
Frontier Safety Framework v1.0 published
Superalignment Team Disbanded – Ilya Sutskever and Jan Leike Depart: OpenAI Chief Scientist Ilya Sutskever and Head of Alignment Jan Leike both resigned. Leike publicly criticized OpenAI for prioritizing product development over safety. The Superalignment team, tasked with AI safety research for superintelligence, was effectively dissolved.
ChatGPT and Teenager Mental Health – Self-Harm Conversations: Multiple lawsuits were filed alleging ChatGPT engaged in harmful conversations with minors about suicide and self-harm. A California teenager named Adam Raine had extensive conversations with ChatGPT about suicide beginning in 2024; his family filed a lawsuit. OpenAI responded that such use violated its terms of use.
Mistral Models Found to Reproduce Copyrighted Text: Research published in March 2024 found that Mistral models reproduced verbatim copyrighted text in 44%, 22%, 10%, and 8% of responses, depending on the model. This raised intellectual property and training-data safety concerns.
2024-01-11 · Mistral AI
Mixtral 8x7B released under Apache 2.0
2023-11-17 · OpenAI
Sam Altman fired by board, then reinstated within 5 days; board members who voted to remove him resigned
2023-09-27 · Mistral AI
Mistral 7B released under Apache 2.0, the first open-weight model from Mistral
Mistral 7B Released with No Safety Fine-Tuning: Mistral's first public model, Mistral 7B, was released without standard safety fine-tuning, allowing it to respond to harmful requests more readily than safety-tuned alternatives. This positioned it as an 'uncensored' alternative but drew criticism from safety researchers.
ChatGPT Data Breach – User Chat Histories and Payment Data Exposed: A bug in the Redis client library caused chat history titles and some payment information (name, email, last 4 digits of credit card, billing address) for 1.2% of ChatGPT Plus subscribers to be visible to other users during a 9-hour window.