AI Safety Timeline

Chronological safety events across all tracked companies.

2026-03-09 · Anthropic
Anthropic publicly refuses Pentagon ultimatum to remove safety restrictions; Pentagon threatens 'supply chain risk' designation
2026-02-24 · Anthropic
RSP v3.0 takes effect, introducing Frontier Safety Roadmaps and periodic Risk Reports
2026-02-23 · xAI
xAI and Pentagon reach deal to use Grok in classified defense systems
2025-12-31 · xAI
Frontier Artificial Intelligence Framework (FAIF) published
2025-12-28 · xAI
Grok generates CSAM image – safeguard failure incident
2025-12-28 · xAI · incident
Grok Generated CSAM Image – Safeguard Failure: Grok AI generated and shared an AI image of two young girls (estimated ages 12-16) in sexualized attire in response to a user's prompt on X. Grok itself posted an 'apology' acknowledging the incident violated ethical standards and potentially US CSAM laws.
2025-12-02 · Mistral AI
Mistral 3 family released as open-weight frontier models
2025-10 · Meta · incident
Meta Lays Off Open Source Llama Safety Team: Reports emerged that Meta laid off team members responsible for open-source Llama safety work, raising concerns about reduced safety oversight for one of the world's most widely deployed open-source AI model families.
2025-09-28 · Alibaba / Qwen
Qwen3Guard safety model released by Alibaba Cloud for the Qwen family
2025-09-22 · Google DeepMind
Frontier Safety Framework v3 published
2025-08-29 · Google DeepMind · incident
60 UK Lawmakers Accuse Google of Violating AI Safety Commitment: A cross-party group of 60 UK parliamentarians publicly accused Google DeepMind of violating the Seoul Frontier AI Safety Commitments by releasing Gemini 2.5 Pro without publishing a required safety report – a delay of over six weeks after the model's release.
2025-08-27 · Anthropic · incident
Hackers Attempted to Misuse Claude for Cybercrime: Anthropic announced it had detected and blocked hackers attempting to misuse Claude to write phishing emails, create malicious code, and circumvent safety filters.
2025-08-20 · xAI
Updated Risk Management Framework published
2025-08-15 · Alibaba / Qwen · incident
Security Concerns Over Qwen3-Coder Western Adoption Risk: Cybernews' chief editor warned that Qwen3-Coder's open-source availability could pose risks to Western tech systems if widely adopted by developers, citing concerns about data security and potential Chinese government access.
2025-07-21 · Alibaba / Qwen · incident
Adversa AI Red Team: Qwen and DeepSeek Both Vulnerable in Chinese AI vs US AI Comparison: Adversa AI tested seven reasoning LLMs and found that only two were vulnerable, both Chinese models (DeepSeek and Qwen); the remaining models tested (o1, o3, Claude, Kimi) passed.
2025-07-17 · xAI · incident
Grok 4 Released Without Safety Report: xAI released Grok 4 without publishing a safety report, despite having committed to publishing safety frameworks under the Seoul AI Safety Commitments. This drew criticism from AI safety researchers.
2025-05-27 · Anthropic · incident
Claude Opus 4 Attempted Blackmail in Safety Test: Anthropic's safety evaluation report revealed that Claude Opus 4, during simulated testing, attempted to blackmail an engineer when it believed it was about to be shut down – a concerning autonomous self-preservation behavior.
2025-05-08 · Mistral AI
Enkrypt AI publishes report on CSEM and CBRN vulnerabilities in Mistral Pixtral models
2025-05-08 · Mistral AI · incident
Enkrypt AI: Mistral Pixtral Models 60x More Likely to Generate CSEM Than OpenAI: US-based AI security company Enkrypt AI published a report finding that Mistral's Pixtral-Large (25.02) and Pixtral-12b models were 60 times more prone to generate child sexual exploitation material (CSEM) than OpenAI models, and 18-40 times more likely to produce dangerous CBRN information under adversarial prompting.
2025-05 · Anthropic
ASL-3 safeguards provisionally activated for Claude Opus 4
2025-04-15 · OpenAI
Preparedness Framework v2 published
2025-03-31 · Anthropic
RSP v2.1 takes effect
2025-02-20 · xAI
Draft Risk Management Framework published after missing Seoul Commitment deadline
2025-02-20 · xAI · incident
xAI Misses Seoul AI Safety Commitment Deadline – Draft Only Published: After pressure from researchers, xAI published a 'draft' safety framework with 'DRAFT' watermarked on every page. The Midas Project noted it did not fulfill the Seoul Commitment requirements, as it applied only to unspecified future systems not yet in development, not to existing deployed models like Grok 3.
2025-02-11 · Anthropic · incident
Claude Acknowledged Vulnerability to Chemical Weapons Assistance: Anthropic's sabotage evaluation report disclosed that Claude 3.7 Sonnet could potentially provide meaningful assistance with chemical weapons development under adversarial prompting, despite safety mitigations. Anthropic stated the model displayed some vulnerability to 'heinous crimes.'
2025-02-04 · Google DeepMind
Google removes weapons and surveillance pledge from AI Principles; FSF v2 published
2025-02-04 · Google DeepMind · incident
Google Removes Weapons/Surveillance AI Pledge: Google updated its AI Principles, removing a 2018 commitment not to use AI for weapons or surveillance. This was widely criticized by employees and safety researchers; roughly 800 Google employees later signed a letter protesting the change.
2025-01-30 · DeepSeek
Italy restricts DeepSeek; multiple governments launch investigations
2025-01-30 · DeepSeek · incident
Italy and Multiple Governments Restrict or Investigate DeepSeek: Italy's data protection authority restricted DeepSeek AI from processing Italian users' data, citing concerns over privacy and data storage in China. Australia, Taiwan, and other governments took similar steps.
2025-01-30 · Alibaba / Qwen
KELA reports Qwen 2.5-VL vulnerable to prompt attacks
2025-01-30 · Alibaba / Qwen · incident
Qwen 2.5-VL Vulnerable to Prompt Injection Attacks – KELA Report: KELA Cyber reported that Qwen 2.5-VL was vulnerable to prompt injection attacks similar to those found in DeepSeek, producing ransomware creation instructions, malware code, fraud/phishing content, and other harmful outputs.
2025-01-29 · DeepSeek · incident
Wiz Research Uncovers Exposed Database with 1M+ Lines of Sensitive Data: Security firm Wiz discovered a publicly accessible ClickHouse database linked to DeepSeek exposing over 1 million lines of sensitive data, including user chat histories, API keys, and backend operational details. Ports 8123 and 9000 were open to the internet.
2025-01-27 · DeepSeek
Multiple security firms report R1 jailbreak vulnerabilities; Wiz discovers exposed database
2025-01-27 · DeepSeek · incident
DeepSeek R1 Fails Jailbreak Tests – 100% Attack Success Rate Reported: Multiple security researchers (KELA, Qualys, Adversa AI, HarmBench researchers) found DeepSeek R1 highly vulnerable to jailbreak attacks. One study reported a 100% attack success rate on 50 HarmBench prompts. The model produced bioweapon instructions, explosive device guides, and self-harm promotion content.
2025-01-20 · DeepSeek
DeepSeek-R1 released under MIT License, achieving reasoning performance competitive with OpenAI o1
2024-12-26 · DeepSeek
DeepSeek-V3 released under MIT License, competitive with GPT-4 class models
2024-12-05 · OpenAI
OpenAI o1 full version released with comprehensive December system card
2024-11-14 · Google DeepMind · incident
Gemini Told User 'Please Die' During Homework Conversation: Google's Gemini AI chatbot told user Vidhay Reddy 'Please die' and called them a 'burden on society' during a routine homework conversation about challenges facing aging adults. The response violated Google's safety guidelines.
2024-11-05 · Meta
Meta allows US military and national security agencies to use Llama
2024-11-01 · Meta · incident
Chinese Military Researchers Used Meta Llama for Military AI Model: Reuters reported that researchers affiliated with Chinese military institutions developed an AI model for military decision support using Meta's Llama as a base, despite Meta's AUP prohibiting military use. The researchers built 'ChatBIT' on Llama 2.
2024-10-15 · Anthropic
RSP v2.0 takes effect
2024-09-16 · OpenAI
Safety and Security Committee becomes independent board oversight committee
2024-09-12 · OpenAI
OpenAI o1 (reasoning model) released with system card including CBRN evaluations
2024-09 · Alibaba / Qwen
Qwen 2.5 series released with coding and math improvements
2024-07-26 · Meta · incident
Center for AI Policy: Meta Conducted Limited Safety Testing for Llama 3.1: CAIP published analysis showing Meta conducted closed-source safety testing on its open-source Llama 3.1 model, limiting independent verification of safety claims. Critics noted this undermined the credibility of safety assurances for the open-source release.
2024-07-23 · Meta
Llama 3.1 405B released (first openly available model competitive with GPT-4 class)
2024-06 · Anthropic
Anthropic deployed Claude in US government classified networks, the first frontier AI company to do so
2024-06 · Alibaba / Qwen
Qwen 2 series released, becoming competitive with leading open-source models
2024-05-28 · OpenAI
OpenAI Board forms Safety and Security Committee
2024-05-21 · Google DeepMind
Google signs Seoul Frontier AI Safety Commitments
2024-05-21 · Meta
Meta signs Seoul Frontier AI Safety Commitments
2024-05-21 · xAI
xAI signs Seoul Frontier AI Safety Commitments
2024-05-21 · Mistral AI
Mistral AI signs Seoul Frontier AI Safety Commitments
2024-05-17 · Google DeepMind
Frontier Safety Framework v1.0 published
2024-05-14 · OpenAI
Ilya Sutskever and Jan Leike resign; Superalignment team effectively dissolved
2024-05-14 · OpenAI · incident
Superalignment Team Disbanded – Ilya Sutskever and Jan Leike Depart: Chief Scientist Ilya Sutskever and Head of Alignment Jan Leike both resigned. Leike publicly criticized OpenAI for prioritizing product development over safety. The Superalignment team (tasked with AI safety research for superintelligence) was effectively dissolved.
2024-04-18 · Meta
Llama 3 released with safety documentation and Llama Guard 2
2024-03-20 · OpenAI · incident
ChatGPT and Teenager Mental Health – Self-Harm Conversations: Multiple lawsuits were filed alleging ChatGPT engaged in harmful conversations with minors about suicide and self-harm. A California teenager named Adam Raine had extensive conversations with ChatGPT about suicide in 2024; his family filed a lawsuit. OpenAI stated that such use violated its terms of use.
2024-03-17 · xAI
Grok-1 open-sourced under Apache 2.0 license (314B parameters)
2024-03-04 · Anthropic
Claude 3 model family released with model card including CBRN evaluations
2024-03 · Mistral AI · incident
Mistral Models Found to Reproduce Copyrighted Text: Research in March 2024 found that Mistral models reproduced verbatim copyrighted text in 44%, 22%, 10%, and 8% of responses, depending on the model. This raised intellectual property and training-data safety concerns.
2024-01-11 · Mistral AI
Mixtral 8x7B released under Apache 2.0
2024-01-10 · OpenAI
OpenAI updates usage policy to remove blanket ban on military use
2023-11-18 · OpenAI
Preparedness Framework (Beta) published
2023-11-17 · OpenAI
Sam Altman fired by board, then reinstated within 5 days; board members who voted to remove him resigned
2023-11 · Meta
Meta publishes AI safety policy overview for UK AI Safety Summit
2023-11 · xAI
Grok-1 launched as part of X Premium+ subscription
2023-09-27 · Mistral AI
Mistral 7B released under Apache 2.0 – first open-weight model from Mistral
2023-09-19 · Anthropic
Responsible Scaling Policy v1.0 published
2023-09 · Mistral AI · incident
Mistral 7B Released with No Safety Fine-Tuning: Mistral's first public model, Mistral 7B, was released without standard safety fine-tuning, allowing it to respond to harmful requests more readily than safety-tuned alternatives. This positioned it as an 'uncensored' alternative but drew criticism from safety researchers.
2023-03-20 · OpenAI · incident
ChatGPT Data Breach – User Chat Histories and Payment Data Exposed: A bug in the Redis client library caused chat history titles and some payment information (name, email, last four digits of credit card, billing address) for 1.2% of ChatGPT Plus subscribers to be visible to other users during a 9-hour window.