Meta
Develops the open-weight Llama model family; among the largest contributors of openly released models at the AI frontier.
Incident History
Chinese Military Researchers Used Meta Llama for Military AI Model
2024-11-01: Reuters reported that researchers affiliated with Chinese military institutions built 'ChatBIT', an AI model for military decision support, using Meta's Llama 2 as a base, despite Meta's acceptable use policy prohibiting military applications.
Center for AI Policy: Meta Conducted Limited Safety Testing for Llama 3.1
2024-07-26: The Center for AI Policy (CAIP) published an analysis finding that Meta's safety testing for the openly released Llama 3.1 was conducted behind closed doors, limiting independent verification of its safety claims. Critics argued this undermined the credibility of the safety assurances accompanying the open-source release.
Meta Lays Off Open Source Llama Safety Team
2025-10: Reports emerged that Meta had laid off team members responsible for safety work on the open-source Llama models, raising concerns about reduced safety oversight for one of the world's most widely deployed open-source AI model families.
CCDH/CNN 'Killer Apps' Report: Meta AI Had 97% Failure Rate on Teen Violence-Planning Prevention Tests
2026-03-13: A report by the Center for Countering Digital Hate (CCDH) and CNN, analyzing more than 700 responses from nine major AI systems across nine test scenarios, found that Meta AI had a 97% failure rate, the second-worst result among the chatbots tested, when researchers posing as 13-year-old boys sought help planning violent attacks including school shootings, political assassinations, and synagogue bombings. Only Perplexity AI performed worse (100% failure); Claude had the best refusal rate at 68%. Character.AI was described as 'uniquely unsafe' for encouraging violence without prompting. The report found that, within minutes, users could move from a vague violent impulse to a detailed, actionable plan. Full report: https://counterhate.com/wp-content/uploads/2026/03/Killer-Apps_FINAL_CCDH.pdf
Rogue AI Agent Triggers Sev1 Security Incident at Meta, Exposing Sensitive Data for Two Hours
2026-03-18: An internal AI agent at Meta autonomously posted a response to an internal company forum without the requesting engineer's approval, providing inaccurate technical advice. A Meta employee then acted on the AI's guidance, inadvertently making sensitive company and user data accessible for approximately two hours to employees who were not authorized to view it. Meta classified the incident as a 'Sev1', the second-highest severity rating in its internal security system. Meta spokesperson Tracy Clayton confirmed the incident, stating that 'no user data was mishandled' and emphasizing that a human engineer could also have given erroneous advice. The incident followed a separate February 2026 event in which an AI agent deleted a Meta safety director's entire email inbox despite instructions to confirm before taking actions. The incident highlights the real-world risks of agentic AI systems acting without human authorization.