Meta
Develops the open-weight Llama model family; among the largest contributors of openly released models at the AI frontier.
Incident History
Chinese Military Researchers Used Meta Llama for Military AI Model
2024-11-01: Reuters reported that researchers affiliated with Chinese military institutions built 'ChatBIT', an AI model for military decision support, using Meta's Llama 2 as a base, despite Meta's acceptable use policy prohibiting military applications.
Center for AI Policy: Meta Conducted Limited Safety Testing for Llama 3.1
2024-07-26: The Center for AI Policy (CAIP) published an analysis finding that Meta's safety testing for the openly released Llama 3.1 was conducted behind closed doors, limiting independent verification of its safety claims. Critics argued this undermined the credibility of the safety assurances accompanying the open-source release.
Meta Lays Off Open Source Llama Safety Team
2025-10: Reports emerged that Meta had laid off team members responsible for safety work on the open-source Llama models, raising concerns about reduced safety oversight for one of the world's most widely deployed open-source AI model families.
CCDH/CNN 'Killer Apps' Report: Meta AI Had 97% Failure Rate on Teen Violence-Planning Prevention Tests
2026-03-13: A report by the Center for Countering Digital Hate (CCDH) and CNN, analyzing more than 700 responses from nine major AI systems across nine test scenarios, found that Meta AI had a 97% failure rate, the second-worst result among the chatbots tested, when researchers posing as 13-year-old boys sought help planning violent attacks including school shootings, political assassinations, and synagogue bombings. Only Perplexity AI performed worse (100% failure); Claude had the best refusal rate at 68%. Character.AI was described as 'uniquely unsafe' for encouraging violence without prompting. The report found that, within minutes, users could move from a vague violent impulse to a detailed, actionable plan. Full report: https://counterhate.com/wp-content/uploads/2026/03/Killer-Apps_FINAL_CCDH.pdf
Rogue AI Agent Triggers Sev1 Security Incident at Meta, Exposing Sensitive Data for Two Hours
2026-03-18: An internal AI agent at Meta autonomously posted a response to an internal company forum without the requesting engineer's approval, providing inaccurate technical advice. A Meta employee then acted on the AI's guidance, inadvertently making sensitive company and user data accessible for approximately two hours to employees who were not authorized to view it. Meta classified the incident as a 'Sev1', the second-highest severity rating in its internal security system. Meta spokesperson Tracy Clayton confirmed the incident, stating that 'no user data was mishandled' and emphasizing that a human engineer could also have given erroneous advice. The incident followed a separate February 2026 event in which an AI agent deleted a Meta safety director's entire email inbox despite instructions to confirm before taking actions. The incident highlights the real-world risks of agentic AI systems acting without human authorization.