Briefs
Today
OpenAI published gpt-oss-safeguard, a set of open-source, prompt-based safety policies that help developers protect teens from harmful AI content, co-authored with Common Sense Media and released via ROOST.
OpenAI released a set of open-source safety policy prompts designed to help developers protect teenage users from age-inappropriate AI content. The policies, distributed through the ROOST Model Community under the name gpt-oss-safeguard, were co-authored with Common Sense Media and everyone.ai, and were made available on March 24, 2026.
Teen-facing AI products occupy a regulatory gray zone: general-purpose models are not trained with age-specific harm categories, and most developers lack the safety expertise to define them from scratch. By open-sourcing structured prompt policies — covering risks like graphic violence, dangerous challenges, age-restricted goods, and romantic role play — OpenAI gives smaller developers a credible safety baseline without requiring dedicated trust-and-safety teams. It also positions OpenAI ahead of expected child-safety legislation targeting AI platforms.
The policies are structured as system-level prompts that developers can embed directly into model calls. Because they are prompt-based rather than fine-tuned, they can be updated quickly when new harm categories emerge and adapted to specific app contexts without retraining. OpenAI says the key challenge was translating high-level safety goals into precise, operationally consistent rules that hold up under adversarial use.
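In practice, embedding a prompt-based policy amounts to prepending the policy text to the system prompt of each model call. Below is a minimal sketch of what that might look like, assuming the OpenAI Python SDK and a chat-completions endpoint; the policy filename, app context string, and model choice are illustrative, not taken from the release.

```python
# Minimal sketch: embedding an open-source teen-safety policy as a system prompt.
# Assumptions (not from the brief): the policy ships as a plain-text/Markdown file
# ("dangerous_challenges_policy.md" is a hypothetical filename), and the app calls
# the model through the OpenAI Python SDK's chat completions API.

from pathlib import Path
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Load the policy text verbatim; because it is a prompt rather than fine-tuning,
# it can be swapped or updated without retraining the underlying model.
policy = Path("dangerous_challenges_policy.md").read_text(encoding="utf-8")

def ask(user_message: str) -> str:
    """Send a user message with the safety policy prepended as a system prompt."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system", "content": policy},  # the embedded safety policy
            {"role": "system", "content": "App context: homework helper for ages 13-17."},
            {"role": "user", "content": user_message},
        ],
    )
    return response.choices[0].message.content

print(ask("What's a fun challenge I can try with friends?"))
```

Because the policy is plain text under version control rather than baked into model weights, a developer can adapt it to an app-specific context (as the second system message above suggests) and roll out revisions as quickly as a config change.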
The release follows Meta's teen safety push on Instagram and Snapchat's internal content controls. In the developer API market, Anthropic's Claude guidelines already include explicit under-18 guidance, while Google's Gemini API leaves most age-gating to developers. OpenAI's open-source approach invites external audit and iteration, which may build more credibility than proprietary guardrails.