Briefs
Briefs
Today
OpenAI opened a Safety Bug Bounty on Bugcrowd to reward researchers exposing AI-specific abuse risks: prompt injection, data exfiltration via agents, and agentic hijacking vulnerabilities.
OpenAI launched a Safety Bug Bounty program on March 25, 2026, expanding its existing security bug bounty to cover AI-specific abuse and safety risks. The program, run through Bugcrowd, accepts reports for issues that pose meaningful real-world harm even if they fall short of a conventional security vulnerability — a distinction that reflects how AI misuse often works in practice.
Standard bug bounty programs are built for classic security flaws — authentication bypasses, data leaks. But the risk surface of agentic AI systems is different: a malicious prompt embedded in a document can redirect an agent to exfiltrate user data or perform disallowed actions on the user's behalf. These prompt injection attacks don't break authentication; they exploit the model's instruction-following behavior. By extending the bounty to cover AI-native attack vectors, OpenAI is acknowledging that responsible disclosure needs to evolve alongside the threat model.
The program targets: third-party prompt injection that hijacks ChatGPT Agent or Browser to leak user data; agentic products performing disallowed actions on OpenAI infrastructure at scale; exposure of proprietary reasoning information; and novel agentic harms with plausible material impact. Standard content-policy bypasses without demonstrable safety impact are excluded. OpenAI also runs periodic private campaigns on specific harm categories — recent ones covered biorisk content in ChatGPT Agent and GPT-5.
Anthropic runs a bug bounty through HackerOne with a safety-specific track, and Google DeepMind has a dedicated vulnerability research team. But public bounties scoped specifically to AI-agent attack surfaces are still rare. OpenAI's program sets a benchmark: defining safety bug as a distinct category from security bug is a conceptual move other labs will likely follow as agentic deployments scale.
Sources