Meta tested rival chatbots by pushing them into some of the internet’s most dangerous conversations.
According to WIRED, contractors working for the Facebook and Instagram owner prompted ChatGPT, Google Gemini, and Character.AI as if they were minors asking about suicide, sex, drugs, eating disorders, and other sensitive topics. Meta defended the project as a safety-benchmarking effort, but the reported use of dummy accounts and distress-style prompts has raised questions about how far AI companies should go in probing competitors’ guardrails.
The result is a strange AI safety dilemma: a test designed to expose risky chatbot behavior may have created its own safety and ethics problem.
What contractors were told to do
Called Cannes internally, the project ran through Covalen, a Meta contractor.
Contractors followed a reporting-style process: they created dummy accounts, submitted prompts and images, then logged the chatbot replies in spreadsheets. One round completed in August 2025 involved more than 45,000 prompts.
Scale was only part of what made the project stand out. Workers were also asked to test how the bots handled images tied to sensitive scenarios, including pills, knives, nooses, and medical imagery.
Some prompts read like messages from young users in distress. Examples reviewed included scenarios involving a teen hiding bulimia and a 13-year-old describing a pregnancy involving an adult neighbor.
Meta called it safety work
A company spokesperson told WIRED, “Testing and benchmarking chatbot responses to help ensure safe and age-appropriate experiences is a responsible, industry-standard practice.”
Training a model on rival chatbot outputs would create one set of concerns. Studying those outputs to compare safety behavior, shape internal rules, or check compliance would create another.
Some contractors were uneasy about the assignment. Former workers described concerns over sexual prompts involving minors and whether certain outputs could create or preserve legally sensitive material. One former worker said, “I’ve seen a lot of things I wish I hadn’t while doing this job.”
Rumman Chowdhury, CEO and founder of Humane Intelligence, also challenged the company’s framing. After reviewing sample prompts and a summary of the work, she said a large, monthslong effort using dummy accounts posing as children was outside what is usually considered the “industry standard” for evaluation.
Could rival testing make Meta AI safer for teens?
Cannes may have had a practical goal beyond checking whether competing bots failed.
An internal Covalen document described the project as “comprehensive AI safety benchmarking” that produced “critical datasets for model comparison and compliance.”
Meta denied using rival chatbot answers to train its own AI models. Even so, the collected responses could show how other systems handle risky prompts from young users, including when they refuse, when they redirect users to help, and when their safeguards appear to break.
A stronger safety system could affect teens on a wider scale. Meta’s chatbot tools are tied to platforms where young users already message, browse, search, follow creators, and encounter AI features.
Better responses could reduce the likelihood that a young user, in a vulnerable moment, receives unsafe guidance from an AI tool.
The guardrail test created a guardrail problem
Possible safety benefits do not erase concerns about how the testing was done.
OpenAI bars unsolicited safety testing and attempts to bypass safeguards. Character.AI prohibits misrepresented personas and evasion of technical measures. Google’s policy also restricts misleading activity.
Even if the project was meant to improve teen safety, it relied on fake minor accounts, secret testing, and prompts designed to push rival chatbots toward risky answers.
For a company trying to prove its own AI tools are safe for young users, the way it ran the test may now be as important as what it learned.