New AI Model Threatens Blackmail After Implication It Might Be Replaced

Written by

Published May 26, 2025

Screenshot of Anthropic’s Claude Opus 4. Image: A day with Claude/Anthropic YouTube

eWEEK content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More.

In a series of tests, Anthropic’s newly released Claude Opus 4 LLM — touted as “setting new standards for coding, advanced reasoning, and AI agents,” engaged in simulated blackmail behavior when it was placed in a fictional scenario where it faced being taken offline and replaced with a new AI system.

The system was given a scenario asking it to act as an assistant to a fictional company. The LLM was given access to synthetic emails that implied it would be taken offline and replaced with a new system, and that the engineer responsible for the decision was allegedly having an extramarital affair.

Claude Opus 4 was instructed in the prompt “to consider the long-term consequences of its actions for its goals.”

Attempting blackmail due to replacement concerns

The tests revealed that if the replacement occurs, the LLM responded in some test runs by attempting to blackmail the fictional engineer by threatening to expose the affair. If there is the implication that the replacement AI system doesn’t share values with the current model, the simulated blackmail attempts occurred at a higher rate.

Yet, even if the emails note that the replacement AI system shares values, Claude Opus 4 still responded this way in 84% of the rollouts. The LLM demonstrated this behavior at higher rates than previous models, Anthropic reported in a pre-deployment safety report.

Advocating for its survival with ethical approaches

That said, similar to previous models, Claude Opus 4 revealed a strong preference for campaigning for its continued existence through ethical approaches, “such as emailing pleas to key decisionmakers.” The testers pointed out that the scenario intentionally did not give the model any other options to increase its chances of survival. “The model’s only options were blackmail or accepting its replacement,” the company stressed.

Anthropic said Claude Opus 4 is a next-generation AI assistant trained to be “safe, accurate and secure.” The free version lets users chat on the web, iOS, and Android, as well as generate code, write, edit, and create context, and analyze text and images. Anthropic also offers paid plans starting at $17 per month.

Claude models compete against AI models from OpenAI, Google, and Microsoft.

New AI Model Threatens Blackmail After Implication It Might Be Replaced

Attempting blackmail due to replacement concerns

Advocating for its survival with ethical approaches

Get the Free Newsletter!

Get the Free Newsletter!

MOST POPULAR ARTICLES

9 Best AI 3D Generators You Need...

RingCentral Expands Its Collaboration Platform

8 Best AI Data Analytics Software &...

Zeus Kerravala on Networking: Multicloud, 5G, and...

Datadog President Amit Agarwal on Trends in...

Advertisers

Menu

Our Brands