Anthropic has equipped its latest Claude models with the ability to terminate conversations in cases of persistent harmful or abusive behavior, marking one of the strongest safety interventions in the chatbot industry.
Claude Opus 4 and 4.1, Anthropic’s two most powerful models, will exercise this authority only in “rare, extreme cases of persistently harmful or abusive user interactions,” the company said in a blog post. That includes requests for child sexual abuse material or information that would enable large-scale violence or acts of terror.
The AI models will invoke the feature only “as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted,” Anthropic stated. This could complicate efforts by hackers attempting to jailbreak the system for malicious purposes.
In addition, the AI models can end chats at a user’s request but will not do so if “users might be at imminent risk of harming themselves or others.” The Claude models also will not shut down discussions of “highly controversial issues.” Still, the models may show “apparent distress” when dealing with harmful requests and a “strong preference against” responding to them.
Once a conversation has been ended by Claude, users cannot send additional messages in that specific thread. They may, however, edit and retry earlier inputs, start a new chat immediately, and access their previous conversations without restrictions.
Anthropic wants to ensure ‘model welfare’
Interestingly, Anthropic said that the main motivation for introducing the feature is “model welfare.” Because AI models now display so many characteristics associated with people, such as problem-solving skills, goals, and relatability, it’s not entirely out of the question that they could benefit from some safeguards of their own.
Some experts say that consciousness or self-awareness is a key indicator of superintelligence, or the ability to outperform humans, which OpenAI, Meta and a number of other AI companies say they are close to achieving. While Anthropic remains “highly uncertain” about whether large language models are conscious and deserve moral consideration, it said it is deliberately implementing low-cost protections to minimize any potential distress.
Safety remains Anthropic’s core mission
In April, Anthropic launched a research program on model welfare, aimed at exploring whether advanced AI systems might one day exhibit consciousness, preferences, or experiences that warrant moral concern. The company argues it is no longer responsible to categorically assume that AI systems cannot have such experiences, given the limited understanding of how they function.
Anthropic has consistently positioned safety as its founding principle. The company was formed in 2021 by former OpenAI engineers who were concerned about its increasingly commercial direction and felt that safeguards were falling by the wayside.
Since then, Anthropic has published extensive AI safety research, and its CEO, Dario Amodei, has been outspoken about the technology’s risks. His warnings have led some to label him a “doomer,” underscoring the company’s reputation as one of the most caution-driven players in the field.


