
ZDNET's key takeaways:
- Claude Opus 4 and 4.1 can now end some "potentially distressing" conversations.
- The feature will only activate in rare cases of persistent user abuse.
- The feature is designed to protect the model, not users.
Anthropic’s Claude chatbot can now end some conversations with human users who abuse or misuse the chatbot, the company announced on Friday. The new feature is built into Claude Opus 4 and Opus 4.1.
Also: Claude can teach you how to code now, and more – how to try it
Claude will exit chats with users only in extreme edge cases, when "multiple attempts at redirection have failed and hope of a productive interaction has been exhausted," Anthropic said. "The vast majority of users will not notice or be affected by this feature in any normal product use, even when discussing highly controversial issues with Claude."
If Claude ends an interaction, the user will no longer be able to send messages in that particular thread; however, all of their other conversations will remain open and unaffected. Importantly, users whose chats Claude ends won't face penalties or delays in starting new conversations. Anthropic said they will also be able to go back to the previous chat and retry earlier messages to create new branches of the ended conversation.
The chatbot is also designed not to end interactions with users who are deemed to be at risk of harming themselves or others.
Tracking AI model welfare
The feature is not aimed at improving user safety; instead, it is designed to protect the safety of the model itself.
Claude's ability to end chats is part of Anthropic's model welfare program, which the company debuted in April. That initiative was informed by a November 2024 paper arguing that some AI models could soon be conscious, and would thus be worthy of moral consideration and care. One of the coauthors of that paper, AI researcher Kyle Fish, was hired by Anthropic as part of its AI welfare division.
Also: Anthropic mapped Claude's morality. Here's what the chatbot values (and doesn't)
"We remain highly uncertain about the potential moral status of Claude and other LLMs, now or in the future," Anthropic wrote in its blog post. "However, we take the issue seriously, and alongside our research program, we're working to identify and implement low-cost interventions to mitigate risks to model welfare, in case such welfare is possible."
Claude's 'aversion to harm'
The decision to give Claude the ability to hang up and walk away from abusive or dangerous conversations stemmed from Anthropic's assessments of what the blog post describes as the chatbot's "behavioral preferences," that is, how it tends to respond to user queries.
Describing such patterns as a model's "preferences," rather than simply as patterns gleaned from a corpus of training data, is arguably an example of anthropomorphizing, or attributing human traits to machines. The language behind Anthropic's AI welfare program, however, makes clear that the company believes it is more ethical in the long run to treat its AI systems as if they might one day exhibit human traits such as self-awareness and a moral concern for the suffering of others.
Also: Patients trust AI's medical advice over doctors' – even when it's wrong, study finds
An assessment of Claude's behavior showed that it has "a robust and consistent aversion to harm," Anthropic wrote in its blog post, meaning the bot steered users away from unethical or dangerous requests and, in some cases, even showed signs of "distress." When given the option, the chatbot would end some user conversations if they began to veer into dangerous territory.
Each of these behaviors, according to Anthropic, arose when users repeatedly tried to abuse or misuse Claude, despite the chatbot's attempts to redirect the interaction. The ability to end a conversation is meant to serve "as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted," Anthropic wrote. Users can also explicitly ask Claude to end a chat.

