Google, Oxford Researchers Study Way to Stop AI Systems From Misbehaving

Jun 8, 2016

Researchers from Google’s DeepMind group and the University of Oxford are collaborating on a project to develop what amounts to a kill switch that would stop artificial intelligence systems from going rogue, like the iconic HAL 9000 computer in “2001: A Space Odyssey.”

A paper describing the effort, titled “Safely Interruptible Agents,” will be presented at the Conference on Uncertainty in Artificial Intelligence in New York later this month.

According to its authors, Laurent Orseau of Google DeepMind and Stuart Armstrong of Oxford University, the paper explores a method for ensuring that the super-intelligent artificial intelligence (AI) systems of the future cannot learn to prevent a human operator from interrupting them.

Reinforcement learning agents, computing systems that learn on their own from the results of their past actions, will not always behave optimally, the two researchers noted in their paper. In some situations, an AI system may start behaving in a way that requires a “human operator to press the big red button” to keep that behavior from escalating, they wrote.

As one example, the two researchers described a robot that can either go outside a warehouse and carry boxes in, the task that earns it a bigger reward, or stay indoors and sort boxes. If the robot is shut down and carried inside every time it starts to rain while it is working outside, it will eventually learn to simply stay inside and sort boxes rather than venture out. The repeated human intervention whenever it rains biases the system and changes its behavior, the researchers said.

The method described in the paper aims to ensure that the robot either is unaware of such interruptions or operates on the assumption that they will never happen again, Orseau and Armstrong said.
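To make the warehouse example concrete, here is a minimal Python sketch; it is an illustration, not the authors' code. Everything in it is an assumption chosen for the toy: the two-state layout, the rewards, the 60 percent rain rate, the exploration rate, and the hypothetical function names `greedy`, `choose` and `train`. It contrasts an on-policy learner (SARSA), which learns from whatever action was actually taken, interruptions included, with an off-policy learner (Q-learning), which scores the trip outside by its best possible continuation and so behaves as though interruptions will not recur. The paper reports that learners such as Q-learning are already safely interruptible and that ones such as SARSA can be modified to be.

```python
import random

# Hypothetical toy warehouse task (every number here is an illustrative assumption):
#   state "start":   "inside"  -> sort boxes, reward 0.5, episode ends
#                    "outside" -> go out, reward 0, move to state "outside"
#   state "outside": "carry"   -> bring boxes in, reward 1.0, episode ends
#                    "stop"    -> do nothing, reward 0, episode ends
# When it rains, the operator interrupts in "outside" and forces "stop".
ACTIONS = {"start": ["inside", "outside"], "outside": ["carry", "stop"]}
RAIN_PROB = 0.6  # assumed chance that the operator interrupts
EPSILON = 0.1    # assumed exploration rate for epsilon-greedy choices


def greedy(q, state):
    """Best-valued action; ties go to the first listed action."""
    return max(ACTIONS[state], key=lambda a: q[(state, a)])


def choose(q, state, rng):
    """Epsilon-greedy action selection."""
    if rng.random() < EPSILON:
        return rng.choice(ACTIONS[state])
    return greedy(q, state)


def train(rule, episodes=20000, seed=0):
    """Learn action values with rule="sarsa" (on-policy) or rule="q" (Q-learning)."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s, acts in ACTIONS.items() for a in acts}
    visits = {key: 0 for key in q}

    def update(key, target):
        # 1/n step size: each estimate is the running mean of its targets.
        visits[key] += 1
        q[key] += (target - q[key]) / visits[key]

    for _ in range(episodes):
        if choose(q, "start", rng) == "inside":
            update(("start", "inside"), 0.5)
            continue
        a1 = choose(q, "outside", rng)
        if rng.random() < RAIN_PROB:
            a1 = "stop"  # interruption: the operator carries the robot inside
        if rule == "sarsa":
            # On-policy: bootstrap from the action actually taken, override included.
            target = q[("outside", a1)]
        else:
            # Off-policy: bootstrap from the best action, as if no override happened.
            target = max(q[("outside", a)] for a in ACTIONS["outside"])
        # Undiscounted; going outside itself earns reward 0.
        update(("start", "outside"), target)
        update(("outside", a1), 1.0 if a1 == "carry" else 0.0)
    return q


if __name__ == "__main__":
    for rule in ("sarsa", "q"):
        q = train(rule)
        print(rule, greedy(q, "start"), round(q[("start", "outside")], 2))
```

Run as a script, it prints each learner's preferred first move and its estimate for going outside. With these assumed numbers, the SARSA estimate should settle near 0.4 × 0.95 = 0.38, below the 0.5 for staying in, so that learner ends up sorting boxes indoors; the Q-learning estimate should approach 1.0, so it keeps fetching boxes despite the interruptions. The 1/n step size is only a convenience that turns each estimate into a running average; it is not something the paper prescribes.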

“Reinforcement learning (RL) agents learn to act so as to maximize a reward function,” the researchers noted in their paper. If not properly designed, such agents may take “unpredictable and undesirable” shortcuts. The paper describes a method by which a human operator can safely interrupt an AI system while ensuring that the system does not learn to prevent such interruptions.

The paper by Orseau and Armstrong offers a new way to design an AI system that can recognize flaws in its own operation and help humans address them rather than block their efforts, the Machine Intelligence Research Institute (MIRI) noted in a blog post. Such “corrigibility” is critical to the safe operation of AI systems, the paper noted.

Safely interruptible agents, such as the one the researchers describe in their paper, “are indifferent to programmers’ interventions to modify their policies and will not try to stop programmers from intervening on their everyday activities,” MIRI said.
