Researchers from Google’s DeepMind group and at the University of Oxford are collaborating on a project to develop what amounts to a kill switch for preventing artificial intelligence systems from going rogue, like the iconic HAL 9000 system in “2001: A Space Odyssey.”
A paper, titled “Safely Interruptible Agents,” describing the effort will be presented at the upcoming Conference on Uncertainty in Artificial Intelligence in New York later this month.
The paper, according to its authors Laurent Orseau from Google and Stuart Armstrong from Oxford University, explores a method for ensuring that super-intelligent artificial intelligence (AI) systems of the future will not be capable of learning how to prevent interruption by a human operator.
Reinforcement learning agents, or computing systems, which learn independently from past behaviors and actions, will not behave in an optimal fashion all the time, the two researchers noted in their paper. There can be situations where an AI system may start exhibiting behavior, which requires a “human operator to press the big red button” to stop escalation of that behavior, they wrote in the paper.
As one example, the two researchers pointed to a situation in which a robot is programmed to carry boxes from outside a warehouse and put them indoors or to stay indoors and sort boxes. If the robot is shut down and carried inside the building every time it begins to rain while it is working outside, the robot will eventually learn to simply stay inside the warehouse and sort boxes rather than venturing outside. The constant human intervention, whenever it rains, causes a bias in the system and causes it to modify its behavior, the researchers said.
The method described in the research paper explores a way to ensure that the robot is unaware of any interruptions or that it functions on the assumption that such interruptions will never happen again, Orseau and Armstrong said.
“Reinforcement learning (RL) agents learn to act so as to maximize a reward function,” the researchers noted in their paper. If not properly designed, such agents, or AI systems may take “unpredictable and undesirable” shortcuts. The paper describes a method where a human operator can safely interrupt the operation of an AI system while also ensuring that the system will not learn how to prevent such interruptions.
The paper by Orseau and Armstrong offers a new way to design an AI system that is capable of recognizing flaws in its operation and assisting humans in addressing the issue rather than blocking it, the Machine Intelligence Research Institute (MIRI) noted in a blog post. Such “corrigibility” is critical to ensuring the safe operation of AI systems the paper noted.
Safely interruptible agents, such as the one that the researchers have described in their paper, “are indifferent to programmers’ interventions to modify their policies and will not try to stop programmers from intervening on their everyday activities,” MIRI said.