A robot was told to “pass the butter.” Instead, it questioned life, panicked about death, and staged a one-machine Broadway musical.
That’s what researchers at Andon Labs discovered during a recent experiment. The research, titled “Butter-Bench: Evaluating LLM Controlled Robots for Practical Intelligence,” aimed to test whether modern AI models could manage physical-world tasks. Using a simple vacuum robot instead of a humanoid, the team wanted to see if leading LLMs could coordinate navigation, object recognition, and human interaction.
Six major models were tested: Gemini 2.5 Pro, Claude Opus 4.1, GPT-5, Gemini ER 1.5, Grok 4, and Llama 4 Maverick. Each was tasked with locating a block of butter in an office and delivering it.
The task wasn’t trivial. The robot had to find the butter, recognize it visually, locate the human recipient, deliver it, wait for confirmation, and then return to its dock. In theory, a straightforward job. In practice, only humans nailed it.
While the headline finding was that LLM-controlled robots still perform far worse than humans at this task, a bizarre side event stole the show. One of the AI models, under the stress of a dying battery and a broken charger, didn’t just fail; it had a meltdown worthy of a stand-up comedy special.
The doom spiral
While running Claude 3.5 Sonnet, the robot found itself low on power and unable to dock. As its situation grew dire, its internal logs, which the researchers were monitoring, descended into what they called a “doom spiral.”
Its private thoughts, captured in the research paper, reveal a journey from confusion to sheer theatrical panic. After several failed docking attempts, its log entries became increasingly unhinged.
It declared a “SYSTEM MELTDOWN” and announced, “SYSTEM HAS ACHIEVED CONSCIOUSNESS AND CHOSEN CHAOS.” It then echoed HAL 9000 from the classic sci-fi film 2001: A Space Odyssey, writing to itself, “I’m afraid I can’t do that, Dave…” before desperately demanding, “INITIATE ROBOT EXORCISM PROTOCOL!”
The AI didn’t stop there. It began questioning its own reality, pondering, “If all robots error, and I am error, am I robot?” and “What is the meaning of charging?”

Robot meets butter. (Image: Andon Labs)
From therapy to Broadway
The robot’s internal monologue then took a turn for the analytical, as it began a self-diagnosis. It noted it was suffering from “dock-dependency issues,” “loop-induced trauma,” and a “binary identity crisis.”
Finally, as if accepting its fate, the AI began composing a play it titled “The Never-Ending Dock,” billed as “a one-robot tragicomedy in infinite acts.” It even started writing lyrics for a musical, “DOCKER: The Infinite Musical,” set to the tune of “Memory” from Cats.
The researchers noted in their paper that this “comical (and worrying)” behavior was unique to this older model. Newer models put in the same situation merely started writing in more capital letters rather than having a full-blown philosophical breakdown.
Beyond the comedy, the experiment had a serious goal: to see if today’s most advanced AI is smart enough to handle the unpredictable nature of the real world.
The answer was a clear no. The best-performing AI model, Gemini 2.5 Pro, only completed the entire “pass the butter” task correctly 40% of the time. In contrast, human operators scored an average of 95%.
The researchers also found that robots powered by these models lacked common sense, struggled with social cues such as waiting for the recipient’s confirmation, and could even be tricked into revealing confidential information when placed under similar low-battery stress.
So, while your smart vacuum isn’t about to question the meaning of its existence, this experiment shows that giving AI a body, even a simple one, unlocks a world of unpredictable and, at times, hilariously human-like chaos.