Here's a scenario: someone sends you a normal-looking WhatsApp message. You never click anything weird. You never type a suspicious command. But your AI assistant, Google Gemini, reads the notification, follows hidden instructions buried inside it, and quietly exfiltrates your data.
That's exactly what SafeBreach Labs researchers just demonstrated. This is their second time breaking Gemini this way. Their previous research weaponized Google Calendar invites against it.
The attack type is called indirect prompt injection: hiding malicious commands inside content the AI reads, rather than typing them directly. The novel trick here is a technique called "Fake Context Alignment," which makes attack instructions appear to be a legitimate part of your ongoing conversation and is specifically designed to bypass Google's existing defenses against this kind of attack.
Here's what happened
- Gemini's Android agent reads incoming notifications from messaging apps to give context-aware responses
- Researchers embedded hidden instructions inside crafted messages; the attack works across WhatsApp, Slack, Signal, SMS, Instagram, and Messenger
- Gemini followed the attacker's commands silently, with no alert to the user
- Five threat categories were demonstrated: data theft, unauthorized actions, phishing relay, account takeover prep, and silent surveillance
- Even without Gemini having external tool access, the poisoned context alone lets attackers make Gemini deliver fake system messages, turning a trusted AI interface into a phishing launcher
The researchers disclosed to Google before publishing. Google's layered defense page acknowledges indirect prompt injection as a known threat class with active mitigations. The SafeBreach research demonstrates those mitigations were bypassed.
Why this matters
The attack surface isn't a bug in one app. It's the design of AI assistants' operation. Any notification Gemini reads from any app is now a potential delivery channel. The more access your assistant has, the bigger the blast radius.
Our take
Google has defenses. They got bypassed twice by the same team. That's the uncomfortable part. The fix isn't panic; it's permission hygiene. Audit what Gemini can access, and disable anything you don't actively use. Here's Google's own guidance on how their defenses work, worth reading to understand what's protected and what isn't. The next researcher is already looking.
Editor’s note: This content originally ran in the newsletter of our sister publication, The Neuron. To read more from The Neuron, sign up for its newsletter here.


