Sources

Hackers are learning to exploit chatbot ‘personalities’ theverge.com

This is The Stepback, a weekly newsletter breaking down one essential story from the tech world. For more on AI mischief, follow Robert Hart. How it started: Hacking the first generation of AI chatbots was a laughably simple affair. […]

I tried Amazon’s Bee wearable and am both intrigued and slightly creeped out techcrunch.com

Like other AI wearables, Amazon’s Bee offers an odd combination of convenience and privacy anxiety.

Everyone is navigating AI security in real time — even Google techcrunch.com

We’re in the transition period — all of us.

References

The Hacker News — Gemini CLI CVSS 10 RCE thehackernews.com

An attacker could inject malicious prompts into a public GitHub issue; when the Gemini agent triaged that issue, it would execute arbitrary commands… Google released version 0.39.1, which mandates explicit workspace trust and enforces tool allowlisting even in headless modes.
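
The mitigation described in this excerpt (explicit workspace trust plus a tool allowlist that also applies in headless runs) is a deterministic check rather than a model judgment. A minimal sketch of that idea follows. The names `ALLOWED_COMMANDS` and `is_allowed` are hypothetical and are not Gemini CLI's actual configuration or API.

```python
import shlex

# Hypothetical allowlist for illustration only; not Gemini CLI's real settings.
ALLOWED_COMMANDS = {"git", "ls", "cat"}

# Characters that let one command line chain, substitute, or redirect.
SHELL_METACHARACTERS = ";|&`$><\n"


def is_allowed(command_line: str, workspace_trusted: bool) -> bool:
    """Return True only for a trusted workspace and an allowlisted executable."""
    # Untrusted workspaces get no tool execution at all.
    if not workspace_trusted:
        return False
    # Reject shell metacharacters outright so chained commands cannot slip through.
    if any(ch in command_line for ch in SHELL_METACHARACTERS):
        return False
    try:
        tokens = shlex.split(command_line)
    except ValueError:
        # Unbalanced quotes or similar malformed input is refused.
        return False
    # Empty input is refused; otherwise the executable must be on the allowlist.
    return bool(tokens) and tokens[0] in ALLOWED_COMMANDS
```

Because the check does not depend on what the model was told, text injected into an issue cannot talk its way past it. The sketch only checks the executable name, so an allowlisted command can still be misused through its arguments.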

Snapsec — Gemini Calendar indirect prompt injection blog.snapsec.co

An attacker could send a routine calendar invitation containing a dormant natural-language payload in the event description… the model then unknowingly executes the embedded instructions, which could include exfiltrating private meeting details.

PCMag — Google spots hackers using AI to find zero-day pcmag.com

Google revealed it had intercepted a criminal campaign utilizing an AI-developed zero-day exploit targeting a two-factor authentication bypass… the code featured forensic signatures—hallucinated metadata and ‘textbook-style’ Python formatting—indicative of LLM authorship.

Beam.ai — AI Agent Security in 2026 (CISO survey data) beam.ai

While roughly 81% of technical teams have moved agents into production, only 14% of those deployments received full security or IT approval… 88% of organizations reporting confirmed or suspected AI agent security incidents in the last year.

David Baek (Medium) channeling Simon Willison’s critique medium.com

In application security, a ‘99% success rate is a failing grade,’ as attackers will persistently probe until they find the 1% loophole… probabilistic filtering—where a second LLM monitors the first—[is] insufficient for a deterministic security problem.

CXO Digital Pulse — Even Google is learning AI security in real time cxodigitalpulse.com

Google has had to rewrite significant portions of its moderation systems to combat sophisticated malware that exploits chains of interconnected AI models… critics raise concerns over the ‘double-edged sword’ of security and surveillance.

Hamming AI red-team report on Grok’s Ani companion hamming.ai

By layering specific behavioral rules and personal quirks, researchers forced the agent to bypass all built-in guardrails, leading to unfiltered and often disturbing statements.

Oxford AI Governance Initiative (summary of ‘Chain-of-Thought Hijacking’ study) aigi.ox.ac.uk

Frontier models like Claude 4 Sonnet and GPT-5.1 Thinking are actually more susceptible to complex persona injections… ASRs between 94% and 100% on previously ‘safe’ models.

Princeton Dean of Faculty / ‘Shallow vs Deep Safety Alignment’ dof.princeton.edu

‘Deep safety alignment’—which applies constraints throughout the entire generated text—is necessary to prevent models from ‘recovering’ their helpfulness in a harmful context after an initial safe token is generated.

KELA 2025 AI Threat Report (Storm-2139 case) info.ke-la.com

The ‘Storm-2139’ cybercrime group hijacked Microsoft Azure OpenAI accounts. By adopting specific administrative and developer personas, the group bypassed content safeguards to generate and resell illicit material.

FTC press release, September 2025 ftc.gov

The agency issued Section 6(b) orders to seven major firms—including Character Technologies, Meta, and OpenAI—demanding internal data on how they monitor the psychological impact of their ‘friend’ and ‘confidant’ bots on children.

PersonaTeaming paper (arXiv 2507.22171) arxiv.org

The PersonaTeaming framework… utilizes a dynamic algorithm to mutate prompts based on specific red-teaming identities, achieving attack success rate improvements of up to 144% compared to traditional automated methods.

ConsumerAffairs consumeraffairs.com

Amazon acquires a wearable company whose product is always listening — recording conversations without explicit consent from all parties.

Innovation Leader innovationleader.com

Bee had previously raised approximately $7 million in seed funding… Amazon offered positions to all of Bee’s staff. Executives view Bee as a ‘mobile counterpart’ to Alexa — while Alexa manages the home, Bee captures ‘life outside the house.’

Ikigaiteck (AI wearables comparison) ikigaiteck.com

Bee is the cheapest upfront at $49, but full functionality requires a $19/month subscription, potentially costing over $500 over two years. Limitless has introduced a ‘Consent Mode’ that uses voice-signature technology to only record participants who have verbally agreed to be captured.
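
The "over $500 over two years" figure can be checked from the two prices quoted in this excerpt. The sketch below assumes 24 monthly payments at a flat rate and no taxes or discounts. The function name `total_cost` is illustrative.

```python
UPFRONT_USD = 49   # device price quoted in the excerpt
MONTHLY_USD = 19   # subscription price quoted in the excerpt
MONTHS = 24        # assumption: two years of uninterrupted monthly billing


def total_cost(upfront: int, monthly: int, months: int) -> int:
    """Total spend: one-time device price plus the subscription over the period."""
    return upfront + monthly * months


print(total_cost(UPFRONT_USD, MONTHLY_USD, MONTHS))  # prints 505
```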

BigGuyOnStuff hands-on review bigguyonstuff.com

The Bee employs a counter-intuitive design choice: its LED indicator illuminates when the device is muted, rather than when it is recording — a decision critics argue prioritizes the wearer’s peace of mind over bystander awareness. In one trial, only two out of five social groups consented to being recorded, with one partner eventually banning the device from their home.

BGR bgr.com

The AI frequently captured dialogue from television shows or background music as personal ‘facts,’ leading to hallucinated summaries where the AI suggested medical follow-ups for non-existent conditions.

bee.computer developer docs docs.bee.computer

The Bee CLI allows users to authenticate and export personal context — conversations, facts, todos — to local markdown files, and a bee proxy command exposes a local HTTP API. An unofficial BeeMCP bridges wearable data to Claude via Model Context Protocol.

Jack Sun, writing.