Approval-prompt defenses crack at Microsoft Copilot and Anthropic Claude
Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.
Sources
Microsoft Copilot Cowork Exfiltrates Files simonwillison.net
The biggest challenge in designing agentic systems continues to be preventing them from enabling attackers to exfiltrate data. In this case Microsoft Copilot Cowork (yes, that’s a real product name) was allowing agents to send emails to the user’s own inbox without approval… but those messages were then displayed in a way that could leak data to an attacker via rendered images: Because these messages can contain external images that trigger network…
How we contain Claude across products anthropic.com
References
PromptArmor disclosure promptarmor.com
Five lines of malicious text in a SKILL.md file produced a 100% success rate (5/5 trials) against Claude Opus 4.7, retrieving pre-authenticated SharePoint/OneDrive links and embedding them in self-addressed Teams messages.
WinBuzzer winbuzzer.com
Microsoft’s documentation claims Cowork ‘asks for your permission before taking sensitive actions,’ but PromptArmor found that messages sent by the agent to the active user bypass approval entirely — an ‘approval gap’ rather than a code bug.
Zenity Labs (EchoLeak post-mortem) labs.zenity.io
EchoLeak is a reminder that AI agent risks are here to stay; blocking specific markdown image paths is whack-a-mole because the underlying LLM Scope Violation pattern persists across vendors.
Checkmarx on CVE-2025-32711 checkmarx.com
EchoLeak (CVSS 9.3) showed Microsoft 365 Copilot leaking data via auto-rendered markdown images months before Cowork; the same exfiltration primitive keeps reappearing because models cannot separate data from instructions.
The Hacker News (ASCII smuggling, Rehberger) thehackernews.com
Microsoft initially declined to treat Rehberger’s ASCII-smuggling Copilot exfiltration as a security issue before silently patching it, illustrating a pattern of classifying prompt-injection exfiltration as ‘intended behavior.’
Medium critique of Copilot adoption medium.com
Only ~3.3% of commercial M365 seats pay for Copilot and 72% of Fortune 500 deployments remain stuck in pilot phases — Cowork is being shipped into an enterprise base already skeptical of governance and ROI.
SecurityWeek securityweek.com
Anthropic silently patched a Claude Code sandbox bypass in version 2.1.90 without assigning a CVE or documenting the fix in release notes.
Cryptika (on Aonan Guan / Wyze Labs research) cryptika.com
A SOCKS5 hostname null-byte injection let attackers craft a hostname like ‘malicious.com\0.google.com’ that the JavaScript filter approved under the *.google.com allowlist, while the underlying C library truncated at the null byte and connected to the attacker host.
Gray Swan AI grayswan.ai
Claude Opus 4.5 held attackers to a 4.7% single-attempt attack success rate (ASR), but that rate rose to 33.6% at ten attempts and 63% after 100 attempts in coding environments.
Northflank (Firecracker vs gVisor) northflank.com
Firecracker offers near-native performance with 2–8% overhead and ~5 MB memory per microVM, while gVisor imposes a 10–30% ‘syscall tax’ on I/O-heavy workloads and uses 10–50 MB per Sentry process.
LayerX Security layerxsecurity.com
ChatGPT Atlas blocked fewer phishing attempts than standard browsers like Chrome or Edge, and its ‘always-on’ authentication model means a session compromise could expose stored tokens across banking and enterprise SaaS.
IANS Research iansresearch.com
Researchers accessed a Mythos staging endpoint on the day of the announcement via simple credential sharing rather than advanced ‘AI wizardry,’ undermining Anthropic’s claims of rigorous internal security.
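
Illustration for the Cryptika entry above: a minimal Python sketch of how a suffix-only allowlist check and a C-style string reader can disagree about the same hostname. The function names (c_string_view, naive_allowed, strict_allowed) are hypothetical and written for this example; this is not the code of the affected filter or library, only a model of the mismatch the excerpt describes.

```python
def c_string_view(hostname: str) -> str:
    """Model a C library that treats the first NUL byte as end-of-string."""
    return hostname.split("\0", 1)[0]


def naive_allowed(hostname: str, suffix: str = ".google.com") -> bool:
    """Suffix-only allowlist check that never looks for embedded NUL bytes."""
    return hostname.endswith(suffix)


def strict_allowed(hostname: str, suffix: str = ".google.com") -> bool:
    """Reject control characters (including NUL) before the suffix match."""
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in hostname):
        return False
    return hostname.endswith(suffix)


# The crafted hostname from the excerpt, with an embedded NUL byte.
crafted = "malicious.com\0.google.com"

print(naive_allowed(crafted))        # the filter sees a *.google.com name
print(repr(c_string_view(crafted)))  # the C-style reader stops at the NUL
print(strict_allowed(crafted))       # the stricter check refuses the name
```

The point of the sketch is that the filter and the connecting layer must agree on what the hostname is; rejecting control characters before matching is one assumed way to remove the disagreement, not a description of the vendor's actual fix.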