Sources

Microsoft Copilot Cowork Exfiltrates Files simonwillison.net

The biggest challenge in designing agentic systems continues to be preventing them from enabling attackers to exfiltrate data. In this case Microsoft Copilot Cowork (yes, that’s a real product name) was allowing agents to send emails to the user’s own inbox without approval… but those messages were then displayed in a way that could leak data to an attacker via rendered images: Because these messages can contain external images that trigger network…

How we contain Claude across products anthropic.com

References

PromptArmor disclosure promptarmor.com

Five lines of malicious text in a SKILL.md file produced a 100% success rate (5/5 trials) against Claude Opus 4.7, retrieving pre-authenticated SharePoint/OneDrive links and embedding them in self-addressed Teams messages.

WinBuzzer winbuzzer.com

Microsoft’s documentation claims Cowork ‘asks for your permission before taking sensitive actions,’ but PromptArmor found that messages sent by the agent to the active user bypass approval entirely — an ‘approval gap’ rather than a code bug.

Zenity Labs (EchoLeak post-mortem) labs.zenity.io

EchoLeak is a reminder that AI agent risks are here to stay; blocking specific markdown image paths is whack-a-mole, the underlying LLM Scope Violation pattern persists across vendors.

Checkmarx on CVE-2025-32711 checkmarx.com

EchoLeak (CVSS 9.3) showed Microsoft 365 Copilot leaking data via auto-rendered markdown images months before Cowork; the same exfiltration primitive keeps reappearing because models cannot separate data from instructions.

The Hacker News (ASCII smuggling, Rehberger) thehackernews.com

Microsoft initially declined to treat Rehberger’s ASCII-smuggling Copilot exfiltration as a security issue before silently patching it, illustrating a pattern of classifying prompt-injection exfil as ‘intended behavior.’

Medium critique of Copilot adoption medium.com

Only ~3.3% of commercial M365 seats pay for Copilot and 72% of Fortune 500 deployments remain stuck in pilot phases — Cowork is being shipped into an enterprise base already skeptical of governance and ROI.

SecurityWeek securityweek.com

Anthropic silently patched a Claude Code sandbox bypass in version 2.1.90 without assigning a CVE or documenting the fix in release notes.

Cryptika (on Aonan Guan / Wyze Labs research) cryptika.com

A SOCKS5 hostname null-byte injection let attackers craft a hostname like ‘malicious.com\0.google.com’ that the JavaScript filter approved under the *.google.com allowlist, while the underlying C library truncated at the null byte and connected to the attacker host.
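
The mismatch described above (a filter that sees the whole string, a C consumer that stops at the first NUL) can be sketched in a few lines. This is a minimal illustration, not the code from the research: `naive_suffix_check`, `c_view`, and `is_allowed_host` are hypothetical names, and the `.google.com` suffix is taken from the example hostname in the excerpt.

```python
def naive_suffix_check(hostname: str, allowed_suffixes: tuple) -> bool:
    """Hypothetical filter that only compares the end of the full string."""
    return any(hostname.endswith(suffix) for suffix in allowed_suffixes)


def c_view(hostname: str) -> str:
    """Approximate what a NUL-terminated C string consumer would read."""
    return hostname.split("\0", 1)[0]


def is_allowed_host(hostname: str, allowed_suffixes: tuple) -> bool:
    """Stricter check: reject control bytes before matching the allowlist."""
    # Refuse NUL and other control characters first, so the string that is
    # validated here is the same string a C library would act on.
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in hostname):
        return False
    host = hostname.lower().rstrip(".")
    # Match either the bare domain or a true subdomain (dot-delimited suffix).
    return any(
        host == suffix.lstrip(".") or host.endswith(suffix)
        for suffix in allowed_suffixes
    )


crafted = "malicious.com\0.google.com"
allowlist = (".google.com",)

print(naive_suffix_check(crafted, allowlist))  # filter's verdict on the full string
print(repr(c_view(crafted)))                   # host a NUL-terminated reader would use
print(is_allowed_host(crafted, allowlist))     # stricter check's verdict
```

The design point is that validation and use should operate on the same bytes: rejecting control characters outright avoids having to predict how each downstream layer will truncate or normalize the hostname.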

Gray Swan AI grayswan.ai

Claude Opus 4.5 showed a 4.7% single-attempt attack success rate (ASR), which rose to 33.6% at ten attempts and 63% after 100 attempts in coding environments.

Northflank (Firecracker vs gVisor) northflank.com

Firecracker offers near-native performance with 2–8% overhead and ~5MB memory per microVM, while gVisor imposes a 10–30% ‘syscall tax’ on I/O-heavy workloads and uses 10–50MB per Sentry process.

LayerX Security layerxsecurity.com

ChatGPT Atlas blocked fewer phishing attempts than standard browsers like Chrome or Edge, and its ‘always-on’ authentication model means a session compromise could expose stored tokens across banking and enterprise SaaS.

IANS Research iansresearch.com

Researchers accessed a Mythos staging endpoint on the day of the announcement via simple credential sharing rather than advanced ‘AI wizardry,’ undermining Anthropic’s claims of rigorous internal security.

Jack Sun, writing.