Sources

Behind the Scenes Hardening Firefox with Claude Mythos Preview simonwillison.net

Fascinating, in-depth details on how Mozilla used their access to the Claude Mythos preview to locate and then fix hundreds of vulnerabilities in Firefox: Suddenly, the bugs are very good. Just a few months ago, AI-generated security bug reports to open source projects were mostly known for being unwanted slop. Dealing with reports that look plausibly correct but are wrong imposes an asymmetric cost on project maintainers: it’s cheap…

llm-gemini 0.31 simonwillison.net

Release: llm-gemini 0.31. gemini-3.1-flash-lite is no longer a preview. Here’s my write-up of the Gemini 3.1 Flash-Lite Preview model back in March. I don’t believe this new non-preview model has changed since then. Tags: llm-release, gemini, llm, google, generative-ai, ai, llms

GitHub Repo Stats simonwillison.net

Tool: GitHub Repo Stats. One of the things I always look for when evaluating a new GitHub repository is the number of commits it has… but that number isn’t visible on GitHub’s mobile site layout. I built this tool to fix that, using this prompt: "Given a GitHub repo URL or foo/bar repo ID show information about that repo absorbed via wither REST or graphql CORS fetch() including the number of commits in the repo and other useful stats". Example output for simonw/datasette and simonw/llm. Tags: g…

Big Words simonwillison.net

Simon Willison vibe-coded a single-page web tool that turns URL query parameters into text slides, built to feed his URL-only macOS presentation tool. Options include gradient backgrounds, font weight, italics, uppercase, and drop shadow, with a settings panel revealed by double-clicking the page.

References

Medium post by independent researcher Gregory Hopper (via search synthesis) vardhmanandroid2015.medium.com

For some ‘trophy’ exploits, Anthropic intentionally disabled Firefox’s multi-layered defense-in-depth mitigations—specifically the process sandbox… 181 of the reported Firefox exploits were achieved only by disabling the browser’s sandbox.

Cyberpress — CVE rollup breakdown cyberpress.org

Mozilla grouped most of these into three major internal rollups: CVE-2026-6784 (154 bugs), CVE-2026-6785 (55 bugs), CVE-2026-6786 (107 bugs), plus three standalone CVEs for Anthropic Frontier Red Team findings.

MLQ.ai — Google Big Sleep comparison mlq.ai

Google’s Big Sleep focuses on ‘variant analysis’ and high-precision real-world zero-days like the SQLite stack buffer underflow (CVE-2025-6965); Mozilla’s Mythos approach instead demonstrates higher volume of discoveries across a massive legacy codebase.

Alan Turing Institute / CETaS report on Claude Mythos cetas.turing.ac.uk

AISI confirmed Mythos was the first model to complete ‘The Last Ones,’ a 32-step end-to-end cyber-attack simulation… but noted these successes occurred in environments lacking active human defenders.

Security Boulevard — Mythos unauthorized access securityboulevard.com

Unauthorized users—primarily hobbyists from a private Discord—gained access to Claude Mythos Preview via shared credentials from Mercor, an AI staffing contractor, on the same day it was announced for limited corporate testing.

StartupFortune — Stenberg/curl context startupfortune.com

Daniel Stenberg ended curl’s HackerOne bug bounty in January 2026 after AI ‘slop’ reports collapsed signal-to-noise from ~15% valid to under 5%; Mythos arrived months later showing the same technology that DDoS’d maintainers can also find genuine 27-year-old OpenBSD bugs.

r/Bard thread (Reddit) reddit.com

Gemini 3.1 Flash Lite Preview is the dumbest Google model released so far… heavily fine-tuned to perform well at benchmarks but fails to maintain coherence in real world tasks.

zerotwo.ai review zerotwo.ai

Output costs jumped from $0.40 to $1.50 per million tokens — a nearly four-fold increase versus 2.5 Flash-Lite, signalling the end of the budget-LLM era for high-volume pipelines.

Google Cloud blog (GA announcement) cloud.google.com

2.5x faster Time-to-First-Token and 45% faster output speed than Gemini 2.5 Flash, with an Arena Elo of 1432 and 86.9% on GPQA Diamond.

intuitionlabs low-cost LLM comparison intuitionlabs.ai

GPT-5 nano at $0.05/$0.40 per million tokens undercuts Gemini 3.1 Flash-Lite by ~80%, though it scores only ~42.8% on GPQA versus Flash-Lite’s 86.9%.

automateed.com independent review automateed.com

97% structured-output compliance and 389 tokens/sec throughput, but output pricing sits ~65% above the tier median, hurting output-heavy workloads.

Gemini API changelog (ai.google.dev) ai.google.dev

gemini-3.1-flash-lite-preview is scheduled for deprecation on May 11, 2026 with final shutdown May 25, 2026; Vertex AI SDK support for Gemini ends after June 2026.

lfe.io docs — GitHub commit-count workarounds docs2.lfe.io

The GraphQL API exposes a totalCount field on the history object… preferred over REST for efficiency because it returns the scalar count in a single request without the overhead of paginating through actual commit nodes.
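A minimal sketch of the single-request approach described above, assuming GitHub's GraphQL endpoint at https://api.github.com/graphql and a personal access token supplied by the caller. The helper names build_request and extract_commit_count are hypothetical and come from neither the lfe.io docs nor the GitHub Repo Stats tool. The request is only constructed here, not sent.

```python
import json
import urllib.request

GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Ask only for the scalar commit count on the default branch,
# so no commit nodes have to be paginated.
COMMIT_COUNT_QUERY = """
query($owner: String!, $name: String!) {
  repository(owner: $owner, name: $name) {
    defaultBranchRef {
      target {
        ... on Commit {
          history {
            totalCount
          }
        }
      }
    }
  }
}
"""


def build_request(repo_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for an 'owner/name' repo ID."""
    owner, name = repo_id.split("/", 1)
    body = json.dumps(
        {"query": COMMIT_COUNT_QUERY, "variables": {"owner": owner, "name": name}}
    ).encode("utf-8")
    return urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=body,
        headers={
            "Authorization": f"bearer {token}",  # token is supplied by the caller
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_commit_count(response: dict) -> int:
    """Pull totalCount out of a decoded GraphQL JSON response."""
    target = response["data"]["repository"]["defaultBranchRef"]["target"]
    return target["history"]["totalCount"]
```

With a real token, the request could be passed to urllib.request.urlopen and the decoded JSON body handed to extract_commit_count. An empty repository has no default branch, so a fuller version would need to handle a null defaultBranchRef.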

GitHub Community feature request (gist mirror) gist.github.com

Feature requests in the GitHub Community have called for ‘parity with the web app,’ explicitly asking for the repository commit count to be restored to the mobile view to match the desktop experience.

Simon Willison — ‘Vibe engineering’ (Oct 2025) simonwillison.net

Willison has proposed the term ‘vibe engineering’ to describe a more disciplined workflow… using AI agents to handle the bulk of the implementation while the human developer focuses on rigorous manual testing and automated verification.

The New Stack — ‘Vibe coding could cause catastrophic explosions in 2026’ thenewstack.io

Willison has famously predicted a ‘Challenger disaster’ for AI-assisted coding, warning that running autonomous coding agents with high-level system permissions could eventually lead to catastrophic data loss or security breaches.

Cyberpress — CursorJacking PAT exfiltration cyberpress.org

Some AI editors have been found storing PATs and session tokens in unencrypted local SQLite databases… any installed third-party extension—regardless of its declared permissions—can query the database to exfiltrate plaintext credentials.

r/git — commit history visualization tools thread reddit.com

Local-first tools… bypass the GitHub API’s rate limits and eliminate the need to grant third-party applications read access to private codebases.

Jack Sun, writing.