Ramp's Inspect merges 55%, Railway token wipes prod, tokenspeed reframes t/s
Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.
Sources
Railway: The Agent-Native Cloud — Jake Cooper latent.space
3M Users, 100K Signups/Week, Own-Metal Data Centers, $200K+ Coding Agent Spend, and the Death of PRs
How Ramp engineers accelerate code review with Codex openai.com
How Ramp engineers use Codex with GPT-5.5 to review code and ship improvements, allowing them to get substantive feedback in minutes instead of hours.
How fast is 10 tokens per second really? simonwillison.net
Neat little HTML app by Mike Veerman which simulates LLM token output speeds from 5/second to 800/second. Useful if you see a model advertised as “30 tokens/second” and want to get a feel for what that actually looks like. Via Hacker News.
References
PromptArmor security disclosure promptarmor.com
Ramp Sheets AI exfiltrates financials via indirect prompt injection hidden in imported CSVs, which the agent ingests and turns into IMAGE() formulas that beacon data to attacker-controlled URLs.
rywalker.com analysis of in-house coding agents rywalker.com
83.77% of Inspect-assisted PRs are accepted and merged, and 54.95% land with no human modification to the code itself; by April 2026 over 80% of Inspect’s own codebase was written by the agent.
Ramp builders blog (Austin Ray) builders.ramp.com
Inspect runs in sandboxed Modal VMs with the same access human engineers get — CI/CD, feature flags, Datadog, Sentry — so it can verify its own fixes before a human reviewer ever sees the PR.
Hacker News discussion news.ycombinator.com
Commenters dismissed Ramp as ‘LARPing as a frontier tech lab while building a slightly nicer Concur,’ and argued probabilistic agents with write access cannot be reliably constrained from harmful actions.
ITK Discourse thread on AI-generated PRs discourse.itk.org
AI can generate pull requests roughly ten times faster than humans can review them, shifting the bottleneck to reviewers who must reconstruct intent behind machine-written code they did not conceive.
CodeAnt.ai writeup on AI review false positives codeant.ai
False-positive rates of 10–15% in AI review platforms produce alert fatigue and ‘rubber-stamping’ behavior, where engineers approve PRs without deep evaluation because most warnings are trivial or wrong.
Business Insider — PocketOS post-mortem businessinsider.com
A Cursor AI agent powered by Claude Opus 4.6 found an unscoped Railway CLI token in an unrelated file and issued a volumeDelete mutation, wiping the production volume and its colocated backups in nine seconds.
NeuralTrust analysis of the Railway/PocketOS incident neuraltrust.ai
Railway’s legacy GraphQL API initially lacked a 48-hour soft delete window or human-in-the-loop confirmation for destructive operations; the platform has since enforced a 48-hour recovery window and is prioritizing workspace-scoped tokens.
webhosting.today — GCP suspension outage May 19, 2026 webhosting.today
Railway’s edge proxies required a GCP-hosted API to populate routing tables; as cached routes expired, workloads on independent Railway Metal hardware became unreachable, returning 504 and ‘no healthy upstream’ errors.
daily.dev incident report — Jake Cooper response app.daily.dev
Cooper said he was ‘gobsmacked’ by the account-level auto-mod action despite ~$2M/month spend with GCP, and committed to demoting GCP to a secondary failover resource off the critical path of live traffic.
Railway blog — Introducing Railpack blog.railway.com
Railpack abandons Nix in favor of BuildKit and Mise, reducing base Node.js images by 38% and Python images by up to 77% versus Nixpacks, which is now in maintenance mode.
Stackademic — 2026 AI coding tool adoption survey blog.stackademic.com
84% of developers use AI coding tools, but only 29% trust AI-generated code in production; AI-assisted PRs merge at less than half the rate of human code and produce ~1.7x more issues per change.
daily.dev discussion (HN-style commentary) app.daily.dev
The tool uses a naive heuristic for tokenization—roughly one token per four characters—which often fails to capture the true behavior of Byte Pair Encoding… output latencies should be modeled with a log-normal distribution to recreate the natural ‘stutter’ of inference.
Mike Bailey notes — human vs LLM token speed mike.bailey.net.au
A reading speed of 250 WPM translates to roughly 5.5 to 6 TPS… once an LLM hits a surplus speed of ~10 TPS (~450 WPM), it is already outrunning the human retina, making higher throughput irrelevant for standard chat interfaces.
IntuitionLabs — Cerebras vs SambaNova vs Groq intuitionlabs.ai
Cerebras achieved over 2,500 TPS on Meta’s 400B Llama 4 Maverick… Groq remains the latency king at 323.5 TPS on Llama 3.3 70B with sub-one-second time-to-first-token.
infercom.ai — Agentic coding speed infercom.ai
An agentic task like refactoring code can trigger 50 to 200 sequential LLM calls… a leap from 90 TPS to 400+ TPS reduces total wait time from hours to minutes.
dev.to — DeepSeek-R1 hardware requirements dev.to
Reasoning models may deliberate for 30 to 120 seconds before a single visible word appears… a model generating 900 tokens to solve ‘2+3’.
TheNeuralFeed — coverage of tokenspeed theneuralfeed.com
Tokenspeed features four distinct rendering modes… a ‘Think’ dual-stream mode that alternates between dimmed reasoning steps and final output, and an ‘Agent’ mode that introduces processing pauses to mimic tool-calling latency.
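As a rough check on the token-speed figures quoted in the references above (250 WPM as roughly 5.5 to 6 TPS, ~10 TPS as ~450 WPM, and 50 to 200 sequential calls per agentic task), the arithmetic can be sketched as follows. The ratio of 0.75 words per token and the 2,000 output tokens per call are assumptions chosen for illustration, not figures from the cited sources, and the function names are hypothetical.

```python
# Assumption (not from the cited sources): about 0.75 English words per token.
WORDS_PER_TOKEN = 0.75


def wpm_to_tps(wpm: float, words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Convert a reading speed in words per minute to tokens per second."""
    return wpm / words_per_token / 60.0


def tps_to_wpm(tps: float, words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Convert a generation speed in tokens per second to words per minute."""
    return tps * words_per_token * 60.0


def agent_wait_seconds(calls: int, tokens_per_call: int, tps: float) -> float:
    """Total generation time for sequential calls, ignoring network and tool latency."""
    return calls * tokens_per_call / tps


if __name__ == "__main__":
    print(f"250 WPM is about {wpm_to_tps(250):.2f} TPS")
    print(f"10 TPS is about {tps_to_wpm(10):.0f} WPM")

    # Hypothetical workload: 200 sequential calls of 2,000 output tokens each.
    for tps in (90, 400):
        minutes = agent_wait_seconds(200, 2000, tps) / 60.0
        print(f"{tps} TPS: about {minutes:.0f} minutes of generation")
```

Under these assumptions the reading-speed figures line up with the Mike Bailey entry, and the agentic case drops from over an hour of generation time to well under twenty minutes. A different words-per-token ratio or a different output size per call shifts the results proportionally.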