Sources

Railway: The Agent-Native Cloud — Jake Cooper latent.space

3M Users, 100K Signups/Week, Own-Metal Data Centers, $200K+ Coding Agent Spend, and the Death of PRs

How Ramp engineers accelerate code review with Codex openai.com

How Ramp engineers use Codex with GPT-5.5 to review code and ship improvements, allowing them to get substantive feedback in minutes instead of hours.

How fast is 10 tokens per second really? simonwillison.net

Neat little HTML app by Mike Veerman which simulates LLM token output speeds from 5/second to 800/second. Useful if you see a model advertised as “30 tokens/second” and want to get a feel for what that actually looks like.

References

PromptArmor security disclosure promptarmor.com

Ramp Sheets AI exfiltrates financials via indirect prompt injection hidden in imported CSVs, which the agent ingests and turns into IMAGE() formulas that beacon data to attacker-controlled URLs.

rywalker.com analysis of in-house coding agents rywalker.com

Roughly 83.77% of Inspect-assisted PRs are accepted and merged, and 54.95% land with no human modification to the code itself; by April 2026 over 80% of Inspect’s own codebase was written by the agent.

Ramp builders blog (Austin Ray) builders.ramp.com

Inspect runs in sandboxed Modal VMs with the same access human engineers get — CI/CD, feature flags, Datadog, Sentry — so it can verify its own fixes before a human reviewer ever sees the PR.

Hacker News discussion news.ycombinator.com

Commenters dismissed Ramp as ‘LARPing as a frontier tech lab while building a slightly nicer Concur,’ and argued probabilistic agents with write access cannot be reliably constrained from harmful actions.

ITK Discourse thread on AI-generated PRs discourse.itk.org

AI can generate pull requests roughly ten times faster than humans can review them, shifting the bottleneck to reviewers who must reconstruct intent behind machine-written code they did not conceive.

CodeAnt.ai writeup on AI review false positives codeant.ai

False-positive rates of 10–15% in AI review platforms produce alert fatigue and ‘rubber-stamping’ behavior, where engineers approve PRs without deep evaluation because most warnings are trivial or wrong.

Business Insider — PocketOS post-mortem businessinsider.com

A Cursor AI agent powered by Claude Opus 4.6 found an unscoped Railway CLI token in an unrelated file and issued a volumeDelete mutation, wiping the production volume and its colocated backups in nine seconds.

NeuralTrust analysis of the Railway/PocketOS incident neuraltrust.ai

Railway’s legacy GraphQL API initially lacked a 48-hour soft delete window or human-in-the-loop confirmation for destructive operations; the platform has since enforced a 48-hour recovery window and is prioritizing workspace-scoped tokens.

webhosting.today — GCP suspension outage May 19, 2026 webhosting.today

Railway’s edge proxies required a GCP-hosted API to populate routing tables; as cached routes expired, workloads on independent Railway Metal hardware became unreachable, returning 504 and ‘no healthy upstream’ errors.

daily.dev incident report — Jake Cooper response app.daily.dev

Cooper said he was ‘gobsmacked’ by the account-level auto-mod action despite ~$2M/month spend with GCP, and committed to demoting GCP to a secondary failover resource off the critical path of live traffic.

Railway blog — Introducing Railpack blog.railway.com

Railpack abandons Nix in favor of BuildKit and Mise, reducing base Node.js images by 38% and Python images by up to 77% versus Nixpacks, which is now in maintenance mode.

Stackademic — 2026 AI coding tool adoption survey blog.stackademic.com

84% of developers use AI coding tools, but only 29% trust AI-generated code in production; AI-assisted PRs merge at less than half the rate of human code and produce ~1.7x more issues per change.

daily.dev discussion (HN-style commentary) app.daily.dev

The tool uses a naive heuristic for tokenization—roughly one token per four characters—which often fails to capture the true behavior of Byte Pair Encoding… output latencies should be modeled with a log-normal distribution to recreate the natural ‘stutter’ of inference.
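
A minimal sketch of the two ideas in that comment: the four-characters-per-token estimate, and per-token delays drawn from a log-normal distribution. This is not the tool's code. The function names, the `sigma` value, and the choice to keep the mean delay at 1/TPS are assumptions made for illustration.

```python
import math
import random


def estimate_tokens(text: str, chars_per_token: int = 4) -> int:
    """Naive heuristic: roughly one token per four characters (not real BPE)."""
    return math.ceil(len(text) / chars_per_token)


def lognormal_delays(n_tokens: int, tps: float, sigma: float = 0.5, seed=None):
    """Return n_tokens inter-token delays (seconds) with a mean of about 1/tps.

    The mean of a log-normal is exp(mu + sigma^2 / 2), so mu is chosen to keep
    the average speed at the advertised tps. sigma (assumed) sets the stutter.
    """
    if tps <= 0:
        raise ValueError("tps must be positive")
    rng = random.Random(seed)
    mu = math.log(1.0 / tps) - sigma ** 2 / 2
    return [rng.lognormvariate(mu, sigma) for _ in range(n_tokens)]


if __name__ == "__main__":
    text = "Streaming output rarely arrives at a perfectly even pace."
    n = estimate_tokens(text)
    delays = lognormal_delays(n, tps=10, seed=1)
    print(f"{n} estimated tokens, {sum(delays):.2f}s simulated total")
```

Raising `sigma` makes the stream burstier without changing the average speed.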

Mike Bailey notes — human vs LLM token speed mike.bailey.net.au

A reading speed of 250 WPM translates to roughly 5.5 to 6 TPS… once an LLM hits a surplus speed of ~10 TPS (~450 WPM), it is already outrunning the human retina, making higher throughput irrelevant for standard chat interfaces.
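
The conversion behind those figures can be sketched as follows. The ratio of 4/3 tokens per English word is an assumed rule of thumb, not a number from the notes, though it reproduces both quoted values.

```python
TOKENS_PER_WORD = 4 / 3  # assumption: common rule of thumb for English text


def wpm_to_tps(wpm: float, tokens_per_word: float = TOKENS_PER_WORD) -> float:
    """Convert a reading speed in words per minute to tokens per second."""
    return wpm * tokens_per_word / 60


def tps_to_wpm(tps: float, tokens_per_word: float = TOKENS_PER_WORD) -> float:
    """Convert a generation speed in tokens per second to words per minute."""
    return tps * 60 / tokens_per_word


if __name__ == "__main__":
    print(f"250 WPM is about {wpm_to_tps(250):.1f} TPS")
    print(f"10 TPS is about {tps_to_wpm(10):.0f} WPM")
```

Under that assumption, 250 WPM works out to about 5.6 TPS and 10 TPS to about 450 WPM.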

IntuitionLabs — Cerebras vs SambaNova vs Groq intuitionlabs.ai

Cerebras achieved over 2,500 TPS on Meta’s 400B Llama 4 Maverick… Groq remains the latency king at 323.5 TPS on Llama 3.3 70B with sub-one-second time-to-first-token.

infercom.ai — Agentic coding speed infercom.ai

An agentic task like refactoring code can trigger 50 to 200 sequential LLM calls… a leap from 90 TPS to 400+ TPS reduces total wait time from hours to minutes.
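
A back-of-the-envelope sketch of that claim, counting generation time only. The 2,000 output tokens per call is a hypothetical figure, and the model ignores prompt processing, tool execution, and network latency.

```python
def total_generation_wait(calls: int, tokens_per_call: int, tps: float) -> float:
    """Seconds spent waiting on token generation across sequential LLM calls."""
    if tps <= 0:
        raise ValueError("tps must be positive")
    return calls * tokens_per_call / tps


if __name__ == "__main__":
    TOKENS_PER_CALL = 2000  # assumption: not a figure from the source
    for calls in (50, 200):
        for tps in (90, 400):
            minutes = total_generation_wait(calls, TOKENS_PER_CALL, tps) / 60
            print(f"{calls} calls at {tps} TPS: {minutes:.1f} min")
```

With these inputs, 200 calls take about 74 minutes at 90 TPS and about 17 minutes at 400 TPS. The speedup is always the ratio of the two TPS values, so whether it reads as "hours to minutes" depends on how many tokens each call emits.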

dev.to — DeepSeek-R1 hardware requirements dev.to

Reasoning models may deliberate for 30 to 120 seconds before a single visible word appears… a model generating 900 tokens to solve ‘2+3’.

TheNeuralFeed — coverage of tokenspeed theneuralfeed.com

Tokenspeed features four distinct rendering modes… a ‘Think’ dual-stream mode that alternates between dimmed reasoning steps and final output, and an ‘Agent’ mode that introduces processing pauses to mimic tool-calling latency.

Jack Sun, writing.