Sources

How OpenAI delivers low-latency voice AI at scale openai.com

How OpenAI rebuilt its WebRTC stack to power real-time Voice AI with low latency, global scale, and seamless conversational turn-taking.

Redis Array Playground simonwillison.net

Tool: Redis Array Playground. Salvatore Sanfilippo submitted a PR adding a new data type - arrays - to Redis. The new commands are ARCOUNT, ARDEL, ARDELRANGE, ARGET, ARGETRANGE, ARGREP, ARINFO, ARINSERT, ARLASTITEMS, ARLEN, ARMGET, ARMSET, ARNEXT, AROP, ARRING, ARSCAN, ARSEEK, ARSET. The implementation is currently available in a branch, so I had Claude Code for web build this interactive playground for trying out the new commands in a WASM-compiled build of a subset of Redis r…

TRE Python binding — ReDoS robustness demo simonwillison.net

Research: TRE Python binding — ReDoS robustness demo. If it’s good enough for antirez to add to Redis I figured Ville Laurikari’s TRE regular expression engine was worth exploring in a little more detail. I had Claude Code build an experimental Python binding (it used ctypes) and try some malicious regular expression attacks against the library. TRE handles those much better than Python’s standard library implementation, thanks mainly to the lack of support for backtracking. Tags: security, py…

Reduce friction and latency for long-running jobs with Webhooks in Gemini API blog.google

Gemini API

Granite 4.1 3B SVG Pelican Gallery simonwillison.net

Simon Willison ran his “pelican riding a bicycle” SVG prompt against all 21 Unsloth GGUF quantizations of IBM’s new Apache 2.0-licensed Granite 4.1 3B model, spanning 1.2GB to 6.34GB. Verdict: no discernible quality-versus-size pattern, and every output is uniformly terrible.

References

AwesomeAgents.ai analysis awesomeagents.ai

Critics on Hacker News point out that the architecture lacks support for trickle ICE and does not provide integrated STUN/TURN servers… users on enterprise networks or public Wi-Fi that block UDP traffic may find the service completely unreachable.

Medium — High Performance Network Programming medium.com

throughput increased from 1.3 Gbit/s on a single core to 7.5 Gbit/s on 10 cores by simply parallelizing data planes with SO_REUSEPORT

OneUptime engineering blog oneuptime.com

If a pinned goroutine blocks frequently on channels or system calls, it can lead to a 10x increase in context switches, as the runtime cannot easily reuse that dedicated thread for other tasks.

Cloudflare blog — Calls Anycast WebRTC blog.cloudflare.com

Cloudflare Calls disrupts this by using Anycast routing… ensures the ‘first hop’ for any user is always the geographically closest data center, reducing initial latency to under 10ms for most of the world’s population.

Reddit r/PromptEngineering — Realtime API field report reddit.com

median turn latency rose from 2.24 seconds in short exchanges to over 5 seconds in calls lasting longer than ten minutes

eesel.ai — Realtime API vs Whisper benchmark eesel.ai

time-to-first-voice (TTFV) typically falls between 450ms and 900ms on stable networks, while time-to-first-byte (TTFB) is often around 250ms to 500ms

MarkTechPost (independent coverage) marktechpost.com

Static webhooks are project-wide and use HMAC-SHA256 with a one-time signing secret; dynamic webhooks override per-request and are verified asymmetrically against Google’s JWKS endpoint.

google-gemini/cookbook Webhooks.ipynb github.com

Requires google-genai>=1.73.1; dynamic webhooks must be verified by extracting the JWT from webhook-signature and validating RS256 against https://generativelanguage.googleapis.com/.well-known/jwks.json, and payloads older than 5 minutes should be rejected.

Nightwatcher AI (Replicate vs Modal comparison) nightwatcherai.com

Replicate offloads webhook reliability to Svix and emits queued/starting/completed events — the async-by-default pattern Google is now catching up to, while Modal still leans on a Python-native sync/async SDK.

The Hacker News (Truffle Security disclosure) thehackernews.com

Enabling Gemini on existing GCP projects silently granted Gemini access to keys Google had previously told developers were ‘not secrets’ — one developer saw bills jump from $180 to over $82,000 in 48 hours.

ai.google.dev Batch/Webhooks docs ai.google.dev

Under load, batch jobs can sit in a PENDING queue for up to 72 hours before expiring — meaning the low-latency webhook only fires once the job actually starts, not when it was submitted.

YouTube walkthrough (Vertex AI eventing) youtube.com

Vertex AI users already wire long-running jobs to Eventarc and Pub/Sub, which can fan out to Cloud Functions or webhooks — a more GCP-native path than the new Gemini API webhooks.

Pasquale Pillitteri — Redis Array technical writeup pasqualepillitteri.it

The new Array uses a three-level sparse representation — a ‘super directory’ of sliced dense directories that point to array slices — so writing index 0 and index 999,999 doesn’t blow up memory, while ARGREP runs server-side regex over RAM-resident data via the vendored TRE library.

Simon Willison — TRE Python binding (ReDoS demo) simonwillison.net

Against a classic ReDoS-vulnerable pattern over 10 million characters, TRE completes in milliseconds with linear scaling, while Python’s stdlib re module hangs effectively indefinitely on the same input.

FailingFast.io — AI coding benchmarks failingfast.io

A 2024 analysis of 211 million lines of code found AI-assisted tools led to a roughly 50% increase in code duplication compared to refactoring, and independent benchmarks indicate up to half of AI-generated code may introduce vulnerabilities.

YouTube — Casey Muratori on AI-assisted coding youtube.com

Muratori compares AI-assisted coding to ‘having someone else play the piano for you’ and argues the cost of babysitting the model against rigorous systems requirements often negates the promised productivity gains.

Jack Sun, writing.