OpenAI donates MRC, Google strips Gemma 4 heads, ServiceNow rolls back vLLM V1
Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.
Sources
Unlocking large scale AI training networks with MRC (Multipath Reliable Connection) openai.com
OpenAI introduces MRC (Multipath Reliable Connection), a new supercomputer networking protocol released via OCP to improve resilience and performance in large-scale AI training clusters.
vLLM V0 to V1: Correctness Before Corrections in RL huggingface.co
Google’s Gemma 4 AI models get 3x speed boost by predicting future tokens arstechnica.com
Up to 3x the speed with no loss of quality—is it too good to be true?
Vibe coding and agentic engineering are getting closer than I’d like simonwillison.net
On Heavybit’s High Leverage podcast, Simon Willison admits the line he drew between throwaway ‘vibe coding’ and disciplined ‘agentic engineering’ is dissolving in his own workflow: he no longer reviews every line Claude Code writes for production, treating the agent like a black-box internal team whose output he trusts until it breaks.
References
ServeTheHome (Patrick Kennedy) servethehome.com
MRC is the custom RDMA transport protocol for gigascale AI… NVIDIA does not see the industry collapsing onto a single UEC winner; instead hyperscalers will tune variants like MRC to their specific workloads.
Dell’Oro Group analysis delloro.com
OpenAI’s MRC initiative reinforces Ethernet’s expanding role in AI back-end networks, where Ethernet sales surpassed InfiniBand in 2025.
Ernest Chiang notes on AWS SRD ernestchiang.com
SRD sprays packets across up to 64 parallel paths via ECMP and delegates reordering to Nitro hardware—an approach AWS shipped years before MRC formalized similar ideas.
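The mechanism is easy to picture in code. Below is a toy sketch of per-packet spraying with receiver-side reordering, assuming a deliberately simplified model (random path choice, a dict-backed reorder buffer); it is illustrative only and not the actual SRD or Nitro implementation.

```python
import random
from dataclasses import dataclass, field

NUM_PATHS = 64  # SRD reportedly sprays a flow across up to 64 ECMP paths

@dataclass
class Packet:
    seq: int    # message-level sequence number assigned by the sender
    path: int   # which equal-cost path this packet was sprayed onto

@dataclass
class ReorderBuffer:
    """Receiver-side reassembly, the job AWS delegates to Nitro hardware."""
    expected: int = 0
    pending: dict = field(default_factory=dict)
    delivered: list = field(default_factory=list)

    def receive(self, pkt: Packet) -> None:
        self.pending[pkt.seq] = pkt
        # Release the longest contiguous run starting at the next expected sequence number.
        while self.expected in self.pending:
            self.delivered.append(self.pending.pop(self.expected))
            self.expected += 1

def spray(num_packets: int) -> list:
    """Sender side: every packet independently picks a path (per-packet spraying),
    unlike classic RoCEv2 where one hash pins the whole flow to a single path."""
    return [Packet(seq=i, path=random.randrange(NUM_PATHS)) for i in range(num_packets)]

packets = spray(1000)
random.shuffle(packets)        # unequal path latencies scramble arrival order
rx = ReorderBuffer()
for pkt in packets:
    rx.receive(pkt)
assert [p.seq for p in rx.delivered] == list(range(1000))
```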
Midokura: UET vs Falcon and Beyond midokura.com
MRC is a production-hardened extension of RoCEv2 incorporating early UEC ideas, while UEC v1.0.2 (Jan 2026) remains the broader multivendor framework—the two are complementary, not rival, transports.
Data Center Knowledge datacenterknowledge.com
Although contributed to OCP, MRC’s advanced features—packet trimming, SRv6 source routing, out-of-order DMA—require high-end programmable NICs like Broadcom Thor Ultra or AMD Pensando Pollara, limiting near-term adoption to hyperscalers.
Bizety: AI Network Wars bizety.com
Critics argue source routing via SRv6 adds significant header overhead and shifts complexity to the NIC, potentially creating a new form of silicon lock-in where only specific vendors can support the full stack efficiently.
Thoughtworks/HuggingFace blog (EAGLE3 for Gemma 4) huggingface.co
EAGLE3 draft head for Gemma 4 31B… achieved a 1.72x speedup on the MT-Bench conversational benchmark… speed gains dropped to 1.05–1.14x on the SWEBench coding benchmark, as code is inherently less predictable than natural language.
r/LocalLLaMA benchmark thread reddit.com
29% average speedup (peaking at 50% for coding tasks) when using the E2B (4.65B) model to draft for the 31B target
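The EAGLE3 and r/LocalLLaMA numbers above both come from the same draft-and-verify pattern: a cheap drafter proposes a few tokens, the large target model checks them, and accepted prefixes are emitted unchanged so quality is preserved. A greedy-decoding sketch, with draft_step and target_step as stand-in callables rather than any real EAGLE3 or vLLM API:

```python
def speculative_decode(target_step, draft_step, prompt, max_new_tokens=64, k=4):
    """Greedy speculative decoding sketch.

    target_step(tokens) -> next token id under the large target model (argmax)
    draft_step(tokens)  -> next token id under the small draft model (argmax)
    Both are placeholders; a real engine verifies all k draft positions in a
    single batched forward pass, which is where the speedup comes from.
    """
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) Draft k tokens cheaply with the small model.
        ctx = list(tokens)
        draft = []
        for _ in range(k):
            nxt = draft_step(ctx)
            draft.append(nxt)
            ctx.append(nxt)
        # 2) Verify: keep the longest prefix the target agrees with, then emit
        #    the target's own token at the first disagreement.
        for proposed in draft:
            expected = target_step(tokens)
            if expected != proposed:
                tokens.append(expected)
                break
            tokens.append(proposed)
    return tokens[:len(prompt) + max_new_tokens]
```

The speedup then hinges entirely on how often the drafter guesses right, which is why the two benchmarks above report such different gains for the same technique.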
Google AI Edge MTP overview (ai.google.dev) ai.google.dev
Centroids Masking groups the 262K-token vocabulary into ~4K clusters… reduces linear head computation by roughly 45x with negligible impact on accuracy
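As a rough sketch of the two-stage trick described above: score a small set of cluster centroids first, then compute exact logits only for tokens in the top-scoring clusters. The shapes, clustering, and top-cluster count below are scaled-down stand-ins (Gemma 4's real figures are a 262K vocabulary in roughly 4K clusters), not Google's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN, VOCAB, CLUSTERS, TOP_CLUSTERS = 64, 32_768, 512, 16   # illustrative shapes

lm_head = rng.standard_normal((VOCAB, HIDDEN)).astype(np.float32)   # full output projection
token_to_cluster = rng.integers(0, CLUSTERS, size=VOCAB)            # cluster id per token
# One centroid per cluster; here simply the mean of that cluster's LM-head rows.
centroids = np.stack([lm_head[token_to_cluster == c].mean(axis=0) for c in range(CLUSTERS)])

def masked_logits(hidden_state):
    # Stage 1: score CLUSTERS centroids instead of VOCAB tokens (the cheap pass).
    cluster_scores = centroids @ hidden_state
    keep = np.argsort(cluster_scores)[-TOP_CLUSTERS:]
    # Stage 2: exact logits only for tokens whose cluster survived the mask.
    candidate = np.isin(token_to_cluster, keep)
    logits = np.full(VOCAB, -np.inf, dtype=np.float32)
    logits[candidate] = lm_head[candidate] @ hidden_state
    return logits

h = rng.standard_normal(HIDDEN).astype(np.float32)
next_token = int(np.argmax(masked_logits(h)))
```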
Maarten Grootendorst, ‘A Visual Guide to Gemma 4’ newsletter.maartengrootendorst.com
the native MTP heads were initially omitted from the Hugging Face release, remaining exclusive to Google’s LiteRT framework
r/LocalLLaMA — ‘Speculative decoding silently broken for Qwen 3.6’ reddit.com
metadata mismatches in early GGUF files (such as conflicting add_bos_token flags) could force inference engines into ‘token translation mode,’ effectively nullifying all speed gains
Medium — ‘Understanding MTP in DeepSeek-V3’ medium.com
DeepSeek explicitly credits Meta’s work (Gloeckle et al., 2024) in its technical reports as the foundational inspiration… DeepSeek-V3 uses auxiliary one-layer transformer modules that predict tokens in a causal chain
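A compact PyTorch-style sketch of the structure that description implies: each extra depth is a small one-layer transformer block that takes the previous depth's hidden states plus the embeddings of the inputs shifted one more position forward, chained causally and sharing the main LM head. The combine projection, layer sizes, and the missing attention mask are simplifications, not DeepSeek-V3's exact module.

```python
import torch
import torch.nn as nn

class MTPDepth(nn.Module):
    """One auxiliary prediction depth: a single transformer layer fed with the
    previous depth's hidden state merged with the next input token's embedding."""
    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        self.combine = nn.Linear(2 * d_model, d_model)
        self.block = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)

    def forward(self, prev_hidden, tok_emb):
        return self.block(self.combine(torch.cat([prev_hidden, tok_emb], dim=-1)))

class MTPChain(nn.Module):
    """Causal chain of K extra depths; depth k predicts the (k+1)-th next token and
    reuses the main model's embedding table and LM head (no attention mask, for brevity)."""
    def __init__(self, d_model: int, vocab: int, depths: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab, d_model)
        self.lm_head = nn.Linear(d_model, vocab, bias=False)
        self.depths = nn.ModuleList([MTPDepth(d_model) for _ in range(depths)])

    def forward(self, main_hidden, input_ids):
        logits_per_depth, h = [], main_hidden
        for k, depth in enumerate(self.depths, start=1):
            shifted = torch.roll(input_ids, shifts=-k, dims=1)   # inputs k steps ahead
            h = depth(h, self.embed(shifted))                    # causal chain through depths
            logits_per_depth.append(self.lm_head(h))
        return logits_per_depth
```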
ScaleRL paper (arxiv 2510.13786) arxiv.org
Including the FP32 precision fix at the LM head improved asymptotic reward from 0.52 to 0.61, establishing it as a de facto standard for stable large-scale RL training.
MiniMax-M1 technical report (arxiv 2506.13585) arxiv.org
Upgrading the LM head to FP32 precision raised the correlation between training and rollout token probabilities from roughly 0.9 to 0.99, eliminating reward plateaus in long-CoT RL.
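Both findings point at the same fix: run the final projection and log-softmax in float32 even when the rest of the model is bf16, so the trainer and the rollout engine score tokens from (nearly) the same distribution. A generic sketch of that upcast; the module layout is illustrative, not either paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FP32LMHead(nn.Module):
    """bf16 everywhere else, fp32 for the logits: rounding in a low-precision
    final matmul is what drives the train/rollout probability divergence."""
    def __init__(self, d_model: int, vocab: int):
        super().__init__()
        self.proj = nn.Linear(d_model, vocab, bias=False)

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        logits = F.linear(hidden.float(), self.proj.weight.float())  # upcast before the matmul
        return F.log_softmax(logits, dim=-1)
```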
TRL GitHub issue #4159 github.com
TRL’s GRPOTrainer added logprobs_mode="processed_logprobs" for vLLM ≥0.10.2; without it, raw logprobs combined with non-unit temperature produce a train-inference mismatch that destabilizes policy ratios.
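A toy illustration of why the flag matters, assuming rollouts were sampled at temperature 0.7 while "raw" logprobs ignore that scaling; the logits below are made up, only the ratio behaviour is the point.

```python
import math

def log_softmax(logits, temperature=1.0):
    scaled = [x / temperature for x in logits]
    norm = math.log(sum(math.exp(x) for x in scaled))
    return [x - norm for x in scaled]

logits, token, temp = [2.0, 1.0, -1.0], 1, 0.7   # same model output on both sides

trainer_lp   = log_softmax(logits, temp)[token]   # trainer scores at the sampling temperature
processed_lp = log_softmax(logits, temp)[token]   # "processed" logprob: post-temperature
raw_lp       = log_softmax(logits)[token]         # "raw" logprob: temperature ignored

print(math.exp(trainer_lp - processed_lp))  # 1.0 -> policy ratio starts neutral
print(math.exp(trainer_lp - raw_lp))        # != 1.0 -> spurious ratio before any update
```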
0fd.org — ‘When Speed Kills Stability’ 0fd.org
Mitigations such as Masked Importance Sampling and dynamic LR scheduling can add up to 25% computational overhead, prompting debate over whether to prioritize raw inference speed or bitwise consistency.
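A sketch of the masking half of that mitigation: tokens whose trainer/rollout probabilities have drifted past a threshold are simply dropped from the loss rather than corrected. The threshold and tensor shapes are illustrative, not the article's implementation.

```python
import torch

def masked_is_loss(train_logprobs, rollout_logprobs, advantages, max_abs_log_ratio=0.5):
    """Per-token importance-weighted policy gradient, with drifted tokens masked out."""
    log_ratio = train_logprobs - rollout_logprobs
    keep = (log_ratio.abs() <= max_abs_log_ratio).float()   # 0/1 mask over trusted tokens
    per_token = -(log_ratio.exp() * advantages)             # plain IS-weighted surrogate
    return (per_token * keep).sum() / keep.sum().clamp(min=1.0)
```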
EmergentMind — CISPO algorithm summary emergentmind.com
CISPO clips only the magnitude of the importance sampling weight while retaining the underlying gradient update, so rare ‘reflective’ tokens (e.g. ‘Wait’, ‘However’) still contribute — reaching target performance in roughly half the steps of GRPO.
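The distinguishing detail is that CISPO clips and detaches the importance weight itself rather than zeroing the token's gradient the way PPO-style clipping does. A minimal sketch; the clip bound is illustrative.

```python
import torch

def cispo_loss(train_logprobs, rollout_logprobs, advantages, eps_max=2.0):
    """Truncate and detach the IS weight but keep every token's log-prob gradient,
    so rare low-probability tokens are down-weighted rather than discarded."""
    ratio = (train_logprobs - rollout_logprobs).exp()
    weight = ratio.clamp(max=eps_max).detach()   # clipped weight, treated as a constant
    return -(weight * advantages * train_logprobs).mean()
```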
Andrey Lukyanenko — GSPO paper review andlukyane.com
GSPO clips significantly more tokens than GRPO yet yields more effective gradients, and is naturally tolerant of the precision discrepancies and routing fluctuations inherent in MoE models — eliminating engineering hacks like Routing Replay.
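For contrast with the token-level objectives above, a sketch of GSPO's sequence-level ratio: the length-normalized product of token ratios is computed once per response and clipped once, which is what makes it tolerant of per-token precision noise. Epsilon and shapes are illustrative.

```python
import torch

def gspo_loss(train_logprobs, rollout_logprobs, advantage, eps=0.05):
    """One importance ratio per sequence (geometric mean of token ratios), clipped once."""
    seq_log_ratio = (train_logprobs - rollout_logprobs).mean()   # length-normalized log ratio
    ratio = seq_log_ratio.exp()
    clipped = ratio.clamp(1.0 - eps, 1.0 + eps)
    return -torch.min(ratio * advantage, clipped * advantage)
```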