GM cuts crash sims to 60s, IBM scaffolds enterprise LLMs, Google answers MCP
Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.
Sources
From 15 hours to one minute: How AI/ML is speeding up GM’s development arstechnica.com
From CFD and FEA to digital twins, carmaking now involves a lot of virtualization.
Beyond LLMs: Why Scalable Enterprise AI Adoption Depends on Agent Logic huggingface.co
How we used Gemini to build Google I/O 2026 blog.google
Pasted File Editor simonwillison.net
Willison prototyped a browser-based Pasted File Editor that mimics Claude’s trick of converting large text pastes into file attachments. Codex desktop built the tool, which also accepts drag-and-drop files and renders pasted images as thumbnails inside the textarea.
References
SAFER Research (Chalmers University benchmark) saferresearch.com
ML models could predict head node displacement, rib strain, and seatbelt forces with high accuracy… however, metrics such as head node acceleration—especially during the peak ‘in-crash’ phase—exhibit much lower accuracy
NVIDIA PhysicsNeMo crash simulation docs docs.nvidia.com
GM is testing architectures like MeshGraphNet and Transolver to predict structural deformation in ‘Body-in-White’ crash scenarios without the massive computational overhead of finite element analysis
Quantum Zeitgeist on PhysicsNeMo automotive crash framework quantumzeitgeist.com
AI-accelerated FEA… surrogate models can reduce individual design evaluations from minutes or hours to milliseconds, but training a reliable surrogate typically requires hundreds or thousands of high-fidelity FEA runs
Electrek electrek.co
GM considers former Tesla Autopilot head Sterling Anderson as next CEO… increasingly viewed as a potential successor to CEO Mary Barra, as he leads GM’s pivot away from the robotaxi-centric strategy of its former Cruise unit
GM press release (May 2025) news.gm.com
GM names Sterling Anderson chief product officer… recruited with a $40 million compensation package to oversee the end-to-end lifecycle of its entire portfolio, including gas and electric vehicles
TopSpeed on Toyota Arene OS topspeed.com
Toyota lengthens generational cycles to nearly a decade… debuting in the 2026 RAV4, Arene OS enables continuous over-the-air updates, allowing Toyota to extend vehicle lifecycles by keeping features ‘modern’ via software rather than mechanical refreshes
IBM Research / UC Berkeley on Hugging Face (ITBench + MAST) huggingface.co
The strongest predictor of failure is ‘Incorrect Verification’ — agents ‘declare victory’ without checking ground truth evidence, such as whether an alert actually cleared.
OpenReview: ITBench-AA independent evaluation openreview.net
Frontier models like Claude 4.7 and GPT-5.5 scored below 50% accuracy on Kubernetes incident root-cause analysis; IBM’s own baseline agents resolved only 11.4% of SRE scenarios and 0% of initial FinOps tasks.
ResearchGate: ‘From Benchmarks to Business Impact — Deploying IBM Generalist Agent’ researchgate.net
CUGA achieves near-perfect scores on linear Level 1 tasks, [but] success rates dip to approximately 38% on Level 3 tasks involving complex branching and unexpected tool outputs.
dev.to practitioner write-up on CUGA dev.to
Some early adopters describe the setup process as ‘cumbersome’ compared to more streamlined frameworks like CrewAI or LangChain… it requires significant domain-specific configuration—such as defining precise ‘Playbooks’—to achieve its reported enterprise gains.
Futurum Group: ‘Agentic AI — Leading Vendors Winning the Enterprise in 2026’ futurumgroup.com
Independent experts note a ‘37% gap’ between lab benchmarks and real-world production performance… platforms often face 50x cost variations in production due to inefficient token usage during multi-agent loops.
CrewAI blog: ‘You’re Building Agent Security in the Wrong Order’ blog.crewai.com
A critical ‘sequencing problem’ exists: companies often build security before ensuring their agents work… over-instrumentation can kill throughput.
a2ui.sh — A2UI vs MCP Apps a2ui.sh
A2UI utilizes a ‘native-first’ approach where agents transmit declarative JSON blueprints rather than executable code… it only allows the agent to request components from a predefined client whitelist, systemically preventing code injection attacks.
sunpeak.ai — MCP Apps vs A2UI sunpeak.ai
MCP Apps are typically bundled web widgets that run within sandboxed iframes, giving the server total control over the visual experience at the cost of potential ‘visual breaks’ from the host app’s design system.
Milvus — Nano Banana known limitations milvus.io
Independent reviews highlight that the model often struggles with fine-grained edits in busy photos… facial expressions or brand logos can subtly shift after several rounds of refinement, making it less reliable for strict brand-compliant workflows.
sanj.dev — Antigravity quota problems sanj.dev
Developers reported hitting a ‘7-day organizational lockout’ after only a few hours of intensive work, despite advertised rolling 5-hour refresh windows… a single complex request could silently exhaust a week’s worth of tokens.
pasqualepillitteri.it — Coralboard 1 TOPS edge AI pasqualepillitteri.it
Unlike previous edge accelerators that were limited to convolutional neural networks (CNNs), the Coral NPU on this board supports transformer-based workloads from the silicon level up, allowing for more sophisticated multimodal logic.
openlm.ai — Chatbot Arena leaderboard openlm.ai
Gemini 3.5 Flash (Elo ~1504) in a near-dead heat with GPT-5.5 and Claude 4.7… Flash still lags behind 3.1 Pro in deep reasoning (ARC-AGI-2) and long-document needle-in-a-haystack retrieval.