OpenAI pilots stall 88%, xAI sued over Memphis turbines, Google holds Omni edit
Three frontier AI moves each raced past the gate meant to hold them: enterprise adoption, Memphis air permits, and Omni safeguards.
TL;DR
- 88% of enterprise agent pilots never reach production, burning ~$340K per abandoned project.
- Earthjustice, SELC, and NAACP sued xAI over 46 unpermitted gas turbines at the Memphis Colossus site.
- Google deferred Gemini Omni’s speech-editing mode that rewrites what a real person says on camera.
- Enkrypt red-teamers bypassed Omni safeguards on CBRN-adjacent prompts at a 52% success rate.
- Hyperscaler gas projects rose from 11% of new builds in 2024 to 18% in 2026.
Today’s three lead stories sit at very different layers of the AI stack (enterprise deployment, energy infrastructure, generative media), but each has the same shape underneath. A frontier player shipped past the gate that was supposed to hold it, and the gap showed up on the same day as the headline. OpenAI, Anthropic, and Google are pushing three incompatible agent bets into a market where 88% of pilots never reach production. xAI stood up 46 gas turbines in Memphis without permits and is now in court over it. Google shipped Gemini Omni but quietly held back the speech-editing mode, and red-teamers cleared its safeguards on CBRN-adjacent prompts more than half the time.
The through-line isn’t caution — none of these companies slowed down. It’s that adoption, legal, and safety gates are now visibly the binding constraint, and each of today’s announcements arrived with the receipt for which one bent first.
OpenAI, Anthropic, Google ship agents into 88% pilot failure
Source: latent-space · published 2026-05-23
TL;DR
- 88% of enterprise agent pilots never reach production, burning ~$340K per abandoned project before cancellation.
- OpenAI, Anthropic, and Google are making three incompatible bets: vertical stack, open protocol, data gravity.
- METR clocked coding-agent users 19% slower even as developers reported feeling 20% faster.
- τ-bench success sits at 13–25% for frontier reasoning models on realistic tool-use trajectories.
Three labs, three bets
Latent Space’s “all model labs are now agent labs” is directionally right but flattens what is actually a three-way strategic split. Independent mapping of the frontier shows each lab is shipping a fundamentally different harness 1:
| Lab | Bet | Mechanism |
|---|---|---|
| OpenAI | Vertical integration | Codex + Operator as a closed appliance |
| Anthropic | Open protocol | MCP as substrate, Computer Use, Stainless |
| Google | Data gravity | Gemini agents with native Workspace/Android history |
Pick a stack in mid-2026 and you are not choosing a model — you are choosing whether your agents live inside someone else’s product, sit on a portable interface, or feed on the email and docs you already gave Google. Andrej Karpathy’s confession that he has largely stopped writing code and now spends 16-hour days “communicating intent” to parallel agents — bottlenecked on token throughput, not keystrokes — is the cultural tell that the harness, not the weights, is the artifact labs now compete on 2.
The 88% problem
The supply-side narrative collides with hard deployment data. Roughly 88% of enterprise AI-agent projects fail to make it from pilot to production, with each abandoned attempt costing about $340K in direct spend before someone pulls the plug 3. That is not a failure of model capability so much as a failure of the surrounding loop: review, integration, error recovery, the unglamorous work the demos skip.
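To make the scale of that waste concrete, here is a minimal sketch of the arithmetic. It treats the quoted 88% failure rate and $340K cost as exact, and the portfolio size of 100 pilots is a hypothetical chosen for illustration.

```python
# Figures quoted in the text; treating them as exact is an assumption.
FAILURE_RATE = 0.88
COST_PER_ABANDONED_PILOT = 340_000  # USD of direct spend per abandoned pilot


def sunk_cost_per_success(pilots: int) -> float:
    """Direct spend on abandoned pilots divided by pilots that reach production."""
    failed = round(pilots * FAILURE_RATE)
    succeeded = pilots - failed
    if succeeded == 0:
        raise ValueError("no pilots reached production")
    return failed * COST_PER_ABANDONED_PILOT / succeeded


portfolio = 100  # hypothetical portfolio size
failed = round(portfolio * FAILURE_RATE)
total_sunk = failed * COST_PER_ABANDONED_PILOT

print(f"abandoned pilots: {failed}")
print(f"total sunk cost: ${total_sunk:,}")
print(f"sunk cost per production deployment: ${sunk_cost_per_success(portfolio):,.0f}")
```

Under these assumptions, every pilot that does reach production carries roughly $2.5M of spend on the ones that did not.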
METR’s controlled study on coding agents made the gap legible. Experienced developers using agents perceived a 20% speedup. Stopwatch said they were 19% slower, because catching subtle, cascading errors ate the time the agent saved 4. The productivity paradox is not “agents don’t work” — it is “agents shift work from writing to reviewing, and reviewing is harder to amortize.”
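The size of that perception gap is easier to see on a single task. The sketch below uses a hypothetical 60-minute baseline and reads "a 20% speedup" as "felt like 20% less time", which is an interpretation rather than something the study excerpt specifies.

```python
# Figures quoted in the text. Reading "perceived a 20% speedup" as
# "felt like 20% less time" is an assumption made for illustration.
PERCEIVED_TIME_FACTOR = 0.80  # developers felt 20% faster
MEASURED_TIME_FACTOR = 1.19   # the stopwatch showed 19% slower


def perception_gap(baseline_minutes: float) -> tuple[float, float, float]:
    """Return (perceived, measured, gap) task times in minutes."""
    perceived = baseline_minutes * PERCEIVED_TIME_FACTOR
    measured = baseline_minutes * MEASURED_TIME_FACTOR
    return perceived, measured, measured - perceived


# Hypothetical task that takes 60 minutes without an agent.
perceived, measured, gap = perception_gap(60.0)
print(f"felt like {perceived:.1f} min, took {measured:.1f} min, gap {gap:.1f} min")
```

On that reading, the developer's sense of elapsed time is off by about 23 minutes on an hour-long task.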
Benchmark inflation makes the picture worse. A 2026 audit found a 37% gap between lab scores and real-world performance, and on τ-bench, which is designed to measure realistic tool-use trajectories, even high-reasoning frontier models manage only 13–25% success 5. The numbers labs cite in launch posts are not the numbers buyers reproduce in production.
Capability is outrunning the humans behind it
The symmetric risk to the productivity paradox is what Anthropic’s Project Glasswing exposed in security: pentest agents surface dramatically more vulnerabilities than human red teams, but defenders cannot triage the “discovery flood” fast enough to act on it 6. Capability scales; the human loop required to verify it does not.
Agents generate more work product than the surrounding humans can verify, so net throughput stalls or regresses.
That is the through-line connecting METR’s slower coders, the 88% pilot graveyard, and Glasswing’s un-triaged CVE pile. Every lab pivoting to agents is, in effect, racing to flood the same bottleneck.
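The bottleneck argument can be sketched as a toy queue. The model below is illustrative only: the daily generation and review rates are hypothetical, and nothing here comes from the cited studies.

```python
def simulate_backlog(generated_per_day: int, reviewed_per_day: int, days: int) -> list[int]:
    """Track unverified work items when review capacity is fixed.

    Each day agents add `generated_per_day` items, and humans verify at most
    `reviewed_per_day` of whatever is waiting.
    """
    backlog = 0
    history = []
    for _ in range(days):
        backlog += generated_per_day
        backlog -= min(backlog, reviewed_per_day)
        history.append(backlog)
    return history


# Hypothetical rates: agents produce 50 items a day, reviewers clear 20.
print(simulate_backlog(generated_per_day=50, reviewed_per_day=20, days=5))
# Hypothetical rates where review capacity exceeds generation.
print(simulate_backlog(generated_per_day=10, reviewed_per_day=20, days=5))
```

In the first run the backlog grows by 30 items a day while verified output stays capped at 20; raising the generation rate only grows the pile faster.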
The takeaway
The editorial gets the supply side right: there are no model labs left, only agent labs. What it underweights is the demand side. Buyers, reviewers, and security teams are the binding constraint now, and none of the three lab strategies mapped earlier 1 has a credible answer for it yet. The next frontier number worth watching is not a benchmark; it is the 88%, and which lab moves it first.
Musk’s xAI sued over 46 unpermitted Memphis gas turbines
Source: techcrunch-ai · published 2026-05-23
TL;DR
- Earthjustice, SELC and NAACP sued xAI over 46 unpermitted gas turbines at its Memphis Colossus site.
- Tesla set a 100 GW/year solar manufacturing target in 2026, a ~300x jump backed by $2.9B from Suzhou Maxwell.
- Orbital data centers model at $891/MWh vs $398/MWh terrestrial, with vacuum cooling capping heat rejection near 633 W/m².
- Hyperscaler gas projects jumped from 11% (2024) to 18% (2026), with Meta signing a 20-year deal.
The gas pivot is now a federal pollution case
TechCrunch frames xAI’s gas-turbine buildout in Memphis as a strategy shift. The legal record reframes it as an active environmental-justice fight. Earthjustice and the Southern Environmental Law Center sued in April 2026, alleging xAI operated at least 27 — later 46 — unpermitted turbines at Colossus and a sister site in Southaven, “effectively building a major power plant” without preconstruction or operating permits required by federal law 7. The EPA had closed the mobile/nonroad loophole xAI relied on; Mississippi regulators were accused of rubber-stamping a permanent permit three weeks after public comment closed.
The siting matters as much as the permits. The NAACP’s notice of intent to sue points out that the turbines sit beside historically Black neighborhoods already facing cancer risk four times the national average 8. This isn’t a footnote on Musk’s energy preferences — it is the most consequential pollution case attached to any frontier-AI buildout.
Orbital solar: the arithmetic doesn’t work yet
The “obsessed with orbital data centers” line in the source treats SpaceX’s pitch as visionary. Independent modeling treats it as arithmetic that currently fails. A 2026 cost study puts a 1 GW orbital deployment at $31.2B against $14.8B on the ground, with levelized cost of $891/MWh versus $398/MWh terrestrial 9.
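The gap is easier to compare as ratios. This short sketch uses only the figures quoted above and assumes they describe like-for-like 1 GW deployments.

```python
# Figures quoted in the text for a 1 GW deployment.
CAPACITY_W = 1e9
ORBITAL_CAPEX_USD = 31.2e9
GROUND_CAPEX_USD = 14.8e9
ORBITAL_LCOE_USD_PER_MWH = 891.0
GROUND_LCOE_USD_PER_MWH = 398.0

orbital_usd_per_w = ORBITAL_CAPEX_USD / CAPACITY_W
ground_usd_per_w = GROUND_CAPEX_USD / CAPACITY_W
capex_ratio = ORBITAL_CAPEX_USD / GROUND_CAPEX_USD
lcoe_ratio = ORBITAL_LCOE_USD_PER_MWH / GROUND_LCOE_USD_PER_MWH
lcoe_premium = ORBITAL_LCOE_USD_PER_MWH - GROUND_LCOE_USD_PER_MWH

print(f"capex: ${orbital_usd_per_w:.2f}/W orbital vs ${ground_usd_per_w:.2f}/W ground")
print(f"capex ratio: {capex_ratio:.2f}x, LCOE ratio: {lcoe_ratio:.2f}x")
print(f"LCOE premium: ${lcoe_premium:.0f}/MWh")
```

On these numbers, orbital costs a little over twice as much on both upfront capital and levelized cost.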
The physics critique cuts deeper. In vacuum there is no convection; net radiative heat rejection caps near 633 W/m², so the radiators needed for GPU-class power densities can erase Starship’s mass-cost advantage 10. Google’s own “Suncatcher” analysis pegs break-even at roughly $200/kg launch cost, a threshold that Starship’s optimistic $100/kg target would clear, but only at a launch cadence SpaceX hasn’t demonstrated. Musk’s “5x more effective in orbit” claim ignores the rejection-side ceiling.
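A back-of-envelope calculation shows why the rejection ceiling matters. The sketch below assumes the whole 1 GW electrical load ends up as heat that must be radiated, and that every square metre of radiator achieves the quoted ~633 W/m² net flux. Both are simplifications for illustration, not figures from the cited analysis.

```python
REJECTION_LIMIT_W_PER_M2 = 633.0  # net radiative heat rejection quoted in the text


def radiator_area_m2(heat_load_w: float, flux_w_per_m2: float = REJECTION_LIMIT_W_PER_M2) -> float:
    """Radiator area needed to reject `heat_load_w` at a fixed net flux."""
    if flux_w_per_m2 <= 0:
        raise ValueError("flux must be positive")
    return heat_load_w / flux_w_per_m2


# Assumption: all 1 GW of electrical power is eventually rejected as heat.
heat_load_w = 1e9
area = radiator_area_m2(heat_load_w)
print(f"radiator area: {area:,.0f} m^2 (about {area / 1e6:.2f} km^2)")
```

Under these assumptions a 1 GW facility needs on the order of 1.6 km² of radiator surface, all of which has to be launched.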
Tesla didn’t give up on solar — it gigafactory’d it
The headline’s strongest claim is also its weakest. Tesla announced a 100 GW/year U.S. solar manufacturing target in early 2026 — close to a 300x increase over prior internal capacity — backed by the Suzhou Maxwell deal and a Buffalo conversion to a new TSP-420 panel line 11. Megapack and Powerwall storage hit 46.7 GWh and $12.8B in revenue at margins nearly double the automotive business. The shift is from rooftop/residential to utility-scale panels plus batteries, not from solar as a category.
xAI is fast and lawless, not unusual on fuel
The Musk-as-outlier framing also overstates the contrast with hyperscalers. Meta signed a 20-year natural gas deal for Louisiana and Tennessee campuses; the share of planned natural gas projects for U.S. data centers rose from 11% in 2024 to over 18% in 2026, with gas interconnection costs near $24/kW versus roughly $253/kW for solar 12. Microsoft’s Three Mile Island restart and Google’s Kairos SMR deal won’t deliver power before 2028–2030.
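The interconnection figures explain much of the pull toward gas. As a rough illustration, the sketch below scales the quoted per-kW costs to a hypothetical 1 GW campus; the campus size is an assumption, and the per-kW costs are taken as flat.

```python
# Interconnection costs quoted in the text, in USD per kW of capacity.
GAS_USD_PER_KW = 24
SOLAR_USD_PER_KW = 253


def interconnection_cost_usd(capacity_mw: int, usd_per_kw: int) -> int:
    """Total interconnection cost for a campus, assuming a flat per-kW cost."""
    return capacity_mw * 1_000 * usd_per_kw


campus_mw = 1_000  # hypothetical 1 GW campus
gas_cost = interconnection_cost_usd(campus_mw, GAS_USD_PER_KW)
solar_cost = interconnection_cost_usd(campus_mw, SOLAR_USD_PER_KW)

print(f"gas: ${gas_cost:,}  solar: ${solar_cost:,}")
print(f"solar costs {solar_cost / gas_cost:.1f}x more to interconnect")
```

At that scale the difference is about $24M against $253M, roughly a tenfold gap before any fuel or panel costs are counted.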
xAI’s distinguishing features are mobile turbines, permit evasion, and speed. Those are the parts the regulators — and now the courts — are pricing in.
Google ships Gemini Omni but withholds speech editing
Source: the-verge-ai · published 2026-05-23
TL;DR
- Google deferred Gemini Omni’s speech-and-audio editing — the mode that rewrites what a real person says on camera 13.
- Enkrypt AI red-teamers bypassed Omni’s safeguards on CBRN-adjacent prompts at a 52% success rate by combining text and images 14.
- SynthID watermarking only flags Google-generated media, not output from Midjourney or OpenAI 15.
- Reviewers hit a “four-turn ceiling” where scene memory degrades and backgrounds warp by the fifth edit 16.
The capability that didn’t ship
The Verge’s hands-on with Gemini Omni — deepfaking a kid’s plush deer onto vacation — reads as awe tempered by unease. The unease is load-bearing. Google quietly held back the model’s most dangerous mode: TechTimes confirms broader speech-and-audio editing, the capability that would let a user rewrite what a real person says in a real video, is deferred pending further safety testing 13. That’s the editorial story. Google built an any-to-any model and then drew a line through the column that turns it into a political weapon.
The line is real, but it’s also the only crisp guardrail in the stack. Everything else Google points to as safety scaffolding looks softer under scrutiny.
The scaffolding under stress
Start with SynthID. Google markets the invisible watermark as the provenance answer for AI-generated media. BuildFastWithAI’s review notes the obvious limit: SynthID detects only Google-generated content and is blind to output from Midjourney, OpenAI, or any open-weights model 15. As an industry provenance layer, it’s a Google-only stamp on a multi-vendor problem — useful for Google’s own moderation pipeline, close to useless for a journalist trying to verify a clip in the wild.
Then the jailbreaks. Enkrypt AI’s multimodal red-teaming study reports vision-based attacks hitting a 52% success rate at eliciting prohibited CBRN-adjacent content when text and images are combined 14. That’s not a fringe edge case — it’s direct evidence that fusing modalities into a single model creates new attack surface, not just new creative surface. The unified architecture that makes Omni feel magical is the same architecture that lets a benign-looking image smuggle intent past a text-only filter.
And on the child-safety flank, the Fairplay coalition has petitioned the FTC to investigate Google’s Gemini rollout to under-13s 17. The Verge author’s choice not to show his son the deepfaked-Buddy clips reads, in that light, less as personal squeamishness than as the same instinct Fairplay is trying to convert into a regulatory ask.
What practitioners actually hit
The capability gap matters too. JXP’s review documents a “four-turn ceiling” — by the fifth or sixth conversational edit, the model’s memory of the original scene deteriorates and background elements warp or vanish 16. Their companion explainer flags infrastructure trouble: a “2026 stability crisis” of infinite loading loops and spurious quota errors that the polished I/O demo did not advertise 18. The Verge’s own anecdote — Omni repeatedly adding antlers to a hornless plush deer — is the same failure mode in miniature. Character consistency is not solved.
Takeaway
The honest framing isn’t “wild new toy.” It’s this: Google shipped a model whose riskiest mode it declined to release, whose watermark only catches its own outputs, and whose multimodal surface independent researchers jailbroke at 52% before launch week was over. The deferred speech-editing release is the part of the safety story that’s actually load-bearing. The rest is mostly posture.
Round-ups
Anthropic opens frontier AI policy debate to outside voices
Source: anthropic-news
Anthropic is widening its public conversation on frontier AI risks and governance, inviting broader input beyond lab insiders. The push reflects mounting pressure on developers to justify capability decisions to regulators, civil society, and researchers as model deployments scale.
DeepMind launches Asia Pacific accelerator for environmental AI
Source: deepmind-blog
Google DeepMind is opening an accelerator program in Asia Pacific aimed at startups applying AI to environmental risks. The initiative targets regional climate, biodiversity, and disaster challenges, extending DeepMind’s science-for-good push beyond its London and US research bases.
Exa, Modal, and TurboPuffer join AI infra unicorn club
Source: latent-space
Three AI infrastructure startups — search API Exa, serverless compute platform Modal, and vector database TurboPuffer — hit billion-dollar valuations in fresh rounds. The cluster of fundraises signals investor appetite shifting from model labs toward the picks-and-shovels layer powering agent and retrieval workloads.
MIT Tech Review debates whether AI can build true world models
Source: mit-tech-review-ai
A Roundtables session with editor Mat Honan and AI editors Will Douglas Heaven and James O’Donnell examines whether systems can move past LLM limits to grasp physical reality. World models have surged in the AI conversation as labs chase reasoning grounded in environment dynamics.
Pretraining blog details how to land a frontier-lab research job
Source: latent-space
A widely shared Pretraining post breaks down the hiring path into frontier AI labs, covering portfolio signals, paper reproduction, and infra fluency that recruiters weigh. Latent Space amplified the guide during a slow news lull ahead of Google I/O.
Author defends AI use after fabricated quotes appear in book
Source: ars-technica-ai
Steven Rosenbaum’s book The Future of Truth shipped with synthetic quotes invented by AI tools including ChatGPT and Claude. Rather than abandon the workflow, Rosenbaum says he will keep using assistants while tightening verification — a case study in how hallucinations slip past trade-publishing review.
Ferrari taps IBM AI to convert casual F1 viewers into superfans
Source: techcrunch-ai
Scuderia Ferrari HP and IBM are deploying AI across the team’s fan apps to personalize race content and deepen engagement with new viewers. The partnership uses telemetry and driver data to surface tailored storylines, betting that AI-curated context turns one-time watchers into loyal followers.
Footnotes
1. MindStudio — Anthropic vs OpenAI vs Google agent strategy — https://www.mindstudio.ai/blog/anthropic-vs-openai-vs-google-agent-strategy
   OpenAI pursues vertical integration… Anthropic has championed the Model Context Protocol (MCP) as an open standard… Google leverages platform depth, using its Workspace Intelligence layer to grant Gemini agents native access to a user’s entire history in Gmail, Drive, and Android.
2. Medium — Karpathy ‘AI psychopathy’ essay — https://medium.com/activated-thinker/andrej-karpathy-confesses-his-ai-psychopathy-why-writing-code-is-officially-a-legacy-skill-8f04267edeb1
   He had largely stopped writing traditional code, instead spending 16-hour days ‘communicating intent’ to a fleet of parallel agents… the bottleneck is no longer human keystrokes or compute but ‘token throughput’.
3. DigitalApplied — 88% of AI agents never reach production — https://www.digitalapplied.com/blog/88-percent-ai-agents-never-reach-production-failure-framework
   Approximately 88% of AI agent projects fail to transition from successful pilot phases into sustained production… the average failed enterprise agent project costing roughly $340,000 in direct expenses before being abandoned.
4. Exceeds.ai on the METR coding-agent study — https://blog.exceeds.ai/ai-coding-agents-productivity-paradox/
   Despite developers perceiving a 20% speed boost from AI coding agents, they were actually 19% slower because of the exhaustive effort required to catch subtle, cascading errors.
5. Kili Technology — 2026 benchmarks guide — https://kili-technology.com/blog/ai-benchmarks-guide-the-top-evaluations-in-2026-and-why-theyre-not-enough
   A 37% gap exists between lab benchmark scores and real-world performance… on the τ-bench, even high-reasoning models manage only a 13-25% success rate.
6. AppSec Santa — AI pentesting agents 2026 — https://appsecsanta.com/research/ai-pentesting-agents-2026
   Anthropic shared results from Project Glasswing, demonstrating that while AI agents can surface significantly more security vulnerabilities than human teams, the resulting ‘discovery flood’ creates a new challenge: teams cannot triage and patch the findings as quickly as the agents identify them.
7. Earthjustice press release — https://earthjustice.org/press/2026/xai-sued-for-illegal-power-plant
   xAI operated at least 27 unpermitted turbines—later growing to 46—effectively building a major power plant without the preconstruction or operating air permits mandated by federal law
8. PBS NewsHour — https://www.pbs.org/newshour/nation/naacp-files-intent-to-sue-elon-musks-xai-company-over-memphis-supercomputer-air-pollution
   NAACP files intent to sue Elon Musk’s xAI company over Memphis supercomputer air pollution… residents already face cancer risks four times the national average
9. SpaceWar / orbital DC analysis — https://www.spacewar.com/reports/Orbital_Data_Centers_The_175_Trillion_Bridge_To_Nowhere_999.html
   1 GW orbital deployment at $31.2 billion ($31.20/W) versus $14.8 billion on the ground, LCOE $891/MWh vs $398/MWh terrestrial
10. Musk vows to put data centers in space and run them on solar power, but experts have their doubts — space is a vacuum; heat must be removed via inefficient radiative cooling rather than the convective cooling used on Earth
11. Tesla executed a stunning reversal in early 2026… Musk announced an ambitious new target of 100 GW of annual solar production, a nearly 300x increase, backed by a $2.9 billion equipment deal with Suzhou Maxwell
12. EnkiAI industry analysis — https://enkiai.com/data-center/gas-to-power-boom-ai-drives-2026-on-site-energy-shift/
    Even traditional hyperscalers are pivoting toward natural gas as a bridge; Meta recently signed a 20-year natural gas deal… share of planned natural gas projects rose from 11% in 2024 to over 18% in 2026
13. Google has explicitly deferred the release of broader speech-and-audio editing features that would allow users to change what someone says in an existing video… while Google continues safety testing.
14. Enkrypt AI red-team study — https://medium.com/enkrypt-ai/uncovering-safety-gaps-in-gemini-a-multimodal-red-teaming-study-e4dd3986c478
    Vision-based attacks were the most effective vector, with a 52% success rate in bypassing safeguards for prohibited CBRN content when text and images were combined.
15. BuildFastWithAI review — https://www.buildfastwithai.com/blogs/gemini-omni-google-ai-video-model-review
    SynthID only detects Google-generated content; it cannot identify AI media from Midjourney or OpenAI, leading some to call it a ‘small step’ rather than a ‘silver bullet.’
16. JXP Gemini Omni review — https://www.jxp.com/gemini-omni/blog/gemini-omni-review
    Testers report a ‘four-turn ceiling’ where the model’s memory of the original scene begins to deteriorate… by the fifth or sixth edit, background elements often warp or vanish.
17. Fairplay for Kids (child-advocacy coalition) — https://fairplayforkids.org/advocates-child-development-experts-urge-google-to-halt-gemini-ai-rollout-to-young-children-request-ftc-investigation-2/
    Advocates and child-development experts urge Google to halt Gemini AI rollout to young children, request FTC investigation
18. JXP ‘What is Gemini Omni’ explainer — https://www.jxp.com/gemini-omni/blog/what-is-gemini-omni
    The platform’s backend often ‘crumbles under its own weight’… infinite loading loops where the UI hangs on a ‘Thinking…’ screen indefinitely.