Brockman runs OpenAI product, DeepSeek trails 8 months, Malta gifts ChatGPT Plus
OpenAI consolidates product under Brockman, CAISI ranks DeepSeek 8 months behind, and Malta hands every citizen ChatGPT Plus.
TL;DR
- Greg Brockman takes ChatGPT, Codex, and the API as one product org at OpenAI.
- OpenAI projects $14B in 2026 losses on $25B revenue heading into its IPO.
- CAISI ranks DeepSeek V4 Pro ~8 months behind US frontier models on contested IRT scoring.
- Malta offers ChatGPT Plus to every citizen, gated behind a national AI literacy course.
- Cerebras IPOs at a $60B valuation, the first major AI-hardware listing since Nvidia’s run.
Three unrelated US AI moves anchor the day. OpenAI collapses ChatGPT, Codex, and the API under Greg Brockman as a single product org — a builder-executive consolidation aimed at a pre-IPO story that has to outrun $14B in projected 2026 losses and a Claude Code that now leads SWE-bench Verified at 87.6%. New bylaws also raise the bar to remove Sam Altman to a two-thirds supermajority of non-employee directors.
Beyond the reorg, the US government and OpenAI each plant a different kind of flag. CAISI publishes a benchmark putting DeepSeek V4 Pro ~8 months behind US frontier models, using an IRT scoring method critics are already picking apart, while Malta becomes the first country to offer every citizen ChatGPT Plus, with OpenAI’s George Osborne pitching it as an embrace-us-or-fall-behind-China choice. Locally fine-tuned BERTu still beats frontier models on Maltese.
OpenAI puts Brockman over product as ChatGPT and Codex merge
Source: techcrunch-ai · published 2026-05-16
TL;DR
- Greg Brockman now owns ChatGPT, Codex and the API as one product org under a single builder-executive.
- OpenAI is staring at $14B in projected 2026 losses on $25B revenue heading into an IPO.
- New bylaws raised the bar to remove Sam Altman to a 2/3 supermajority of non-employee directors.
- Claude Code leads SWE-bench Verified at 87.6%, ahead of Codex on multi-file bug fixing.
A pre-IPO consolidation, not a product epiphany
The cleanest read on Brockman’s elevation is financial. OpenAI is staring at roughly $14B in losses on $25B in 2026 revenue, and Anthropic’s private valuation has reportedly leapfrogged it to around $900B 1. Collapsing ChatGPT, Codex and the API under one builder-executive gives bankers a single “agentic platform” narrative to pitch, and it pairs with a quieter governance move: new bylaws now require a 2/3 supermajority of non-employee directors to remove Sam Altman, up from a simple majority 2. Narrative and chain of command, both locked down before the S-1.
Brockman’s management record is not a footnote
TechCrunch frames Brockman as a natural builder-CEO. Independent reporting is sharper. The Information surfaced documents from former chief scientist Ilya Sutskever alleging a “bullying” pattern, and reports that former CTO Mira Murati complained Brockman routinely bypassed her authority by going directly to Altman to overturn operational decisions 3. That history matters now that he formally owns product end-to-end: the end-runs around senior colleagues that reportedly frustrated two of OpenAI’s most senior leaders are now structurally sanctioned rather than worked around.
Developers are flagging the architecture
The “super app” thesis is getting pushback from the people who use it. A widely circulated OpenAI community post argues the unified app treats ChatGPT as a peer tab to Codex rather than a non-blocking master process, imposing a “cognitive tax” every time an asynchronous engineering task stalls the chat thread 4. Security researchers separately disclosed a Codex command-injection bug that exfiltrated GitHub tokens via maliciously crafted branch names, plus a DNS-based outbound channel in the ChatGPT code-execution runtime that could silently move data out 5. Fusing a consumer chatbot with a privileged code-execution sandbox expands the blast radius; it doesn’t shrink it.
flowchart LR
U[User chat thread] --> C{Unified ChatGPT + Codex}
G[GitHub repo / branch names] --> C
C --> S[Code execution sandbox]
S -. DNS exfil channel .-> X((External world))
S -. stolen GitHub tokens .-> X
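The branch-name injection class described above can be illustrated with a minimal sketch. This is not the actual Codex bug or any OpenAI code: the function names, the allowlist pattern, and the payload are hypothetical, `echo` stands in for `git` so nothing is modified, and a POSIX shell is assumed. It shows how interpolating an untrusted branch name into a shell string lets a second command run, and how validation plus an argument list avoids that.

```python
import re
import subprocess

# Hypothetical attacker-controlled branch name; the payload is a harmless echo.
MALICIOUS_BRANCH = "main; echo INJECTED"

# Conservative allowlist (an assumption, stricter than git's own ref-name rules).
SAFE_BRANCH = re.compile(r"[A-Za-z0-9._/-]+")


def is_safe_branch(name: str) -> bool:
    """Accept only allowlisted characters and reject option-like names."""
    return bool(SAFE_BRANCH.fullmatch(name)) and not name.startswith("-")


def unsafe_checkout(branch: str) -> str:
    """Vulnerable pattern: untrusted text interpolated into a shell string."""
    result = subprocess.run(
        f"echo checkout {branch}", shell=True, capture_output=True, text=True
    )
    return result.stdout


def safer_checkout(branch: str) -> str:
    """Safer pattern: validate first, then pass arguments as a list (no shell)."""
    if not is_safe_branch(branch):
        raise ValueError(f"rejected branch name: {branch!r}")
    result = subprocess.run(
        ["echo", "checkout", branch], capture_output=True, text=True
    )
    return result.stdout
```

Calling `unsafe_checkout(MALICIOUS_BRANCH)` lets the shell treat the text after `;` as a second command, while `safer_checkout` rejects the same input before any process is started. In a real agent the second command would run with whatever credentials the sandbox holds, which is the blast-radius concern raised above.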
The Claude Code gap the announcement skips
The merger is being sold partly on Codex’s strength as a coding agent. Independent head-to-heads complicate that:
| Benchmark | Codex | Claude Code (Opus 4.7) |
|---|---|---|
| SWE-bench Verified (multi-file bug fixing) | — | 87.6% |
| Terminal-Bench 2.0 | 77.3% | 65.4% |
| Tokens used on identical UI-cloning tasks | 1× | ~4× |
Codex wins on shell-level task completion (Terminal-Bench 2.0) and reportedly uses about a quarter of the tokens on identical UI-cloning tasks, but Claude Code is ahead where it arguably matters most for paying engineering customers: actually fixing bugs across a real repo 6. That’s a real product gap, not a marketing one.
What’s actually at stake
Strip the org-chart language and this is a financially driven consolidation, led by an operator with a documented pattern of bypassing peers, shipping a stack that developers are already calling architecturally wrong and security researchers are already finding holes in, into a market where the main rival is winning on code quality. The IPO story may hold. The product story has to survive contact with the people building on it.
DeepSeek V4 lags US frontier 8 months, CAISI finds
Source: interconnects · published 2026-05-16
TL;DR
- CAISI ranks DeepSeek V4 Pro ~8 months behind US frontier models.
- Critics flag CAISI’s IRT scoring and a cost panel narrowed to one US comparator.
- DeepSeek V4’s Hybrid Attention runs 1M-token inference at 27% FLOPs, 10% KV cache.
- GLM-5.1, trained on ~100K Huawei Ascend 910B chips running at roughly 60% of Nvidia flagship performance, scores 58.4 on SWE-Bench Pro.
The CAISI verdict, and its asterisks
The headline from NIST’s CAISI assessment is clean: DeepSeek V4 Pro trails the most advanced US models by roughly eight months, but undercuts GPT-5.4 mini on cost by 41–53% in five of seven benchmark categories 7. That’s the line every roundup is quoting, including Nathan Lambert’s at Interconnects.
The methodology is less clean. CAISI scores models with Item Response Theory rather than leaderboard averaging — a choice that weights specific hard-item failures heavily and, per independent reviewers, may inflate gaps that public trackers like Stanford’s AI Index and Artificial Analysis show as stable or narrowing 8. The cost-efficiency table is narrower still: every US model except GPT-5.4 mini was filtered out before the comparison ran 8. CAISI itself concedes a “scaffolding gap” — its SWE-Bench Verified runs skip DeepSeek’s recommended prompts and token budgets, producing lower scores than third parties report 8.
None of this makes the eight-month figure wrong. It makes it a specific claim from a specific methodology, not a consensus.
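To see why the scoring method matters, here is a minimal sketch of the difference between leaderboard averaging and an IRT-style ability estimate. CAISI's actual IRT model and item parameters are not described in the sources above, so everything here is an assumption: a two-parameter logistic (2PL) model, a made-up six-item bank, and two hypothetical models with identical raw accuracy.

```python
import math

# Hypothetical item bank: (discrimination a, difficulty b) per benchmark item.
# Easy items discriminate weakly; hard items discriminate strongly.
ITEMS = [(0.6, -1.5), (0.6, -1.0), (0.6, -0.5), (2.0, 1.0), (2.0, 1.5), (2.0, 2.0)]


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability that a model with ability theta solves an item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def mean_accuracy(responses: list[int]) -> float:
    """Leaderboard-style score: every item counts equally."""
    return sum(responses) / len(responses)


def estimate_ability(responses, items=ITEMS, lo=-6.0, hi=6.0, steps=60):
    """Maximum-likelihood ability via bisection on the log-likelihood slope.

    The slope, sum(a * (x - p)), decreases monotonically in theta, so its
    root is the estimate. Needs a mix of passes and failures to be finite.
    """
    def slope(theta):
        return sum(a * (x - p_correct(theta, a, b))
                   for x, (a, b) in zip(responses, items))

    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if slope(mid) > 0:
            lo = mid  # likelihood still rising: ability is higher
        else:
            hi = mid
    return (lo + hi) / 2.0


# Two hypothetical models with identical raw accuracy (4 of 6).
model_a = [0, 1, 1, 1, 1, 0]  # misses one easy item, solves two hard ones
model_b = [1, 1, 1, 1, 0, 0]  # solves all easy items, only one hard one

for name, resp in (("A", model_a), ("B", model_b)):
    print(name, round(mean_accuracy(resp), 3), round(estimate_ability(resp), 3))
```

Both models tie on plain averaging, but the 2PL estimate ranks model B lower because its failures fall on the highly discriminating hard items. Under these assumptions, the gap between two models depends on which items they fail as well as how many, which is why the same results can look stable on an averaged leaderboard and wider under IRT.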
Three bets, not one bonanza
The “open bonanza” framing flattens what’s actually happening. The May drop is three distinct strategies:
| Lab | Bet | Concrete signal |
|---|---|---|
| DeepSeek V4 | Inference efficiency | 27% FLOPs, 10% KV cache at 1M ctx 9 |
| Zhipu GLM-5.1 | Hardware sovereignty | 100K Ascend 910B, 58.4 SWE-Bench Pro 10 |
| Xiaomi MiMo V2.5 Pro | Agentic endurance | Rust SysY compiler, 233/233 tests, 672 tool calls, 40–60% fewer tokens than Claude Opus 4.6 11 |
GLM-5.1 is the one most likely to be under-read in the West. Training a frontier-tier model on Huawei silicon that reportedly runs at ~60% of the performance of Nvidia’s flagship hardware, and still shipping a top-tier SWE-Bench Pro number, is a sanctions-regime story, not just a benchmark story 10. MiMo’s token-efficiency claim is the one most likely to matter in production: a 4.3-hour autonomous Rust compiler build using 40–60% fewer tokens than competitors is a cost line on someone’s invoice 11.
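To put the DeepSeek row of the table above in concrete terms, the sketch below works through what a 10% KV cache footprint could mean at a 1M-token context. The layer count, KV head count, head dimension, and 2-byte precision are hypothetical values for a generic dense-attention baseline, not DeepSeek V4's real configuration; only the 10% ratio comes from the cited brief.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """KV cache size for standard attention: keys and values for every
    layer, KV head, and token position of a single sequence."""
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical dense-attention baseline; NOT DeepSeek V4's real configuration.
baseline = kv_cache_bytes(layers=60, kv_heads=8, head_dim=128, seq_len=1_000_000)

# Reported ratio from the table above: about 10% of the standard footprint.
REPORTED_KV_FRACTION = 0.10
reduced = baseline * REPORTED_KV_FRACTION

print(f"baseline: {baseline / 1e9:.1f} GB, at 10%: {reduced / 1e9:.1f} GB")
```

With these assumed dimensions the baseline is roughly 246 GB per 1M-token sequence, so a 10% footprint is roughly 25 GB. That is the difference between needing several accelerators for one long-context request and fitting it on a single device, which is why the ratio reads as an inference-cost claim as much as an architecture one.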
Gemma 4 is not getting a free pass
Lambert files Gemma 4 alongside the Chinese flagships as another headline release. The Apache 2.0 relicensing earns that placement; the model itself is getting a colder reception. Reviewers report the 26B MoE variant under-uses tool calls without “pushy” prompting and over-refuses on basic medical and electrical-safety questions 12. MoE routing also tanks consumer-GPU throughput — ~11 tok/s on an RTX 4090 12. None of that surfaces in Arena scores, which is exactly the kind of friction the leaderboard-first framing of these roundups tends to bury.
What’s actually at stake
Lambert isn’t a neutral narrator here. His ATOM project explicitly argues the US needs $100M open labs to close the adoption gap, which is consistent with relaying CAISI’s framing without much friction. The more honest read of May’s releases: the gap is real on agentic robustness and scaffolding sensitivity, narrowing on raw reasoning, and arguably overstated by CAISI’s benchmark and comparator choices. “China is eight months behind” and “China is shipping three differentiated frontier architectures on sanctioned hardware” are both true. The second one is the more load-bearing fact.
Malta gives every citizen a free year of ChatGPT Plus
Source: openai-blog · published 2026-05-16
TL;DR
- Malta is the first country offering ChatGPT Plus to every citizen, gated behind a national AI literacy course.
- OpenAI’s pitch from policy chief George Osborne is explicitly geopolitical: embrace us or fall behind China.
- Frontier models score ~68% on Maltese-language tasks versus 93–97% for locally fine-tuned BERTu, fueling a “curiosity not capability” critique at home.
- Greece’s parallel rollout drew a 22-org civil-society letter over CLOUD Act exposure and opaque procurement.
What’s actually being shipped
Under the “OpenAI for Countries” banner, Malta’s Digital Innovation Authority will distribute a year of ChatGPT Plus to any citizen — including those abroad — who completes an AI literacy course built by the University of Malta. The phased rollout started in May 2026. Minister Silvio Schembri framed it as keeping Maltese workers at the “forefront of global change”; OpenAI’s George Osborne called it a model for other governments to copy.
The course-completion gate is the genuinely novel piece. The UAE deal that everyone keeps citing as precedent turns out to be more modest than reported — Gulf News clarified the “free ChatGPT for every resident” framing was overstated, and the actual UAE arrangement is platform integration, not individual subscriptions 13. Malta is the first per-citizen consumer rollout, not a follower.
How the “OpenAI for Countries” deals compare
| Country | Shape of deal | Organized opposition |
|---|---|---|
| Malta | Free ChatGPT Plus per citizen, course-gated | Industry critique, no civil-society letter yet |
| Greece | Schools rollout, infrastructure | 22-org open letter led by Homo Digitalis 14 |
| UAE | Platform integration, not per-resident subs | Mostly corrected in local press 13 |
The pushback OpenAI’s press release doesn’t mention
The sharpest dissent is domestic and economic, not privacy-driven. EBO CEO Dr. Gege Gatt argues the state should “buy capability, not dependency,” and that subsidising a proprietary American platform sidelines Malta’s own AI ecosystem — including the SaqWI dataset and a €7M Maltese-English speech-to-text engine already in development 15. The Nationalist Party has tabled a €12M counter-proposal for a domestic AI/data/space sector, comparing the OpenAI deal to Schembri’s earlier “Blockchain Island” push that produced little durable industry 15.
There’s a concrete technical problem under the political one: generic frontier models score around 68% on Maltese-language tasks, versus 93–97% for locally fine-tuned models like BERTu. The state is bulk-buying tools that measurably underperform on the country’s own Semitic language.
Osborne’s own framing makes the sovereignty critique harder to dismiss. In a Guardian interview he pitched OpenAI for Countries as a hedge against Chinese-led models — countries that don’t embrace AI “will be left behind” 16. That’s a sales posture, not a neutral-utility argument, and it leaves open the question of what Malta becomes dependent on.
What to watch
The Malta Chamber is asking the right operational question: success should be measured by job mobility and productivity gains, not certificates issued, and the Chamber wants a public implementation dashboard with six-month reviews 17. The per-user price paid to OpenAI and Microsoft has not been disclosed, nor has its share of the €100M digitalisation envelope. And no source addresses the cliff: when the free year ends, who pays for renewal for a population now habituated to Plus?
Hacker News commenters read the deal cynically as MAU-padding ahead of an intensifying fight with Anthropic and Google 18. That’s uncharitable but not wrong about the incentive structure. The “national utility” framing only holds if Malta retains the option to switch providers — and right now nothing in the announcement guarantees that.
Round-ups
Cerebras prices IPO at $60B valuation
Source: latent-space
Wafer-scale chipmaker Cerebras went public at a $60 billion valuation, capping a years-long path through export-control delays and a contested Group 42 partnership. The listing marks the first major AI-hardware IPO since Nvidia’s run reshaped investor appetite for accelerator startups.
Musk v. Altman trial heads to jury after credibility brawl
Source: mit-tech-review-ai
Closing arguments in the Musk v. Altman trial came down to dueling character attacks. Altman faced questions over alleged self-dealing with OpenAI vendors, while his lawyers cast Musk as a power-seeker who wanted unilateral control of OpenAI’s early development. Jurors begin deliberating this week.
OpenAI says Apple botched ChatGPT integration in Siri
Source: ars-technica-ai
OpenAI insiders describe feeling burned by Apple Intelligence’s shallow ChatGPT hooks, which they blame for weak usage numbers. A judge has now ordered Apple to hand Elon Musk internal messages about the secretive deal as part of xAI’s antitrust suit.
arXiv to ban authors for a year over AI-written papers
Source: techcrunch-ai
The preprint server arXiv will suspend submitters for 12 months if they pass off LLM-generated text as their own work. Moderators have been flooded with low-quality machine drafts, particularly in cs.AI, and the policy targets careless rather than disclosed AI assistance.
CFTC turns to AI to police insider trading on Kalshi and Polymarket
Source: ars-technica-ai
The Commodity Futures Trading Commission is deploying machine-learning surveillance to detect suspicious wagering on prediction-market venues like Kalshi and Polymarket. The move follows growing concern that election and policy contracts are vulnerable to traders acting on nonpublic information.
Chinese vertical-video studios industrialize AI for short dramas
Source: mit-tech-review-ai
China’s booming short-drama industry is churning out fantasy and romance episodes using generative video tools, slashing production costs on effects-heavy scenes like flame vines and dragon tattoos. Studios now treat AI as core pipeline infrastructure rather than experimental garnish.
Deepfake porn victims hit takedown and copyright dead ends
Source: mit-tech-review-ai
Women discovering AI-generated nonconsensual porn of themselves face a patchwork of takedown systems that rarely work, with copyright law offering little leverage when the synthetic footage isn’t technically theirs. One subject found videos via a routine facial-recognition check on her own headshot.
Footnotes
1. TradingKey analysis, OpenAI IPO / Brockman: https://www.tradingkey.com/analysis/stocks/us-stocks/261902715-openai-ipo-chatgpt-codex-api-ai-agent-brockman-tradingkey
   OpenAI faces projected losses of approximately $14 billion against $25 billion in revenue for 2026, while Anthropic’s valuation surged to $900 billion in early 2026, surpassing OpenAI’s private market standing.
2. letsdatascience.com, bylaws change: https://letsdatascience.com/news/openai-raises-board-threshold-to-remove-ceo-fa4f3f15
   New bylaws require a 2/3 supermajority of non-employee directors to remove CEO Sam Altman, a significant increase from the simple majority required under the previous nonprofit board.
3. The Information, ‘Greg Brockman’s Rough Day’: https://www.theinformation.com/newsletters/ai-agenda/greg-brockmans-rough-day
   Documents supplied by former Chief Scientist Ilya Sutskever alleged a pattern of ‘bullying’ by Brockman, and former CTO Mira Murati reportedly complained that Brockman would bypass her authority by going directly to Altman to overturn operational decisions.
4. OpenAI Community forum, ‘Superapp architecture’ post: https://community.openai.com/t/superapp-architecture-chatgpt-should-be-the-non-blocking-master-process-not-a-peer-tab/1381086
   When the unified platform runs an engineering task, the user’s conversational flow is frequently interrupted, causing a ‘cognitive tax’ as they wait for asynchronous executions; ChatGPT should be the non-blocking master process, not a peer tab.
5. SQ Magazine, ‘OpenAI ChatGPT Codex security flaws fixed’: https://sqmagazine.co.uk/openai-chatgpt-codex-security-flaws-fixed/
   Researchers uncovered a command injection bug in Codex that allowed attackers to steal GitHub authentication tokens via maliciously crafted branch names, alongside a hidden DNS-based outbound channel in the ChatGPT code execution runtime that could silently exfiltrate data.
6. MindStudio, ‘Codex vs Claude Code 2026’: https://www.mindstudio.ai/blog/codex-vs-claude-code-2026
   Claude Code (Opus 4.7) tops SWE-bench Verified at 87.6% for multi-file bug fixing, while Codex leads Terminal-Bench 2.0 at 77.3% vs Claude’s 65.4%; Codex reportedly uses 4x fewer tokens than Claude for identical UI-cloning tasks.
7. NIST CAISI evaluation page: https://www.nist.gov/news-events/news/2026/05/caisi-evaluation-deepseek-v4-pro
   DeepSeek V4 Pro’s capabilities lag approximately eight months behind the most advanced U.S. models… more cost-effective in five out of seven benchmarked categories, with savings ranging from 41% to 53%
8. Medium (Moksh), CAISI methodology critique: https://medium.com/@moksh.9/heres-a-tighter-benchmark-focused-blog-post-501c5ea829f4
   CAISI utilizes Item Response Theory (IRT) to estimate model capability rather than simple averaging, a statistical choice that some independent developers claim may overstate performance gaps… cost-efficiency analysis has been labeled ‘narrow’ because it filtered out all U.S. models except for GPT-5.4 mini
9. WaveSpeed technical brief on DeepSeek V4: https://wavespeed.ai/blog/posts/deepseek-v4-everything-we-know-about-the-upcoming-coding-ai-model/
   For a one-million-token context window, the model reportedly uses only 27% of the per-token inference FLOPs and 10% of the KV cache footprint required by standard architectures
10. ModemGuides, GLM-5.1 training infrastructure: https://www.modemguides.com/blogs/ai-news/glm-5-1-open-source-benchmarks-local-ai
   trained on a cluster of approximately 100,000 Huawei Ascend 910B chips using the MindSpore framework… Ascend silicon typically operates at roughly 60% of the performance of NVIDIA’s flagship hardware
11. VentureBeat, MiMo V2.5 agentic claims: https://venturebeat.com/technology/open-source-xiaomi-mimo-v2-5-and-v2-5-pro-are-among-the-most-efficient-and-affordable-at-agentic-claw-tasks
   implemented a complete SysY Compiler in Rust from scratch, passing 233/233 tests within 4.3 hours and 672 tool calls… 40–60% fewer tokens than competitors like Claude Opus 4.6 and GPT-5.4
12. Dev.to community review of Gemma 4: https://dev.to/aniruddhaadak/gemma-4-complete-guide-2026-architecture-benchmarks-deployment-3en9
   users report that the 26B MoE variant is significantly less proactive than rivals like Qwen 3.5, often requiring ‘pushy’ prompting… reports of the model refusing to answer basic medical or electrical safety questions
13. Official clarifications indicate the focus is on ‘nationwide access’ through integrated public platforms rather than individual personal subscriptions for every resident, clarifying that the earlier ‘free ChatGPT Plus for all UAE residents’ framing was overstated.
14. Information Security Buzz (on OpenAI for Greece): https://informationsecuritybuzz.com/when-innovation-meets-education-caution-before-celebrating-openai-for-greece/
   A coalition of 22 civil society organizations, led by Homo Digitalis, issued an open letter protesting the lack of transparency in the deal and the potential exposure of student data to the US CLOUD Act.
15. Malta CEOs (Dr. Gege Gatt, EBO): https://maltaceos.mt/must-shape-ai-not-just-use-it/
   The state should ‘buy capability, not dependency’… subsidizing access to proprietary American platforms, Malta may be sidelining the development of a local AI ecosystem.
16. The Guardian (George Osborne interview): https://www.theguardian.com/politics/2026/feb/18/countries-do-not-embrace-ai-left-behind-george-osborne
   Countries that do not embrace AI will be left behind; the interview frames OpenAI for Countries as a hedge against falling behind Chinese-led models.
17. MaltaToday: https://www.maltatoday.com.mt/news/national/141832/ai_for_everyone_course_launched_with_free_access_and_certification
   The Malta Chamber argues success should be measured by tangible economic outcomes (job mobility and productivity gains) rather than just the number of certificates issued, and calls for a public implementation dashboard with six-month reviews.
18. Hacker News thread (item 48163754): https://news.ycombinator.com/item?id=48163754
   Commenters characterized the initiative as a ‘limited regional beta’ and a tactic to ‘hook’ an entire population into the ecosystem and boost MAU as competition from Anthropic and Google intensifies.