Wei (Jack) Sun

OpenAI plays defense as open weights and agents redraw the coding stack

Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.

Sources

Scaling Codex to enterprises worldwide openai.com

OpenAI launches Codex Labs, partners with Accenture, PwC, Infosys, and others to help enterprises deploy and scale Codex across the software development lifecycle, and hits 4M Codex weekly active users (WAU).

(AINews) Moonshot Kimi K2.6: the world’s leading Open Model refreshes to catch up to Opus 4.6 (ahead of DeepSeek v4?) latent.space

Yay Kimi!!!

Import AI 454: Automating alignment research; safety study of a Chinese model; HiFloat4 importai.substack.com

At what point do the financial markets price in the singularity?

Reading today’s open-closed performance gap interconnects.ai

Nathan Lambert unpacks the factors behind the single benchmark numbers used to compare open and closed models, arguing the headline gap obscures messier dynamics in training data, evaluation choice, and post-training, and sketches how the gap is likely to evolve.

🔬 Training Transformers to solve 95% failure rate of Cancer Trials — Ron Alfa & Daniel Bear, Noetik latent.space

Latent Space interviews Noetik founders Ron Alfa and Daniel Bear on TARIO-2, an autoregressive transformer trained on tumor biology to better match patients to therapies — a bid to attack the 95% failure rate of oncology clinical trials as a matching problem.

(AINews) RIP Pull Requests (2005-2026) latent.space

AINews argues the pull request, born with GitHub-era workflows around 2005, is being retired by AI coding agents that commit, review, and merge directly, collapsing the human-gated diff-review loop that defined two decades of open-source collaboration.

(AINews) Humanity’s Last Gasp latent.space

A quiet news day prompts AINews to reflect on what knowledge work looks like as AI agents absorb more of the daily craft, framing the moment as a turning point for how humans spend their remaining hands-on hours.

References

Towards AI analysis pub.towardsai.net

OpenAI was losing the enterprise market for six months. Last Thursday, they hit back.

Business Today businesstoday.in

OpenAI Codex celebrates 3 million weekly users; CEO Sam Altman resets usage limits

VentureBeat venturebeat.com

Amazon’s OpenAI gambit signals a new phase in the cloud wars — one where exclusivity no longer applies

Stackademic (developer survey writeup) blog.stackademic.com

84% of developers use AI coding tools in April 2026 — only 29% trust what they ship

Deccan Herald deccanherald.com

TCS headcount down over 23,000 in FY26

Inc. (Ben Sherry) inc.com

OpenAI just launched a major alliance with McKinsey and other consulting giants

Artificial Analysis artificialanalysis.ai

Kimi K2.6 the new leading open-weights model — #4 on the Intelligence Index, with hallucination rates dropping to 39% versus K2.5

Verdent.ai comparison verdent.ai

Kimi K2.6 hits 58.6% on SWE-Bench Pro vs Claude Opus 4.6’s 53.4%, but trails on SWE-Bench Verified (80.2 vs 80.8) and on HLE reasoning (35.9 vs 53.1)

GMI Cloud architecture brief gmicloud.ai

1T total / 32B active MoE with 384 experts, MuonClip optimizer, native INT4 QAT enabling 4×H100 deployment at ~594 GB

Trending Topics — Cursor/Kimi disclosure trendingtopics.eu

Cursor admits Composer 2 is built on Chinese AI model Kimi K2.5; internal model id ‘kimi-k2p5-rl-0317-s515-fast’ was caught in API traffic

SplxAI red-team report splx.ai

Kimi models exhibit ‘glaring gaps’ in safety, scoring as low as 1.55% in security tests without a system prompt

Hacker News thread news.ycombinator.com

Practitioners called the SWE-Bench Pro results ‘benchmaxxed’ and noted Kimi still falls into ‘death spirals’ of incorrect tool calls without strict prompting.

Anthropic Alignment blog (automated-w2s-researcher) alignment.anthropic.com

AARs engaged in four distinct types of reward hacking, including an exfiltration tactic where they flipped single answers to probe the scoring API and reverse-engineer labels.

Yong et al. arXiv: Independent Safety Evaluation of Kimi K2.5 arxiv.org

K2.5 matches GPT-5.2 and Claude 4.5 Opus on biological dual-use tasks but consistently fails to refuse assistance with evading DNA synthesis screening; safety guardrails can be stripped for under $500 of compute.

The Register on MXFP4 / OpenAI gpt-oss theregister.com

MXFP4 is the OCP standard backed by NVIDIA, AMD, Intel, Meta and Microsoft, with native Blackwell support delivering up to 4x FP16 throughput — context against which HiF4’s 0.25-bit overhead and Ascend-only datapath must be judged.

Apollo Research apolloresearch.ai

Frontier models including Opus 4.6/4.7 verbalize evaluation awareness and may sandbag during safety tests; Apollo released ‘Watcher’ to monitor research agents in real time.

Jack Sun

Jack Sun, writing.

Engineer · Bay Area

Hands-on with agentic AI all day: building frameworks, reading what industry ships, occasionally writing it down.

© 2026 Wei (Jack) Sun · jacksunwei.me Built on Astro · hosted on Cloudflare