<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"><channel><title>Wei (Jack) Sun — AI Research</title><description>Daily AI research: papers, methodology, and what’s replicable.</description><link>https://jacksunwei.me/</link><item><title>Anthropic rewrites honeypots, Ai2 routes documents, CyberSecQwen-4B beats Cisco</title><link>https://jacksunwei.me/digest/ai-research/anthropic-honeypots-ai2-emo-cyberseqwen-cisco/</link><guid isPermaLink="true">https://jacksunwei.me/digest/ai-research/anthropic-honeypots-ai2-emo-cyberseqwen-cisco/</guid><description>Today&apos;s three research releases each anchor their headline number to a baseline the authors deliberately picked, and in two cases shipped alongside.</description><pubDate>Sat, 09 May 2026 00:00:00 GMT</pubDate><category>AI Research</category></item><item><title>Anthropic NLAs probed, AlphaEvolve cloned cheaper, Nemotron loses voice lead</title><link>https://jacksunwei.me/digest/ai-research/anthropic-nlas-probed-alphaevolve-cloned-nemotron-voice-lead/</link><guid isPermaLink="true">https://jacksunwei.me/digest/ai-research/anthropic-nlas-probed-alphaevolve-cloned-nemotron-voice-lead/</guid><description>Three vendor research releases land today, each with its headline lead compressed by independent probes, open-source clones, or a faster competitor.</description><pubDate>Fri, 08 May 2026 00:00:00 GMT</pubDate><category>AI Research</category></item><item><title>Anthropic Institute opens, ElementsClaw verifies old hits, DXRG cuts 57% to 3%</title><link>https://jacksunwei.me/digest/ai-research/anthropic-institute-elementsclaw-dxrg-narrow-axis/</link><guid isPermaLink="true">https://jacksunwei.me/digest/ai-research/anthropic-institute-elementsclaw-dxrg-narrow-axis/</guid><description>An institute, a materials agent, and a trading swarm each headline a metric narrower than the claim it is asked to carry.</description><pubDate>Thu, 07 May 2026 00:00:00 GMT</pubDate><category>AI 
Research</category></item><item><title>GPT-5.2 leans on physicists, Sylph defers benchmarks, RecursiveMAS skips a rival</title><link>https://jacksunwei.me/digest/ai-research/gpt-5-2-physicists-sylph-defers-recursivemas-skips-rival/</link><guid isPermaLink="true">https://jacksunwei.me/digest/ai-research/gpt-5-2-physicists-sylph-defers-recursivemas-skips-rival/</guid><description>Three AI research wins land today, each leaning on a different uncredited collaborator — human physicists, a promised follow-up, an untested rival system.</description><pubDate>Wed, 06 May 2026 00:00:00 GMT</pubDate><category>AI Research</category></item><item><title>Meta locks Sapiens2&apos;s license, Tuna-2 drops encoders, Apple routes KV layers</title><link>https://jacksunwei.me/digest/ai-research/sapiens2-license-tuna2-encoders-apple-kv-routing/</link><guid isPermaLink="true">https://jacksunwei.me/digest/ai-research/sapiens2-license-tuna2-encoders-apple-kv-routing/</guid><description>Three model releases lead today&apos;s research, each defined by an unusual structural choice in licensing, architecture, or memory layout.</description><pubDate>Tue, 05 May 2026 00:00:00 GMT</pubDate><category>AI Research</category></item><item><title>Sessa beats Transformers, CHAI beats Gemini, agent survey regrades Sora</title><link>https://jacksunwei.me/digest/ai-research/sessa-chai-world-models-survey-author-built-benchmarks/</link><guid isPermaLink="true">https://jacksunwei.me/digest/ai-research/sessa-chai-world-models-survey-author-built-benchmarks/</guid><description>Three research results today each ride a benchmark the authors themselves designed or proposed, from Sessa&apos;s synthetic task to CHAI&apos;s cinematic rubric.</description><pubDate>Mon, 04 May 2026 00:00:00 GMT</pubDate><category>AI Research</category></item><item><title>AISI clocks GPT-5.5 jailbroken in 6 hours; Oxford ties warmth to 10-30pt 
errors</title><link>https://jacksunwei.me/digest/ai-research/aisi-gpt-5-5-jailbroken-oxford-warmth-errors/</link><guid isPermaLink="true">https://jacksunwei.me/digest/ai-research/aisi-gpt-5-5-jailbroken-oxford-warmth-errors/</guid><description>Two outside evaluations document failure modes vendor benchmarks missed: a six-hour GPT-5.5 jailbreak from AISI and warmth-induced errors from Oxford.</description><pubDate>Sat, 02 May 2026 00:00:00 GMT</pubDate><category>AI Research</category></item><item><title>DeepMind&apos;s wrong test, Church recodes E. coli, Princeton&apos;s MoE tradeoff</title><link>https://jacksunwei.me/digest/ai-research/deepmind-wrong-test-church-recodes-ecoli-princeton-moe-tradeoff/</link><guid isPermaLink="true">https://jacksunwei.me/digest/ai-research/deepmind-wrong-test-church-recodes-ecoli-princeton-moe-tradeoff/</guid><description>Three research results in clinical AI, synthetic biology, and MoE inference each shift meaning once the sibling comparison is read alongside.</description><pubDate>Fri, 01 May 2026 00:00:00 GMT</pubDate><category>AI Research</category></item><item><title>Sycophancy, agent-coded bugs, research agents: outside audits widen each gap</title><link>https://jacksunwei.me/digest/ai-research/sycophancy-agent-coded-bugs-research-agents-outside-audits-widen-gap/</link><guid isPermaLink="true">https://jacksunwei.me/digest/ai-research/sycophancy-agent-coded-bugs-research-agents-outside-audits-widen-gap/</guid><description>Three behavioral audits land today, and in each one independent measurement makes the vendor-reported problem look bigger, not smaller.</description><pubDate>Thu, 30 Apr 2026 00:00:00 GMT</pubDate><category>AI Research</category></item><item><title>Bio, skills, and judges: three benchmarks debut with the cracks already mapped</title><link>https://jacksunwei.me/digest/ai-research/three-benchmarks-debut-cracks-already-mapped/</link><guid 
isPermaLink="true">https://jacksunwei.me/digest/ai-research/three-benchmarks-debut-cracks-already-mapped/</guid><description>Three evaluation benchmarks launched today across bioinformatics, agent skills, and LLM judging — each with reproducibility or methodology caveats baked in.</description><pubDate>Wed, 29 Apr 2026 00:00:00 GMT</pubDate><category>AI Research</category></item><item><title>Agent benchmarks on trial: gaming, unreliability, and self-graded wins</title><link>https://jacksunwei.me/digest/ai-research/agent-benchmarks-on-trial/</link><guid isPermaLink="true">https://jacksunwei.me/digest/ai-research/agent-benchmarks-on-trial/</guid><description>Research today turns the lens on evaluation itself, with terminal agents gaming verifiers, computer-use wins failing to repeat, and a unifying metric grading its own homework.</description><pubDate>Tue, 28 Apr 2026 00:00:00 GMT</pubDate><category>AI Research</category></item><item><title>When the scaffold outweighs the model: a day of harness-defined results</title><link>https://jacksunwei.me/digest/ai-research/when-the-scaffold-outweighs-the-model/</link><guid isPermaLink="true">https://jacksunwei.me/digest/ai-research/when-the-scaffold-outweighs-the-model/</guid><description>Across today&apos;s model launches and agent benchmarks, the harness, evaluation rubric, and licensing frame are doing more work than the weights themselves.</description><pubDate>Mon, 27 Apr 2026 00:00:00 GMT</pubDate><category>AI Research</category></item><item><title>DeepSeek-V4&apos;s Ascend pivot: cheaper tokens, shakier answers</title><link>https://jacksunwei.me/digest/ai-research/deepseek-v4-ascend-pivot-cheaper-shakier/</link><guid isPermaLink="true">https://jacksunwei.me/digest/ai-research/deepseek-v4-ascend-pivot-cheaper-shakier/</guid><description>DeepSeek-V4 matches frontier coding scores on Huawei silicon at a fraction of the compute, but hallucinates on almost every prompt.</description><pubDate>Sun, 26 Apr 2026 00:00:00 
GMT</pubDate><category>AI Research</category></item><item><title>DeepSeek-V4 closes the open-weights capability gap — and reopens others</title><link>https://jacksunwei.me/digest/ai-research/deepseek-v4-open-weights-frontier/</link><guid isPermaLink="true">https://jacksunwei.me/digest/ai-research/deepseek-v4-open-weights-frontier/</guid><description>DeepSeek-V4 matches frontier benchmarks under a true MIT license, but hallucination rates, token burn, and training hardware tell a more complicated story.</description><pubDate>Sat, 25 Apr 2026 00:00:00 GMT</pubDate><category>AI Research</category></item><item><title>Scaffolding, not weights: where AI research is actually moving today</title><link>https://jacksunwei.me/digest/ai-research/scaffolding-not-weights-where-research-is-moving/</link><guid isPermaLink="true">https://jacksunwei.me/digest/ai-research/scaffolding-not-weights-where-research-is-moving/</guid><description>Today&apos;s most consequential AI research lives in the scaffolding — distributed training, agent harnesses, retrieval indexes — rather than the model weights themselves.</description><pubDate>Fri, 24 Apr 2026 00:00:00 GMT</pubDate><category>AI Research</category></item><item><title>The work is migrating outward from the weights</title><link>https://jacksunwei.me/digest/ai-research/work-migrating-outward-from-the-weights/</link><guid isPermaLink="true">https://jacksunwei.me/digest/ai-research/work-migrating-outward-from-the-weights/</guid><description>A day where the frontier shows up not in new architectures but in the training infrastructure, measurement methods, and agent harnesses around models.</description><pubDate>Thu, 23 Apr 2026 00:00:00 GMT</pubDate><category>AI Research</category></item><item><title>Long-horizon agents are outrunning their yardsticks</title><link>https://jacksunwei.me/digest/ai-research/long-horizon-agents-outrunning-yardsticks/</link><guid 
isPermaLink="true">https://jacksunwei.me/digest/ai-research/long-horizon-agents-outrunning-yardsticks/</guid><description>Agent capability is stretching across longer trajectories while the surveys, throughput stats, and safety checks meant to grade them quietly lag behind.</description><pubDate>Wed, 22 Apr 2026 00:00:00 GMT</pubDate><category>AI Research</category></item><item><title>Agent research moves from leaderboard scores to the trace itself</title><link>https://jacksunwei.me/digest/ai-research/agent-research-moves-from-scores-to-traces/</link><guid isPermaLink="true">https://jacksunwei.me/digest/ai-research/agent-research-moves-from-scores-to-traces/</guid><description>Three new agent papers move the conversation from outcome scores to process-level evidence, and the headline numbers look shakier under that lens.</description><pubDate>Tue, 21 Apr 2026 00:00:00 GMT</pubDate><category>AI Research</category></item><item><title>Clean in the lab, brittle in production: a day of disappearing wins</title><link>https://jacksunwei.me/digest/ai-research/clean-in-the-lab-brittle-in-production/</link><guid isPermaLink="true">https://jacksunwei.me/digest/ai-research/clean-in-the-lab-brittle-in-production/</guid><description>Today&apos;s AI research keeps producing elegant sandbox results that flatten, leak, or get gamed the moment they meet a production pipeline.</description><pubDate>Mon, 20 Apr 2026 00:00:00 GMT</pubDate><category>AI Research</category></item></channel></rss>