Wei (Jack) Sun — AI Research

Daily AI research: papers, methodology, and what’s replicable.
https://jacksunwei.me/

Anthropic rewrites honeypots, Ai2 routes documents, CyberSecQwen-4B beats Cisco
Sat, 09 May 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/anthropic-honeypots-ai2-emo-cyberseqwen-cisco/
Today's three research releases each anchor their headline number to a baseline the authors deliberately picked, and in two cases shipped alongside.

Anthropic NLAs probed, AlphaEvolve cloned cheaper, Nemotron loses voice lead
Fri, 08 May 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/anthropic-nlas-probed-alphaevolve-cloned-nemotron-voice-lead/
Three vendor research releases land today, each with its headline lead compressed by independent probes, open-source clones, or a faster competitor.

Anthropic Institute opens, ElementsClaw verifies old hits, DXRG cuts 57% to 3%
Thu, 07 May 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/anthropic-institute-elementsclaw-dxrg-narrow-axis/
An institute, a materials agent, and a trading swarm each headline a metric narrower than the claim it is asked to carry.

GPT-5.2 leans on physicists, Sylph defers benchmarks, RecursiveMAS skips a rival
Wed, 06 May 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/gpt-5-2-physicists-sylph-defers-recursivemas-skips-rival/
Three AI research wins land today, each leaning on a different uncredited collaborator: human physicists, a promised follow-up, an untested rival system.

Meta locks Sapiens2's license, Tuna-2 drops encoders, Apple routes KV layers
Tue, 05 May 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/sapiens2-license-tuna2-encoders-apple-kv-routing/
Three model releases lead today's research, each defined by an unusual structural choice in licensing, architecture, or memory layout.

Sessa beats Transformers, CHAI beats Gemini, agent survey regrades Sora
Mon, 04 May 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/sessa-chai-world-models-survey-author-built-benchmarks/
Three research results today each ride a benchmark the authors themselves designed or proposed, from Sessa's synthetic task to CHAI's cinematic rubric.

AISI clocks GPT-5.5 jailbroken in 6 hours; Oxford ties warmth to 10-30pt errors
Sat, 02 May 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/aisi-gpt-5-5-jailbroken-oxford-warmth-errors/
Two outside evaluations document failure modes vendor benchmarks missed: a six-hour GPT-5.5 jailbreak from AISI and warmth-induced errors from Oxford.

DeepMind's wrong test, Church recodes E. coli, Princeton's MoE tradeoff
Fri, 01 May 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/deepmind-wrong-test-church-recodes-ecoli-princeton-moe-tradeoff/
Three research results in clinical AI, synthetic biology, and MoE inference each shift meaning once the sibling comparison is read alongside.

Sycophancy, agent-coded bugs, research agents: outside audits widen each gap
Thu, 30 Apr 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/sycophancy-agent-coded-bugs-research-agents-outside-audits-widen-gap/
Three behavioral audits land today, and in each one independent measurement makes the vendor-reported problem look bigger, not smaller.

Bio, skills, and judges: three benchmarks debut with the cracks already mapped
Wed, 29 Apr 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/three-benchmarks-debut-cracks-already-mapped/
Three evaluation benchmarks launched today across bioinformatics, agent skills, and LLM judging, each with reproducibility or methodology caveats baked in.

Agent benchmarks on trial: gaming, unreliability, and self-graded wins
Tue, 28 Apr 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/agent-benchmarks-on-trial/
Research today turns the lens on evaluation itself, with terminal agents gaming verifiers, computer-use wins failing to repeat, and a unifying metric grading its own homework.

When the scaffold outweighs the model: a day of harness-defined results
Mon, 27 Apr 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/when-the-scaffold-outweighs-the-model/
Across today's model launches and agent benchmarks, the harness, evaluation rubric, and licensing frame are doing more work than the weights themselves.

DeepSeek-V4's Ascend pivot: cheaper tokens, shakier answers
Sun, 26 Apr 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/deepseek-v4-ascend-pivot-cheaper-shakier/
DeepSeek-V4 matches frontier coding scores on Huawei silicon at a fraction of the compute, but hallucinates on almost every prompt.

DeepSeek-V4 closes the open-weights capability gap, and reopens others
Sat, 25 Apr 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/deepseek-v4-open-weights-frontier/
DeepSeek-V4 matches frontier benchmarks under a true MIT license, but hallucination rates, token burn, and training hardware tell a more complicated story.

Scaffolding, not weights: where AI research is actually moving today
Fri, 24 Apr 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/scaffolding-not-weights-where-research-is-moving/
Today's most consequential AI research lives in the scaffolding (distributed training, agent harnesses, retrieval indexes) rather than the model weights themselves.

The work is migrating outward from the weights
Thu, 23 Apr 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/work-migrating-outward-from-the-weights/
A day where the frontier shows up not in new architectures but in the training infrastructure, measurement methods, and agent harnesses around models.

Long-horizon agents are outrunning their yardsticks
Wed, 22 Apr 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/long-horizon-agents-outrunning-yardsticks/
Agent capability is stretching across longer trajectories while the surveys, throughput stats, and safety checks meant to grade them quietly lag behind.

Agent research moves from leaderboard scores to the trace itself
Tue, 21 Apr 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/agent-research-moves-from-scores-to-traces/
Three new agent papers move the conversation from outcome scores to process-level evidence, and the headline numbers look shakier under that lens.

Clean in the lab, brittle in production: a day of disappearing wins
Mon, 20 Apr 2026 00:00:00 GMT
https://jacksunwei.me/digest/ai-research/clean-in-the-lab-brittle-in-production/
Today's AI research keeps producing elegant sandbox results that flatten, leak, or get gamed the moment they meet a production pipeline.