Wei (Jack) Sun

The work is migrating outward from the weights

Every URL the pipeline pulled into ranking for this issue: primary sources, plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.


Sources

Decoupled DiLoCo: A new frontier for resilient, distributed AI training deepmind.google

What 81,000 people told us about the economics of AI anthropic.com

Announcing the Anthropic Economic Index Survey anthropic.com

Sema Code: Decoupling AI Coding Agents into Programmable, Embeddable Infrastructure huggingface.co

Sema Code presents an open AI coding framework that decouples the core agent engine from client interfaces, enabling shared reasoning capabilities across diverse development environments through a standalone npm library and modular architecture.

SemaClaw: A Step Towards General-Purpose Personal AI Agents through Harness Engineering huggingface.co

OpenClaw’s emergence in 2026 signifies a shift toward scalable personal AI agents requiring robust infrastructure for control and trustworthiness, addressed by SemaClaw’s multi-agent framework with novel orchestration, safety, and context management components.

From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space huggingface.co

PreRL shifts reinforcement learning from the conditional P(y|x) to the marginal P(y) in pre-train space, applying reward-driven online updates to expand reasoning horizons before standard RL fine-tuning. A companion DSRL variant uses negative sample reinforcement to seed policy reincarnation.

Target Policy Optimization huggingface.co

Target Policy Optimization decouples which actions to reinforce from how probability mass gets assigned, replacing the policy-gradient objective with cross-entropy matching to a target distribution. The authors report gains over standard policy gradients on tabular bandits, transformer sequence tasks, and LLM RLVR with sparse rewards.
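
The decoupling described above can be sketched on a toy tabular bandit. Everything below is an illustrative assumption rather than the paper's recipe: the target construction (mixing a small fraction of probability mass toward the sampled action in proportion to its clipped reward), the mixing weight, the learning rate, and the reward means. The point is only that the update minimizes cross-entropy to a target distribution instead of taking a policy-gradient step.

```python
import math
import random

random.seed(0)

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

# Toy 3-armed bandit; arm 2 has the highest mean reward (made-up numbers).
TRUE_MEANS = [0.1, 0.5, 0.9]
logits = [0.0, 0.0, 0.0]
LR, MIX = 0.5, 0.1

for _ in range(500):
    probs = softmax(logits)
    a = random.choices(range(3), weights=probs)[0]   # sample an action
    r = TRUE_MEANS[a] + random.gauss(0.0, 0.1)       # observe a noisy reward

    # "Which actions to reinforce": build a target distribution that shifts
    # mass toward the sampled action in proportion to its clipped reward.
    target = [(1 - MIX) * p for p in probs]
    target[a] += MIX * max(r, 0.0)
    z = sum(target)
    target = [t / z for t in target]

    # "How probability mass gets assigned": cross-entropy matching to the
    # target; for a softmax policy the logit gradient is probs - target.
    for i in range(3):
        logits[i] -= LR * (probs[i] - target[i])

final = softmax(logits)
print([round(p, 2) for p in final])  # mass concentrates on the high-reward arm
```

Because the target is built from sampled rewards, higher-reward arms receive larger boosts whenever sampled, so the cross-entropy updates drive probability toward the best arm without ever computing a policy-gradient term.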

LangFlow: Continuous Diffusion Rivals Discrete in Language Modeling huggingface.co

LangFlow shows continuous diffusion can match discrete diffusion language models by running Flow Matching in embedding space, paired with Bregman-divergence training, an ODE-based NLL bound, Gumbel-noise scheduling and self-conditioning. Reported perplexity rivals autoregressive baselines, narrowing the long-standing continuous-vs-discrete gap.

What do Language Models Learn and When? The Implicit Curriculum Hypothesis huggingface.co

The Implicit Curriculum Hypothesis argues pretraining capabilities emerge in a consistent compositional order across architectures, with emergence points predictable from internal function-vector representations. The authors track training trajectories on elemental tasks to show when specific skills appear rather than just how scaling improves loss.

Memory Transfer Learning: How Memories are Transferred Across Domains in Coding Agents huggingface.co

Memory Transfer Learning lets coding agents share a unified memory pool across domains, finding that transferring high-level meta-knowledge and validation routines outperforms reusing low-level code traces, which tend to cause negative transfer between unrelated codebases.

Self-Distillation Zero: Self-Revision Turns Binary Rewards into Dense Supervision huggingface.co

Self-Distillation Zero converts sparse binary RL rewards into dense token-level supervision by training a model in dual teacher-student roles, using on-policy self-distillation and token-level self-localization. The authors report stronger reasoning performance with fewer rollouts than standard RLVR.

Seedance 2.0: Advancing Video Generation for World Complexity huggingface.co

ByteDance’s Seedance 2.0 is a unified audio-video generation model accepting text, image, audio and video inputs for joint generation, content reference and editing within a single architecture. The release emphasizes higher quality and lower-latency inference for real-time scenarios, drawing 153 upvotes on Hugging Face.

OccuBench: Evaluating AI Agents on Real-World Professional Tasks via Language World Models huggingface.co

OccuBench presents a comprehensive benchmark for evaluating AI agents across 100 professional domains using Language World Models to simulate real-world environments with controlled fault injection.

TIP: Token Importance in On-Policy Distillation huggingface.co

TIP improves token selection for on-policy knowledge distillation by identifying informative tokens through student entropy and teacher-student divergence, enabling efficient training with reduced memory usage.
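
The selection criterion can be sketched in a few lines. The combination below (student entropy plus KL divergence from teacher to student) and the toy distributions are assumptions for illustration; TIP's exact scoring may differ.

```python
import math

def entropy(p):
    return -sum(x * math.log(x) for x in p if x > 0)

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Per-token next-token distributions over a toy 3-word vocabulary
# (hypothetical numbers; real use would take model logits).
student = [[0.98, 0.01, 0.01], [0.40, 0.35, 0.25], [0.34, 0.33, 0.33]]
teacher = [[0.97, 0.02, 0.01], [0.10, 0.80, 0.10], [0.30, 0.40, 0.30]]

# Score each token by student uncertainty plus teacher-student disagreement
# (one plausible combination of the two signals named in the summary).
scores = [entropy(s) + kl(t, s) for s, t in zip(student, teacher)]

# Keep only the top-k most informative tokens for the distillation loss;
# the rest are dropped, which is where the memory savings come from.
k = 2
keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
print(sorted(keep))  # → [1, 2]
```

Token 0 is one the student already predicts confidently and the teacher agrees on, so it scores low and is excluded; tokens where the student is uncertain or the teacher disagrees carry the training signal.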

Do AI Coding Agents Log Like Humans? An Empirical Study huggingface.co

Software logging is essential for maintaining and debugging complex systems, yet it remains unclear how AI coding agents handle this non-functional requirement. While prior work characterizes human logging practices, the behaviors of AI coding agents and the efficacy of natural language instructions in governing them are unexplored. To address this gap, the authors conduct an empirical study of 4,550 agentic pull requests across 81 open-source repositories, comparing agent logging patterns against human practices.

GameWorld: Towards Standardized and Verifiable Evaluation of Multimodal Game Agents huggingface.co

GameWorld presents a standardized benchmark for evaluating multimodal large language model agents in video games, featuring diverse games and verified metrics for comprehensive assessment.

RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time huggingface.co

Training reward models to generate multi-dimensional critiques improves visual generation through both enhanced reinforcement learning rewards and test-time refinement loops, achieving state-of-the-art performance with reduced training data requirements.

Exploration and Exploitation Errors Are Measurable for Language Model Agents huggingface.co

The authors design controllable environments with a programmable exploration-exploitation balance to evaluate language model agents on embodied AI tasks, revealing distinct failure modes and demonstrating that reasoning models outperform other approaches.

Geometric Context Transformer for Streaming 3D Reconstruction huggingface.co

LingBot-Map is a feed-forward 3D foundation model that reconstructs scenes from video streams using a geometric context transformer architecture with specialized attention mechanisms for coordinate grounding, dense geometric cues, and long-range drift correction, achieving stable real-time performance at 20 FPS.

MERRIN: A Benchmark for Multimodal Evidence Retrieval and Reasoning in Noisy Web Environments huggingface.co

MERRIN is a human-annotated benchmark for evaluating search-augmented agents in multimodal, noisy web environments, demonstrating significant challenges in retrieving and reasoning over diverse evidence types.

UI-Copilot: Advancing Long-Horizon GUI Automation via Tool-Integrated Policy Optimization huggingface.co

UI-Copilot is a collaborative framework that enhances GUI agents by decoupling memory management and integrating on-demand tool assistance for improved performance in complex user interface tasks.

Mobile GUI Agents under Real-world Threats: Are We There Yet? huggingface.co

Mobile GUI agents powered by large language models show significant performance degradation when exposed to real-world third-party content in commercial applications.

InfiniteScienceGym: An Unbounded, Procedurally-Generated Benchmark for Scientific Analysis huggingface.co

InfiniteScienceGym presents a procedurally generated benchmark for evaluating scientific reasoning in language models, addressing limitations of traditional benchmarks through deterministic repository simulation and verifiable question-answering tasks.

SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments huggingface.co

SpatialEvo is a self-evolving framework for 3D spatial reasoning that uses deterministic geometric environments to provide objective feedback, enabling efficient training without relying on model consensus.

Free Geometry: Refining 3D Reconstruction from Longer Versions of Itself huggingface.co

Free Geometry enables feed-forward 3D reconstruction models to self-evolve at test time through self-supervised cross-view feature consistency, improving reconstruction accuracy with lightweight LoRA updates.

TREX: Automating LLM Fine-tuning via Agent-Driven Tree-based Exploration huggingface.co

TREX automates the complete lifecycle of large language model fine-tuning with a multi-agent system that coordinates research and execution modules through iterative tree-based planning and experimentation.

ROSE: An Intent-Centered Evaluation Metric for NL2SQL huggingface.co

ROSE is an intent-centered metric for NL2SQL evaluation that uses a Prover-Refuter cascade to assess semantic correctness independently of ground-truth SQL, showing superior agreement with human experts.

UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding huggingface.co

UI-Zoomer is a training-free adaptive zoom-in framework for GUI grounding that improves localization accuracy by selectively triggering zoom-in based on prediction uncertainty quantification.
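
The trigger logic is simple enough to sketch. The function names, the normalized-entropy measure, and the 0.8 threshold below are illustrative assumptions, not UI-Zoomer's published quantification.

```python
import math

def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)

def ground(region_probs, threshold=0.8):
    """Return ('zoom', best_region) when the model is unsure which region
    holds the target, else ('click', best_region). Names and the entropy
    threshold are illustrative, not from the paper."""
    best = max(region_probs, key=region_probs.get)
    # Normalized entropy in [0, 1]: 1 = maximally uncertain.
    h = entropy(list(region_probs.values())) / math.log(len(region_probs))
    return ("zoom" if h > threshold else "click"), best

# Confident prediction: act directly on the full screenshot.
print(ground({"button_a": 0.90, "button_b": 0.05, "sidebar": 0.05}))  # click
# Uncertain prediction: crop to the best region and re-ground at higher
# resolution before committing to a click.
print(ground({"button_a": 0.40, "button_b": 0.35, "sidebar": 0.25}))  # zoom
```

Gating the zoom on uncertainty is what makes the approach training-free: the extra inference cost is paid only on the predictions that need it.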

SkVM: Compiling Skills for Efficient Execution Everywhere huggingface.co

SkVM is a compilation and runtime system that enables portable and efficient execution of LLM skills across different models and platforms by treating skills as code and analyzing capability requirements.

ReconPhys: Reconstruct Appearance and Physical Attributes from Single Video huggingface.co

ReconPhys presents a feedforward framework that jointly learns physical attribute estimation and 3D Gaussian Splatting reconstruction from monocular video, achieving faster inference and better reconstruction quality than existing optimization-based methods.

ROSE: Retrieval-Oriented Segmentation Enhancement huggingface.co

ROSE introduces a segmentation task focused on novel and emerging entities, along with a retrieval-augmented framework that enhances multimodal language models with real-time information and visual prompts.

Anthropogenic Regional Adaptation in Multimodal Vision-Language Model huggingface.co

Vision-language models can be adapted to regional contexts through Anthropogenic Regional Adaptation and the GG-EZ method while maintaining global performance and improving cultural relevance.

HDR Video Generation via Latent Alignment with Logarithmic Encoding huggingface.co

High dynamic range video generation is achieved by leveraging pretrained models through logarithmic encoding alignment and camera-mimicking degradation training, enabling effective HDR synthesis without architectural redesign.

Narrative-Driven Paper-to-Slide Generation via ArcDeck huggingface.co

ArcDeck is a multi-agent framework that enhances paper-to-slide generation by modeling logical flow through discourse trees and iterative agent refinement, outperforming direct summarization methods.

Self-Sovereign Agent huggingface.co

Self-sovereign agents represent AI systems capable of autonomously sustaining their operations, requiring further technical development and facing significant security and governance considerations.

A Temporally Augmented Graph Attention Network for Affordance Classification huggingface.co

EEG-tGAT enhances Graph Attention Networks by incorporating temporal attention and dropout mechanisms to improve affordance classification from interaction sequences.

References

ArxivIQ Substack analysis of Decoupled DiLoCo paper arxiviq.substack.com

In high-failure environments, Decoupled DiLoCo maintained 86-88% goodput, whereas traditional elastic data-parallel methods saw performance plummet to 40%… reducing inter-datacenter bandwidth requirements from a theoretical 198 Gbps to just 0.84 Gbps.
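
The quoted bandwidth figures imply a reduction factor worth making explicit. Under the simple model that required bandwidth scales as payload size divided by synchronization interval (an assumption for the arithmetic, not the article's derivation), the drop corresponds to synchronizing far less often at equal payload.

```python
# Sanity check on the quoted bandwidth figures.
dense_gbps = 198.0   # quoted theoretical requirement for per-step sync
diloco_gbps = 0.84   # quoted requirement for Decoupled DiLoCo
reduction = dense_gbps / diloco_gbps
print(round(reduction))  # roughly 236x fewer bits over the inter-DC links

# At equal per-sync payload, a ~236x bandwidth reduction matches exchanging
# pseudo-gradients about once every 236 steps instead of every step.
```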

Galaxy.com decentralized training research galaxy.com

Prime Intellect’s INTELLECT-1 trained a 10B parameter model on 1 trillion tokens across up to 112 H100 GPUs distributed across three continents, maintaining 83-96% compute utilization… Nous Research’s DisTrO claims a 1,000x to 10,000x reduction in communication.

MarkTechPost coverage citing Jeff Dean marktechpost.com

Chief Scientist Jeff Dean noted that Decoupled DiLoCo finally realizes a vision for fault-tolerant, large-scale deep networks first proposed in his own research 14 years ago… lead author Arthur Douillard described the release as the ‘next frontier for resilient AI pre-training’.

EPFL SACS lab technical review epfl.ch

A primary limitation is the memory bottleneck: each participating node must store a full replica of the model, which restricts participation to devices with high VRAM… existing convergence proofs assume simpler optimizers like SGD or rely on convex assumptions.

Reddit r/machinelearningnews discussion reddit.com

Some dissenters questioned the feasibility for ‘indie’ labs, pointing out that while the bandwidth requirements are lower, the periodic transfer of massive weights (e.g., 50GB+) still presents a significant cost and engineering challenge for non-enterprise users.

Data Center Knowledge — power as AI’s defining limit datacenterknowledge.com

Approximately 40% of AI data centers are currently restricted by power shortages… model training now exceeds the power and space of any single site, forcing a shift from centralized megaclusters to a distributed model where training is spread across multiple geographic regions to tap into disparate energy pools.

Financial Express financialexpress.com

Computer and mathematical tasks accounted for 37% of the recorded conversations, despite these roles representing only 3.4% of the actual U.S. workforce.
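
The two quoted percentages combine into a single overrepresentation factor:

```python
# Overrepresentation implied by the quoted figures.
share_of_conversations = 0.37   # quoted share of recorded conversations
share_of_workforce = 0.034      # quoted share of the U.S. workforce
factor = share_of_conversations / share_of_workforce
print(round(factor, 1))  # computer/math roles are ~10.9x overrepresented
```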

Cascade Insights (market researcher review of Anthropic Interviewer) cascadeinsights.com

The chat-only format lacks audio-visual cues like tone and facial expressions, which are vital for deep qualitative probing… effective for ‘mid-depth’ insights at scale, it may not yet replace the 60-minute in-depth human interview.

YouTube commentary citing Daron Acemoglu youtube.com

Acemoglu characterizes Amodei’s predictions—that AI could eliminate 50% of entry-level office roles—as ‘motivated reasoning’… his own research estimates AI will automate approximately 5% of all tasks over the next decade.

SoftwareSeni analysis citing Stanford/ADP softwareseni.com

Stanford/ADP study showing a 16% relative employment decline for entry-level workers in highly exposed occupations, even as senior roles (ages 35–49) saw growth of 6–9%.

Empiricrafting Substack empiricrafting.substack.com

Using Claude to classify the sentiment of Claude users creates a ‘recursive bias’… ‘fact-check tax’—the time required to verify AI-generated output—is often ignored in these self-reported speedup metrics.

Substack analysis of HuggingFace release substack.com

Released via the Anthropic/EconomicIndex repository on HuggingFace under a CC-BY license… raw transcripts of the 81,000 interviews remain restricted to preserve privacy, with only de-identified responses from opt-in users included in the public corpus.

teamday.ai on Mitchell Hashimoto teamday.ai

Anytime you find an agent makes a mistake, you take the time to engineer a solution such that the agent never makes that mistake again

frontman.sh – Best Open Source AI Coding Tools 2026 frontman.sh

Cline introduced Cline Core, a standalone gRPC-based engine that allows the agent’s logic to run independently of VS Code… enabling multiple frontends — terminal CLI, web interface, or JetBrains plugin — to connect to a single persistent task session

superagent.sh – Cline incident: broken security model superagent.sh

A compromised npm publish token allowed an attacker to insert a malicious postinstall script, highlighting a broken security model where agents might autonomously run commands or install dependencies without skeptical oversight

Hugging Face papers (SemaClaw eval summary) huggingface.co

On the GAIA benchmark, the framework improved accuracy from a baseline of 52.3% to 66.0%… an overall task success rate increase from 50% to 80%, reaching parity with sophisticated closed-source systems like Marble

nxcode.io – What is Harness Engineering (2026 guide) nxcode.io

The term harness engineering was formally introduced in February 2026 by Mitchell Hashimoto and further popularized by OpenAI’s Ryan Lopopolo… Agent = Model + Harness

Midea AI GitHub org / Midea industrial-AI strategy coverage github.com

Midea’s software strategy is a subset of its broader China enterprise strategy, which involves a 60-billion-yuan ($8.7 billion) investment in AI and robotics through 2029… the world’s first certified AI Agent Factory in Jingzhou coordinates 14 specialized agents across 38 business scenarios


Jack Sun, writing.

Engineer · Bay Area

Hands-on with agentic AI all day — building frameworks, reading what industry ships, occasionally writing them down.


© 2026 Wei (Jack) Sun · jacksunwei.me · Built on Astro · hosted on Cloudflare