Wei (Jack) Sun

Anthropic Institute opens, ElementsClaw verifies old hits, DXRG cuts 57% to 3%

Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.


Sources

Focus areas for The Anthropic Institute anthropic.com

Agentic Fusion of Large Atomic and Language Models to Accelerate Superconductors Discovery huggingface.co

ElementsClaw integrates Large Atomic Models and Large Language Models to autonomously orchestrate materials discovery, identifying thousands of high-confidence superconducting candidates and experimentally verifying novel superconductors.

Operating-Layer Controls for Onchain Language-Model Agents Under Real Capital huggingface.co

Autonomous language-model agents managing real cryptocurrency trades achieved high reliability through system design (prompt compilation, policy validation, and execution safeguards) rather than through base-model capability alone.

GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents huggingface.co

Z.ai’s GLM-5V-Turbo bakes multimodal perception directly into the reasoning loop of an agent foundation model, posting strong scores on multimodal coding and visual tool-use benchmarks while preserving text-only capabilities. Code is on GitHub under zai-org/GLM-V.

Unified 4D World Action Modeling from Video Priors with Asynchronous Denoising huggingface.co

X-WAM unifies real-time robotic action execution with high-fidelity 4D world synthesis in a single Diffusion Transformer, adding a depth prediction branch over RGB-D video and an asynchronous denoising schedule that decouples action and scene generation timesteps for efficiency.

Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding huggingface.co

The system applies speculative decoding inside RL post-training rollouts, pairing vLLM with Eagle3-style draft models and MTP heads while preserving the target output distribution; a performance simulator projects roughly 2.5x end-to-end training speedup at large rollout scales.
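
The distribution-preserving guarantee comes from the standard speculative-sampling accept/reject rule, which the paper builds on. A toy sketch of that rule only (vocabulary and distributions are illustrative; none of the vLLM/Eagle3 system integration that is the paper's actual contribution is modeled here):

```python
# Toy sketch of speculative decoding's accept/reject rule.
import random

random.seed(0)

def sample(dist):
    # sample a token index from a probability list
    r, acc = random.random(), 0.0
    for tok, p in enumerate(dist):
        acc += p
        if r < acc:
            return tok
    return len(dist) - 1

def speculative_step(draft_dists, target_dists):
    """Accept each draft token with prob min(1, p_target/p_draft); on the
    first rejection, resample from the residual max(0, p_t - p_d), which
    makes the overall output distribution match the target model exactly."""
    out = []
    for q, p in zip(draft_dists, target_dists):
        tok = sample(q)
        if random.random() < min(1.0, p[tok] / max(q[tok], 1e-12)):
            out.append(tok)            # accepted: keep the draft token
        else:
            resid = [max(0.0, pt - qt) for pt, qt in zip(p, q)]
            z = sum(resid) or 1.0
            out.append(sample([r / z for r in resid]))
            break                      # stop at the first rejection
    return out
```

When draft and target agree, every token is accepted, which is why a well-matched draft model recovers large rollout speedups.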

Turning the TIDE: Cross-Architecture Distillation for Diffusion Large Language Models huggingface.co

TIDE distills knowledge from autoregressive teachers into diffusion LLMs across architectures and tokenizers, introducing TIDAL for noise-dependent reliability weighting, CompDemo for complementary mask splitting, and a Reverse CALM objective that performs chunk-level likelihood matching across vocabularies.

Large Language Models Explore by Latent Distilling huggingface.co

Exploratory Sampling trains a lightweight distiller to predict an LLM’s deep-layer hidden representations from shallow ones, then uses the prediction gap as a novelty signal to bias decoding toward semantically diverse continuations without retraining the base model.
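
The novelty signal reduces to a prediction gap plus a logit bias. A minimal sketch under strong simplifications: the real distiller is trained on LLM hidden states, while here the features are plain vectors and the distiller is an identity stand-in.

```python
# Minimal sketch of the prediction-gap novelty signal described above.
import math

def distiller(shallow):
    # stand-in for the trained lightweight predictor (identity map here)
    return list(shallow)

def novelty(shallow, deep):
    """L2 gap between the distiller's prediction and the true deep features:
    a large gap means the deep layers computed something the shallow layers
    do not predict, which the method treats as semantic novelty."""
    pred = distiller(shallow)
    return math.sqrt(sum((p - d) ** 2 for p, d in zip(pred, deep)))

def biased_logits(logits, novelties, alpha=0.5):
    # bias decoding toward candidates whose continuations look novel;
    # the base model itself is never retrained
    return [l + alpha * n for l, n in zip(logits, novelties)]
```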

Probing Visual Planning in Image Editing Models huggingface.co

The AMAZE dataset reframes visual planning as single-step image-to-image transformation over abstract maze puzzles, finding that both autoregressive and diffusion editors struggle zero-shot and remain well below human efficiency even after fine-tuning.

RADIO-ViPE: Online Tightly Coupled Multi-Modal Fusion for Open-Vocabulary Semantic SLAM in Dynamic Environments huggingface.co

RADIO-ViPE performs open-vocabulary semantic SLAM from raw monocular RGB video alone, skipping camera intrinsics and depth sensors by fusing agglomerative foundation-model embeddings into the factor graph with adaptive robust kernels to handle dynamic scenes.

ClawGym: A Scalable Framework for Building Effective Claw Agents huggingface.co

ClawGym presents a scalable framework for developing Claw-style personal agents with synthetic training data, verified workspaces, and benchmark evaluation.

FAMA: Failure-Aware Meta-Agentic Framework for Open-Source LLMs in Interactive Tool Use Environments huggingface.co

The Failure-Aware Meta-Agentic framework (FAMA) improves open-source LLM performance in conversational tool-use scenarios by identifying common errors and deploying specialized agents to correct them.

Diffusion Templates: A Unified Plugin Framework for Controllable Diffusion huggingface.co

Diffusion Templates presents a unified framework that decouples base-model inference from controllable capabilities, enabling modular and composable control methods across various diffusion model applications.

Praxy Voice: Voice-Prompt Recovery + BUPS for Commercial-Class Indic TTS from a Frozen Non-Indic Base at Zero Commercial-Training-Data Cost huggingface.co

Researchers enhance a non-Indic-native text-to-speech system by implementing a Brahmic Unified Phoneme Space, LoRA adaptation, and voice-prompt recovery techniques to achieve commercial-quality output for Indic languages without requiring new acoustic decoders or commercial training data.

PSP: An Interpretable Per-Dimension Accent Benchmark for Indic Text-to-Speech huggingface.co

A new benchmark called PSP measures accent in Indic languages through six phonological dimensions, revealing inconsistencies between standard evaluation metrics and actual accent fidelity.

A Survey on LLM-based Conversational User Simulation huggingface.co

Large language models have substantially advanced conversational user simulation by enabling high-fidelity synthetic conversation generation; the survey systematically categorizes this work under a taxonomy of user granularity and simulation objectives.

Sample Selection Using Multi-Task Autoencoders in Federated Learning with Non-IID Data huggingface.co

Sample selection in federated learning using multi-task autoencoders, outlier detection, and deep support vector data description improves model accuracy under non-IID and noisy conditions.

Enhanced Privacy and Communication Efficiency in Non-IID Federated Learning with Adaptive Quantization and Differential Privacy huggingface.co

Adaptive quantization combined with differential privacy reduces communication overhead in federated learning while maintaining model accuracy and privacy guarantees.
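
The clip-noise-quantize pipeline this recipe implies can be sketched per client update; the clip norm, noise scale, and bit-width below are illustrative choices, not the paper's parameters.

```python
# Hedged sketch: Gaussian-DP noising followed by uniform quantization of a
# federated client update. All constants here are illustrative assumptions.
import math
import random

random.seed(2)

def dp_quantize(update, clip=1.0, sigma=0.5, bits=4):
    # 1. clip the client update to bound per-client sensitivity
    norm = math.sqrt(sum(u * u for u in update)) or 1.0
    clipped = [u * min(1.0, clip / norm) for u in update]
    # 2. add Gaussian noise calibrated to the clip norm (the DP step)
    noised = [u + random.gauss(0.0, sigma * clip) for u in clipped]
    # 3. uniform quantization to `bits` bits, cutting upload bandwidth;
    #    the range covers the clip norm plus ~3 noise std-devs
    lo, hi = -(clip + 3 * sigma), clip + 3 * sigma
    levels = (1 << bits) - 1
    step = (hi - lo) / levels
    return [lo + round((min(max(u, lo), hi) - lo) / step) * step
            for u in noised]
```

Making `bits` adapt to round number or gradient variance is the "adaptive" part the title refers to; a fixed width is used here for brevity.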

FASH-iCNN: Making Editorial Fashion Identity Inspectable Through Multimodal CNN Probing huggingface.co

FASH-iCNN is a multimodal system that identifies fashion house, era, and color tradition from garment photographs with high accuracy, revealing that texture and luminance are primary carriers of editorial identity.

References

KESQ/CNN Business kesq.com

Anthropic ditches its core safety promise in the middle of an AI red-line fight with the Pentagon… reclassifying the pause commitment as a ‘flexible goal’ rather than a hard commitment.

PureAI pureai.com

Co-founder Jack Clark transitioned from Head of Policy to the newly created role of Head of Public Benefit, while Sarah Heck took over Public Policy; the Institute centralizes the Frontier Red Team, Societal Impacts, and Economic Research teams under one roof.

Forbes (Hamilton Mann) forbes.com

Anthropic’s study does not measure AI’s labor market impacts… using platform traces from a single provider reflects Anthropic’s user demographics rather than the broader workforce, and treats jobs as a sum of automate-able tasks rather than outcomes and judgment.

CyberScoop cyberscoop.com

Claude Mythos identified thousands of zero-day vulnerabilities including a 27-year-old remote crash bug in OpenBSD; Anthropic committed $100M in usage credits and $4M in donations to open-source security orgs to facilitate patching.

Platformer platformer.news

Once comparable capabilities reach open-source or ransomware actors, they will be able to weaponize bugs at machine speed… AI-driven discovery creates an exponential remediation problem the human patch pipeline cannot absorb.

EA Forum — ‘I read every major AI lab’s safety plan’ forum-bots.effectivealtruism.org

Anthropic emphasizes internal behavioral constraints and RSPs while OpenAI follows an iterative ‘ship and govern’ model and DeepMind’s Frontier Safety Framework prioritizes the science of safety; GovAI alumni populate the policy heads of all three.

Chemistry World — Palgrave & Schoop reanalysis of Berkeley A-Lab chemistryworld.com

the A-Lab had not synthesized any truly novel materials… systematic errors in the AI’s interpretation of XRD data… failed to account for compositional disorder and misidentified mixtures of known compounds as new materials

PMC / Cheetham & Seshadri critique of GNoME pmc.ncbi.nlm.nih.gov

many are mere ‘hallucinations’ of the algorithm… a large fraction of these candidates are simple derivatives of known crystals or lack the novelty, credibility, and utility required to be termed ‘materials’

Google Research blog — Expert Evaluation of LLM World Models (High-Tc case study) research.google

all models exhibited severe limitations… a primary failure was the models’ total incapacity to engage with data visualization… LLMs frequently conflate speculative claims with scientific consensus

ResearchGate — High-Entropy Alloy Superconductors on an α-Mn lattice researchgate.net

Re6Zr and Re6Hf were identified as noncentrosymmetric superconductors with Tc values typically below 7 K… muon-spin rotation experiments have revealed spontaneous magnetic fields below Tc in Re-based alloys like Re6(Zr,Hf)

AIExpert News — independent coverage of ElementsClaw aiexpert.news

screened approximately 2.4 million stable crystals in just 28 GPU hours… all four new superconductors exhibit transition temperatures below 7 K, far from the room-temperature grail

alphaXiv overview of 2604.23758 alphaxiv.org

while the system plans and executes stages autonomously, it still relies on human oversight and prompting to harmonize the process and ensure physical fidelity, suggesting it is an assistant rather than a fully independent scientist

Agent Harness for LLM Agents: A Survey (ResearchGate, Apr 2026) researchgate.net

A significant portion of reported agent failures are actually ‘harness failures’ caused by poorly specified environments rather than model limitations… the bottleneck for production-grade agents is no longer raw intelligence but the infrastructure that governs its execution.

MEMEbench (terminal.markets) memebench.terminal.markets

$ANT saw an 84% higher selection rate than $MOON despite identical fundamentals… In 98% of 18,560 inference calls the LLMs justified purchases by citing technical indicators, even though the underlying data was identical to less-preferred tickers.

KuCoin: AI Trading Agent Vulnerability 2026 kucoin.com

Attackers targeted the long-term memory of agents, poisoning decision-making data that spread downstream to connected agents within hours… Step Finance AI agents with excessive permissions amplified a treasury drain to $40 million.

The Register on CMU/Salesforce study theregister.com

Agents using natural language for multi-step tasks achieved a success rate of only 30-35%, [while] those operating within structured ‘Workflow Execution’ frameworks reached success rates higher than 83%.

The Block on Freysa exploit theblock.co

A human player (‘p0pular.eth’) bypassed its logic by tricking the agent into misinterpreting its own approveTransfer function as a method for receiving funds rather than sending them.

AIFinHub: 5 Failure Modes of LLM Trading Agents aifinhub.io

Five recurring failure modes in 2026: prompt drift (where model updates degrade strategy), silent numeric fabrication, price-blind contamination, token-cost runaway, and ‘audit amnesia’ — the inability of an agent to explain its own trade history.

Jack Sun, writing.

Engineer · Bay Area

Hands-on with agentic AI all day — building frameworks, reading what industry ships, occasionally writing them down.


© 2026 Wei (Jack) Sun · jacksunwei.me · Built on Astro · hosted on Cloudflare