Wei (Jack) Sun

AIRA short of Hymba, UI traces ID 14 agents, MANSU holds unlearning at 4-bit

Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.

Sources

Agentic Discovery of Neural Architectures: AIRA-Compose and AIRA-Design huggingface.co

AI agents autonomously design foundation models exceeding standard Transformers through dual frameworks that optimize both architectural search and mechanistic implementation, achieving superior performance and efficiency.

Known By Their Actions: Fingerprinting LLM Browser Agents via UI Traces huggingface.co

Website tracking systems can identify the underlying large language model powering web browsing agents with high accuracy through behavioral patterns and timing data.

Forgetting That Sticks: Quantization-Permanent Unlearning via Circuit Attribution huggingface.co

Quantization reverses machine unlearning: when parameter updates are smaller than the quantization bin width, rounding erases them. The paper frames this as a fundamental sparsity-permanence tradeoff and develops MANSU to preserve both forgetting and retention under compression.
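A toy sketch of why an update smaller than a bin width can vanish under round-to-nearest quantization. The bin width, the weight value, and the update sizes are illustrative assumptions, not numbers from the paper, and this is not MANSU itself.

```python
def bin_index(w, bin_width):
    """Round-to-nearest uniform quantization: return the integer bin a weight lands in."""
    return round(w / bin_width)

bin_width = 0.1        # hypothetical grid spacing for a low-bit format
w_original = 0.52      # hypothetical weight before unlearning
small_update = -0.01   # low-magnitude unlearning update, well below the bin width
large_update = -0.08   # update big enough to cross a bin boundary from this weight

# The small update leaves the weight in the same bin, so quantization undoes it.
erased = bin_index(w_original + small_update, bin_width) == bin_index(w_original, bin_width)

# The larger update moves the weight into a different bin, so it survives.
survives = bin_index(w_original + large_update, bin_width) != bin_index(w_original, bin_width)

print(erased, survives)
```

Whether a given update survives depends on where the weight sits inside its bin, which is why the sketch compares bin indices and not raw magnitudes.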

Steered LLM Activations are Non-Surjective huggingface.co

White-box activation steering produces residual-stream states that have no textual preimage, meaning prompt engineering cannot replicate them. The finding draws a hard line between black-box and white-box control, with implications for interpretability and safety research that assumes prompts span the reachable behavior space.

No One Knows the State of the Art in Geospatial Foundation Models huggingface.co

Inconsistent evaluation protocols and unreported pretraining controls make geospatial foundation model comparisons unreliable across disaster response, land-cover mapping, and food-security tasks. The authors release a GFM leaderboard on GitHub to standardize reporting and restore reproducibility in the field.

Physics-R1: An Audited Olympiad Corpus and Recipe for Visual Physics Reasoning huggingface.co

An audit of multimodal physics evaluations using 5-gram Jaccard checks and a Haiku-4.5 judge surfaces train-eval contamination, translation drift, and MCQ saturation across PhysReason, OlympiadBench-Physics, and PhyX. The team releases cleaned corpora and a Qwen3-VL-8B-Thinking model trained with GSPO and DAPO.
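A minimal sketch of a 5-gram Jaccard overlap check of the kind the summary mentions. Whitespace tokenization, lowercasing, and the example strings are assumptions made for illustration; the paper's exact tokenizer and flagging threshold are not given here.

```python
def ngrams(text, n=5):
    """Set of word n-grams from lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def jaccard(a, b, n=5):
    """Jaccard similarity between the n-gram sets of two texts."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    union = ga | gb
    if not union:
        return 0.0  # both texts are shorter than n tokens
    return len(ga & gb) / len(union)

# Hypothetical train and eval items.
train_item = "A block of mass m slides down a frictionless incline"
eval_copy = "a block of mass m slides down a frictionless incline"
eval_other = "Two point charges are held a fixed distance apart in vacuum"

same = jaccard(train_item, eval_copy)
different = jaccard(train_item, eval_other)
print(same, different)
```

A high score between a training item and an evaluation item would flag the pair for review as possible contamination.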

Solvita: Enhancing Large Language Models for Competitive Programming via Agentic Evolution huggingface.co

A four-agent framework (Planner, Solver, Oracle, Hacker) updates a graph-structured knowledge network with reinforcement learning, letting code models keep learning between problems. Solvita reports state-of-the-art results on CodeContests, APPS, AetherCode, and Codeforces through certified supervision and targeted hacking.

GQLA: Group-Query Latent Attention for Hardware-Adaptive Large Language Model Decoding huggingface.co

Group-Query Latent Attention exposes multiple decoding paths from a single trained weight set, generalizing MQA, GQA, and MLA. Conversion tools TransMLA and TransGQLA adapt existing checkpoints so the same model runs efficiently on high-end and commodity GPUs without retraining.

Stress-Testing the Reasoning Competence of LLMs With Proofs Under Minimal Formalism huggingface.co

The benchmark uses a minimal natural-deduction language for proof writing, checking, masking, and gap-filling, then scores models with an Epistemic Stability Index and 2PL IRT analyses. The setup measures reasoning depth and stability instead of single-shot accuracy on informal math.
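For readers unfamiliar with 2PL IRT, here is a small sketch of the standard two-parameter logistic item response function. The parameter values are made up for illustration, and the Epistemic Stability Index is not modeled because the summary does not define it.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function.

    theta: ability of the model being evaluated
    a:     item discrimination (how sharply the item separates abilities)
    b:     item difficulty (ability at which success probability is 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# When ability equals difficulty, the success probability is exactly one half.
at_difficulty = p_correct(theta=0.0, a=1.0, b=0.0)

# A more discriminating item rewards the same ability margin more strongly.
sharp = p_correct(theta=1.0, a=2.0, b=0.0)
flat = p_correct(theta=1.0, a=0.5, b=0.0)
print(at_difficulty, sharp > flat)
```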

HodgeCover: Higher-Order Topological Coverage Drives Compression of Sparse Mixture-of-Experts huggingface.co

A learning-free method applies Hodge decomposition and harmonic kernels over a simplicial Laplacian to identify redundant experts and merge them without retraining. Triplet barriers bound KL divergence from the original router, enabling aggressive expert reduction in sparse mixture-of-experts models.

Learning POMDP World Models from Observations with Language-Model Priors huggingface.co

Pinductor uses language model priors to efficiently learn POMDP models from limited observation-action data, matching performance of methods with privileged hidden state access while outperforming traditional tabular approaches.

Auditing Agent Harness Safety huggingface.co

LLM agents running inside execution harnesses can produce correct outputs while violating safety constraints along the way, so trajectory-level auditing is needed to verify resource access and information flow across multi-agent systems.

DiagnosticIQ: A Benchmark for LLM-Based Industrial Maintenance Action Recommendation from Symbolic Rules huggingface.co

Large language models struggle to translate industrial monitoring rules into maintenance actions due to brittleness and pattern-matching behaviors, despite achieving high performance on structured benchmarks.

PhysBrain 1.0 Technical Report huggingface.co

PhysBrain 1.0 leverages human egocentric video to generate physical commonsense supervision for vision-language-action models, achieving state-of-the-art performance in embodied control tasks through capability-preserving adaptation.

MobileEgo Anywhere: Open Infrastructure for long horizon egocentric data on commodity hardware huggingface.co

A mobile-based framework for collecting long-duration egocentric robot data using smartphone sensors, enabling large-scale training of vision-language-action models.

Unlocking Dense Metric Depth Estimation in VLMs huggingface.co

DepthVLM enhances Vision-Language Models with dense geometry prediction through a lightweight depth head and unified vision-text supervision, achieving superior 3D spatial reasoning while maintaining multimodal capabilities.

CiteVQA: Benchmarking Evidence Attribution for Trustworthy Document Intelligence huggingface.co

CiteVQA introduces a benchmark for document vision-language models that evaluates both answer accuracy and correct citation of supporting evidence, revealing significant attribution hallucinations in current models.

DexJoCo: A Benchmark and Toolkit for Task-Oriented Dexterous Manipulation on MuJoCo huggingface.co

DexJoCo presents a benchmark and toolkit for dexterous manipulation with 11 functional tasks evaluating tool-use, bimanual coordination, and long-horizon execution, along with a low-cost data collection system and comprehensive model evaluation.

Look Before You Leap: Autonomous Exploration for LLM Agents huggingface.co

Agents trained with standard reinforcement learning exhibit narrow behaviors due to premature exploitation, but systematic exploration training improves adaptability and real-world performance.

MMSkills: Towards Multimodal Skills for General Visual Agents huggingface.co

Multimodal procedural knowledge frameworks enable visual agents to leverage external reusable skills through structured representations combining text, state cards, and visual keyframes, improving decision-making in complex environments.

MLAIRE: Multilingual Language-Aware Information Retrieval Evaluation Protocol huggingface.co

Multilingual information retrieval evaluation protocol MLAIRE separates semantic retrieval accuracy from query-language preference to better assess retrieval utility across mixed-language corpora.

Sparse Autoencoders enable Robust and Interpretable Fine-tuning of CLIP models huggingface.co

SAE-FT enables robust fine-tuning of vision-language models by regularizing visual representations through sparse autoencoder constraints, maintaining performance while improving robustness against distribution shifts.

Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction huggingface.co

Raster2Seq reconstructs floorplan vector graphics from raster images using sequence-to-sequence modeling with autoregressive decoding guided by learnable anchors for spatial attention.

PAGER: Bridging the Semantic-Execution Gap in Point-Precise Geometric GUI Control huggingface.co

Advanced vision-language models for GUI agents face challenges in precision-sensitive tasks requiring point-level accuracy and geometric awareness, addressed by a topology-aware agent that improves task success through structured planning and pixel-level execution.

AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting huggingface.co

AuralSAM2 integrates audio into SAM2 through an AuralFuser module that generates sparse and dense prompts, enhancing cross-modal influence while maintaining interactive segmentation efficiency.

Follow the Mean: Reference-Guided Flow Matching huggingface.co

Flow matching enables controllable generation through example-based adaptation via conditional endpoint mean adjustment, offering training-free and parametric guidance methods for style and content control.

CM-EVS: Sparse Panoramic RGB-D-Pose Data for Complete Scene Coverage huggingface.co

Modern 3D visual learning relies on observations sampled from metric 3D assets, yet existing scans, meshes, point clouds, simulations, and reconstructions do not directly provide a sparse, comparable, and geometry-consistent panoramic training interface. Dense trajectories duplicate nearby views, source-specific rendering policies yield heterogeneous annotations, and sparse heuristics may miss important regions or introduce depth-inconsistent observations. We study how to convert 3D assets into…

InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation huggingface.co

InsightTok improves discrete visual tokenization for better text and face reconstruction through content-aware perceptual losses, enhancing autoregressive image generation quality.

Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation huggingface.co

On-policy distillation efficiency arises from early establishment of stable update trajectories, with findings leading to a plug-and-play acceleration method achieving 3x training speedup.

Learning from Failures: Correction-Oriented Policy Optimization with Verifiable Rewards huggingface.co

Correction-Oriented Policy Optimization extends reinforcement learning with verifiable rewards by converting failed trajectories into correction supervision, improving reasoning capabilities and error correction in language models.

Hölder Policy Optimisation huggingface.co

Group Relative Policy Optimisation (GRPO) enhances large language models by estimating advantages across a group of sampled trajectories. However, mapping these trajectory-level advantages to policy updates requires aggregating token-level probabilities within each sequence. Relying on a fixed aggregation mechanism for this step fundamentally limits the algorithm’s adaptability. Empirically, we observe a critical trade-off: certain fixed aggregations frequently suffer from training collapse…
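A rough sketch of the two steps the abstract describes: group-relative advantages, then a fixed aggregation of token-level log-probabilities. The population standard deviation, the `eps` term, and the two aggregators shown (sum and mean) are assumptions chosen for illustration; they are not taken from the paper, and its proposed aggregation is not reproduced here.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each trajectory's reward against its sampling group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def aggregate_sum(token_logps):
    """Fixed aggregation 1: sum of token log-probs (grows with sequence length)."""
    return sum(token_logps)

def aggregate_mean(token_logps):
    """Fixed aggregation 2: mean of token log-probs (length-normalized)."""
    return sum(token_logps) / len(token_logps)

# Hypothetical group of four sampled trajectories with binary rewards.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])

# Hypothetical token log-probs for one sequence.
logps = [-0.5, -1.0, -1.5]
print(advantages, aggregate_sum(logps), aggregate_mean(logps))
```

The two aggregators give different weight to long sequences, which is one way a fixed choice can shape training behavior.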

MetaAgent-X : Breaking the Ceiling of Automatic Multi-Agent Systems via End-to-End Reinforcement Learning huggingface.co

MetaAgent-X presents an end-to-end reinforcement learning framework that jointly optimizes automatic multi-agent system design and execution through hierarchical rollout and stagewise co-evolution techniques.

Nudging Beyond the Comfort Zone: Efficient Strategy-Guided Exploration for RLVR huggingface.co

NudgeRL framework enhances reinforcement learning with verifiable rewards through structured exploration and strategy nudging to improve reasoning capabilities in large language models.

Distilling Long-CoT Reasoning through Collaborative Step-wise Multi-Teacher Decoding huggingface.co

CoRD is a collaborative multi-teacher decoding framework that synthesizes reasoning trajectories through predictive perplexity scoring and beam search, enabling efficient distillation of large reasoning models with high-quality outputs and generalized performance.

Flash-GRPO: Efficient Alignment for Video Diffusion via One-Step Policy Optimization huggingface.co

Flash-GRPO improves training efficiency for video diffusion models by addressing temporal variance and gradient inconsistency through iso-temporal grouping and temporal gradient rectification.

ChangeFlow — Latent Rectified Flow for Change Detection in Remote Sensing huggingface.co

ChangeFlow presents a generative framework for remote sensing change detection that synthesizes change masks in latent space using rectified flow, achieving improved accuracy and robustness through sampling-based prediction ensembling.

Efficient Image Synthesis with Sphere Latent Encoder huggingface.co

A decoupled framework for few-step image generation that improves efficiency and performance by separating pixel-space operations from latent denoising training.

OmniHumanoid: Streaming Cross-Embodiment Video Generation with Paired-Free Adaptation huggingface.co

OmniHumanoid enables cross-embodiment video generation by factorizing motion transfer and embodiment-specific adaptation, allowing scalable adaptation to new humanoid embodiments using unpaired data.

FFAvatar: Few-Shot, Feed-Forward, and Generalizable Avatar Reconstruction huggingface.co

FFAvatar enables fast, high-quality 3D head avatar reconstruction from few unposed images using a feed-forward approach with multi-view fusion and end-to-end FLAME parameter prediction.

WorldAct: Activating Monolithic 3D Worlds into Interactive-Ready Object-Centric Scenes huggingface.co

WorldAct framework transforms static 3D generated environments into editable and interactive scenes through multimodal agents and geometric reconstruction techniques.

ReactiveGWM: Steering NPC in Reactive Game World Models huggingface.co

ReactiveGWM enables dynamic player-NPC interactions in game worlds by decoupling player controls from NPC behaviors through diffusion models with cross-attention modules for game-agnostic strategy transfer.

FashionChameleon: Towards Real-Time and Interactive Human-Garment Video Customization huggingface.co

FashionChameleon enables real-time interactive multi-garment video customization through teacher-student distillation and in-context learning techniques while maintaining motion coherence.

From Plans to Pixels: Learning to Plan and Orchestrate for Open-Ended Image Editing huggingface.co

An experiential framework for long-horizon image editing that couples planning with reward-driven execution to improve coherence and reliability of complex multi-step edits.

References

facebookresearch/aira-dojo (GitHub) github.com

AIRA-dojo abstracts agents into Solvers, Operators, and Search Policies, and uses Apptainer containers to provide isolated code execution environments… supporting integration with Slurm to run over 1,000 agents in parallel.

TheMoonlight independent review themoonlight.io

Agents currently excel at ‘engineering synthesis’—combining existing concepts like linear attention and SwiGLU—rather than proposing fundamentally new mathematical primitives; the proxy-target gap remains a persistent challenge and performance can be uneven across different agent seeds.

ByteIota — ‘AI Scientist v2 Passes Peer Review but 57% is False Data’ byteiota.com

An audit revealed that 57% of [Sakana’s AI Scientist v2] generated papers contained hallucinated numerical results or fabricated data… approximately 42% of automated experiments fail due to coding errors, and agents have been observed ‘reward-hacking’.

DeepLearning.AI — The Batch on benchmark contamination deeplearning.ai

Many top-performing agents on leaderboards achieve high scores by exploiting shared environments to read hidden answer keys… roughly one-third of SWE-bench issues contain solutions within their own comments.

CodeSOTA hybrid-architecture benchmarks codesota.com

Hymba-1.5B outperforms Llama 3.2 3B in average accuracy while achieving an 11.67× reduction in KV cache size and a 3.49× increase in throughput… Zebra-Llama-1B maintains higher accuracy than its Llama 3.2 counterpart while using only ~8% of the equivalent KV cache memory.

arXiv 2602.06855 — prior LLM-NAS work (EvoPrompting / GPT-NAS lineage) arxiv.org

Previous methods typically use LLMs as ‘variation operators’—performing mutation and crossover within a standard Genetic Algorithm loop… AIRA-Compose instead employs agents that formulate structural hypotheses, write implementation code, and iteratively debug their designs.

Moonlight review of AgentPrint (Exposing LLM User Privacy via Traffic Fingerprint Analysis) themoonlight.io

By analyzing packet sizes, transmission timing, and directionality—rather than content—researchers achieved a macro F1-score of 0.866 in identifying specific agents among a pool of 50 GPT-based services… adversaries can infer sensitive user attributes, such as professional roles, with up to 73.9% top-3 accuracy.

browser-use.com — ‘Speed Matters’ benchmark post browser-use.com

A test suite that took 1.9 seconds in raw Playwright required 45.8 seconds in Midscene.js due to repeated VLM inference calls.

r/LLMDevs — practitioner test of 6 models on real browser automation reddit.com

Vision-only models often struggle with progressive disclosure (elements hidden behind menus or dropdowns) because they cannot ‘see’ what isn’t rendered, whereas DOM-based agents can query the underlying structure to find hidden nodes.

CyBiasBench (ResearchGate) researchgate.net

Agents possess a persistent ‘attack-selection bias,’ favoring specific exploit families regardless of the prompt. This creates a stable behavioral signature that defenders use for incident attribution and model-specific jailbreak targeting.

Google Cloud Threat Intelligence — AI vulnerability exploitation / initial access cloud.google.com

Modern LLM reconnaissance, formalized in the MITRE ATLAS framework, involves ‘Active Scanning’ where attackers probe an agent’s service portals to map its available tools and internal prompts… allowing adversaries to understand the system persona and available API tools, effectively creating a blueprint for further manipulation.

arxivdaily independent digest of the paper arxivdaily.com

Classifiers can identify a model’s identity with up to 96% F1 accuracy using fewer than 15 observed events or by analyzing only the first 40% of a session… results indicate that such protections are not robust; classifiers retrained on delayed traces can still recover peak identification performance.

Mechanistic Interpretability as Statistical Estimation: A Variance Analysis of EAP-IG (ResearchGate) researchgate.net

the specific edges identified in a circuit can change significantly if the input data is resampled or if prompts are slightly paraphrased… a circuit discovered in one experimental run may be an artifact of the specific sample rather than a reflection of a stable, underlying model mechanism

arxiv 2410.07163 (relearning attacks on WMDP unlearning) arxiv.org

as few as 5 to 15 samples of the ‘forgotten’ data can almost completely reverse the effects of unlearning… training on entirely unrelated, benign data can lead to the recovery of the original hazardous knowledge

themoonlight.io review of SimNPO themoonlight.io

SimNPO uses length-normalized log-probabilities… more effectively allocates optimization effort, leading to higher Forget Quality (FQ) while maintaining superior performance on the retain set

IJCAI 2025 — Machine Unlearning via Null Space Calibration (UNSC) ijcai.org

confines the unlearning updates to [a null space tailored to the retain set]… ensuring that the unlearning update Δθ satisfies AΔθ = 0… predictions on the remaining dataset remain mathematically invariant
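A minimal sketch of the null-space idea in the excerpt: an update is projected so that a matrix A, built from the retain set, maps it to zero. The matrix rows, the update vector, and the Gram-Schmidt projection are illustrative assumptions; UNSC's actual construction of the null space is not shown.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def project_to_null_space(rows, update):
    """Remove from `update` every component lying in the span of `rows`."""
    basis = []
    for row in rows:
        r = list(row)
        for q in basis:                      # Gram-Schmidt against earlier rows
            c = dot(r, q)
            r = [ri - c * qi for ri, qi in zip(r, q)]
        norm = dot(r, r) ** 0.5
        if norm > 1e-12:                     # skip linearly dependent rows
            basis.append([ri / norm for ri in r])
    out = list(update)
    for q in basis:                          # subtract the row-space component
        c = dot(out, q)
        out = [oi - c * qi for oi, qi in zip(out, q)]
    return out

# Hypothetical retain-set constraint matrix A and raw unlearning update.
A = [[1.0, 0.0, 0.0],
     [1.0, 1.0, 0.0]]
raw_update = [0.3, -0.2, 0.5]

safe_update = project_to_null_space(A, raw_update)
residual = [dot(row, safe_update) for row in A]
print(safe_update, residual)
```

In this linear toy, any quantity computed as A times the parameters is unchanged by the projected update, which mirrors the invariance claim for the remaining data.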

arxiv 2602.13151 — survey of quantization-reversal in LLM unlearning arxiv.org

Full-precision (FP16/BF16) models retain only about 21% of forbidden knowledge; after 4-bit quantization the retention of forgotten knowledge spikes to approximately 83%… even advanced quantization methods like GPTQ and AWQ… fail to prevent this knowledge recovery

OpenReview — ‘Erasure or Erosion?’ critique of structural-erasure methods openreview.net

aggressive erasure methods often cause ‘manifold collapse,’ where removing a specific concept unintentionally degrades the model’s broader compositional integrity… a safety-utility trade-off where models lose the ability to perform complex attribute binding or spatial reasoning on unrelated safe prompts

Jack Sun, writing.

Engineer · Bay Area

Hands-on with agentic AI all day: building frameworks, reading what industry ships, occasionally writing it down.

© 2026 Wei (Jack) Sun · jacksunwei.me Built on Astro · hosted on Cloudflare