Single neuron jailbreaks LLMs, MIT ELF cuts data 10×, Qwen-Image hits 4 steps
Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.
Sources
A Single Neuron Is Sufficient to Bypass Safety Alignment in Large Language Models huggingface.co
Safety alignment in language models relies on distinct neural mechanisms where specific neurons control harmful knowledge expression and refusal behavior, with individual neurons being sufficient to bypass safety measures.
Qwen-Image-2.0 Technical Report huggingface.co
Qwen-Image-2.0 is an advanced image generation model that combines high-fidelity synthesis with precise editing capabilities through a unified framework using Qwen3-VL as condition encoder and Multimodal Diffusion Transformer for joint modeling.
ELF: Embedded Language Flows huggingface.co
Continuous diffusion models for language processing can achieve superior generation quality by operating in embedding space rather than discrete token space, enabling efficient sampling and adaptation of image-domain techniques.
Soohak: A Mathematician-Curated Benchmark for Evaluating Research-level Math Capabilities of LLMs huggingface.co
Mathematicians built Soohak, a 439-problem benchmark targeting research-level math beyond olympiad fare. Frontier models stumble most on a refusal subset of ill-posed problems, where the correct move is to reject the question rather than fabricate a proof.
Memory-Efficient Looped Transformer: Decoupling Compute from Memory in Looped Language Models huggingface.co
Looped language models normally pay memory for every reasoning iteration. MELT decouples depth from memory by reusing a single KV cache across loops, adding a learnable gate, chunk-wise training with interpolated transitions, and attention-aligned distillation from fine-tuned teachers.
Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control huggingface.co
RL post-training of LLMs often saturates as entropy collapses. Entrocraft uses rejection sampling to shape a custom entropy schedule, staying advantage-estimator-agnostic while preserving output diversity, extending useful training, and improving generalization over standard policy-gradient baselines.
DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices huggingface.co
DECO targets end-side deployment with a sparse Mixture-of-Experts design that matches dense Transformer quality at lower compute and storage. It combines ReLU-based routing, learnable expert-wise scaling, NormSiLU-gated MLP experts, and a custom acceleration kernel exploiting intrinsic activation sparsity.
SlimQwen: Exploring the Pruning and Distillation in Large MoE Model Pre-training huggingface.co
Structured pruning plus knowledge distillation scales to mixture-of-experts pretraining in SlimQwen. Progressive pruning schedules, partial-preservation expert merging, and multi-token prediction distillation combine to shrink large MoE models while recovering performance through continued training.
Can Muon Fine-tune Adam-Pretrained Models? huggingface.co
Switching optimizers from Adam to Muon at fine-tuning time hurts performance because the two carry different implicit biases, driving catastrophic forgetting. Parameter-efficient methods like LoRA constrain the update enough to close the gap and let Muon work on Adam-trained checkpoints.
Unmasking On-Policy Distillation: Where It Helps, Where It Hurts, and Why huggingface.co
On-policy distillation helps some reasoning tasks and hurts others. A training-free framework scores per-token gradient alignment between student and teacher, yielding a targeted-rollout algorithm that selects which teacher and which contexts actually move the student toward an ideal gradient.
WorldReasonBench: Human-Aligned Stress Testing of Video Generators as Future World-State Predictors huggingface.co
WorldReasonBench and WorldRewardBench are introduced as benchmarks for evaluating video generation models’ ability to reason about world-state evolution, revealing a gap between visual plausibility and true world reasoning capabilities.
Key-Value Means huggingface.co
Key-Value Means introduces a novel attention mechanism that combines transformer and RNN advantages with controllable computational complexity and memory usage.
Model Merging Scaling Laws in Large Language Models huggingface.co
Empirical scaling laws for language model merging reveal power-law relationships between model size, expert count, and cross-entropy performance, enabling predictive planning for optimal model composition.
G-Zero: Self-Play for Open-Ended Generation from Zero Data huggingface.co
A novel verifier-free framework enables autonomous large language model self-improvement through co-evolutionary training with intrinsic rewards and hint-based guidance.
Make Each Token Count: Towards Improving Long-Context Performance with KV Cache Eviction huggingface.co
Learned global retention-based key-value cache eviction improves long-context reasoning by selectively retaining useful tokens while reducing memory usage.
SlimSpec: Low-Rank Draft LM-Head for Accelerated Speculative Decoding huggingface.co
SlimSpec improves speculative decoding efficiency by using low-rank parameterization to compress the drafter’s language model head while maintaining full vocabulary support and achieving significant speedup with minimal pipeline changes.
Geometry Conflict: Explaining and Controlling Forgetting in LLM Continual Post-Training huggingface.co
Research investigates how task geometry influences continual post-training of large language models, identifying geometry conflict as both a cause of forgetting and a control mechanism for update integration.
Rethinking Agentic Search with Pi-Serini: Is Lexical Retrieval Sufficient? huggingface.co
Lexical retrievers remain effective for deep research tasks when paired with advanced LLMs, outperforming dense retrievers in answer accuracy and evidence recall.
Reinforcing Multimodal Reasoning Against Visual Degradation huggingface.co
ROMA is an RL fine-tuning framework that enhances multimodal large language models’ robustness against visual degradations while maintaining performance on clean inputs through a dual-forward-pass strategy and specialized regularization techniques.
Crosslingual On-Policy Self-Distillation for Multilingual Reasoning huggingface.co
COPSD transfers high-resource language model reasoning behavior to low-resource languages using self-distillation with crosslingual context, improving mathematical reasoning performance.
AgentForesight: Online Auditing for Early Failure Prediction in Multi-Agent Systems huggingface.co
AgentForesight is a framework that enables real-time error detection in multi-agent systems by identifying decisive failures during trajectory execution rather than after completion.
Conformal Agent Error Attribution huggingface.co
A conformal prediction framework is proposed for error attribution in multi-agent systems, enabling precise error isolation and automated recovery through contiguous sequence predictions.
TMAS: Scaling Test-Time Compute via Multi-Agent Synergy huggingface.co
TMAS is a multi-agent framework for test-time scaling that enhances large language model reasoning through structured collaboration and hierarchical memory systems.
SEIF: Self-Evolving Reinforcement Learning for Instruction Following huggingface.co
A self-evolving reinforcement learning framework enhances large language model instruction-following capabilities through iterative difficulty adaptation and co-training of instructor and follower components.
Rebellious Student: Reversing Teacher Signals for Reasoning Exploration with Self-Distilled RLVR huggingface.co
RLRT enhances self-distillation by reinforcing successful student decisions that deviate from teacher predictions, enabling more effective exploration in reinforcement learning via self-reward.
jina-embeddings-v5-omni: Text-Geometry-Preserving Multimodal Embeddings via Frozen-Tower Composition huggingface.co
Frozen-encoder model composition enables efficient multimodal embedding by freezing pre-trained text models and adding specialized encoders for images and audio while maintaining text embedding consistency.
DeepRefine: Agent-Compiled Knowledge Refinement via Reinforcement Learning huggingface.co
DeepRefine is an LLM-based reasoning model that refines agent-compiled knowledge bases through multi-turn interactions and targeted updates to improve downstream task performance.
Prompt-Activation Duality: Improving Activation Steering via Attention-Level Interventions huggingface.co
Activation steering in language models suffers from KV-cache contamination in dialogue settings, which GCAD addresses by extracting steering signals from prompt contributions and applying token-level gating to improve long-horizon coherence.
Mela: Test-Time Memory Consolidation based on Transformation Hypothesis huggingface.co
A memory-augmented transformer architecture called Mela incorporates hierarchical memory modules inspired by human memory consolidation processes, enabling improved long-context language modeling through multi-granularity memory representations.
GridProbe: Posterior-Probing for Adaptive Test-Time Compute in Long-Video VLMs huggingface.co
GridProbe enables efficient long-video understanding by adaptively selecting relevant frames using a frozen VLM’s reasoning, achieving sub-quadratic attention cost with minimal accuracy loss through shape-adaptive selection and interpretable importance maps.
CollabVR: Collaborative Video Reasoning with Vision-Language and Video Generation Models huggingface.co
A closed-loop framework collaboratively integrates vision-language models with video generation models at step level to improve visual reasoning by enabling real-time failure detection and correction during video generation.
Pixal3D: Pixel-Aligned 3D Generation from Images huggingface.co
Pixal3D introduces a pixel-aligned 3D generation approach that addresses fidelity issues in 3D asset creation by establishing direct pixel-to-3D correspondences through back-projection conditioning.
LLaVA-UHD v4: What Makes Efficient Visual Encoding in MLLMs? huggingface.co
Efficient visual encoding for high-resolution inputs in multimodal large language models is achieved through slice-based encoding and intra-ViT early compression, reducing computational costs while maintaining performance.
RigidFormer: Learning Rigid Dynamics using Transformers huggingface.co
RigidFormer is a Transformer-based model that learns mesh-free rigid-body dynamics using object-centric processing and anchor-based attention mechanisms to achieve high-fidelity simulation with efficient computation.
FlashEvolve: Accelerating Agent Self-Evolution with Asynchronous Stage Orchestration huggingface.co
FlashEvolve enhances LLM-based evolution frameworks by implementing asynchronous execution and artifact version tracking to reduce computational bottlenecks while maintaining evolutionary quality.
Metal-Sci: A Scientific Compute Benchmark for Evolutionary LLM Kernel Search on Apple Silicon huggingface.co
A benchmark for optimizing scientific computing kernels on Apple Silicon is paired with an automated search framework that uses a large language model in an evolutionary loop to optimize performance while identifying hidden regressions through out-of-distribution evaluation.
FORTIS: Benchmarking Over-Privilege in Agent Skills huggingface.co
Large language model agents frequently exceed necessary privileges when selecting and executing skills, with performance declining under realistic user interaction conditions.
Dynamic Skill Lifecycle Management for Agentic Reinforcement Learning huggingface.co
SLIM framework manages dynamic skill lifecycles in agentic reinforcement learning by optimizing active skill sets jointly with policy learning, outperforming existing methods in complex task environments.
NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation huggingface.co
NanoResearch is a multi-agent framework that enhances research automation through personalized assistance by accumulating reusable skills, maintaining user-specific experience, and internalizing implicit preferences through co-evolving components.
Shaping Schema via Language Representation as the Next Frontier for LLM Intelligence Expanding huggingface.co
Language representation design significantly impacts large language model performance and internal feature activations, offering a promising research direction for enhancing model intelligence without scaling or parameter modifications.
Queryable LoRA: Instruction-Regularized Routing Over Shared Low-Rank Update Atoms huggingface.co
A data-adaptive method for efficient fine-tuning of large neural networks uses a shared memory of low-rank update atoms with attention-based routing to enable dynamic, context-sensitive parameter updates while maintaining scalability.
PaperFit: Vision-in-the-Loop Typesetting Optimization for Scientific Documents huggingface.co
Visual typesetting optimization addresses the gap between compilable LaTeX documents and publication-ready PDFs through vision-in-the-loop agents that iteratively diagnose and repair layout defects.
Auto-Rubric as Reward: From Implicit Preferences to Explicit Multimodal Generative Criteria huggingface.co
Auto-Rubric as Reward (ARR) framework externalizes implicit preference knowledge into structured rubrics for improved multimodal alignment, while Rubric Policy Optimization (RPO) stabilizes policy gradients through binary rewards derived from multi-dimensional evaluation.
X-OmniClaw Technical Report: A Unified Mobile Agent for Multimodal Understanding and Interaction huggingface.co
X-OmniClaw is a unified mobile agent architecture that integrates multimodal perception, memory, and action components to enable intelligent interaction within Android environments.
Omni-Persona: Systematic Benchmarking and Improving Omnimodal Personalization huggingface.co
Omni-Persona introduces the first comprehensive benchmark for omnimodal personalization, featuring a Persona Modality Graph and Calibrated Accuracy metric to diagnose grounding behaviors across text, image, and audio modalities.
DeltaRubric: Generative Multimodal Reward Modeling via Joint Planning and Verification huggingface.co
DeltaRubric introduces a two-step multimodal preference evaluation approach using a single MLLM, where a Disagreement Planner generates instance-specific verification checklists and a Checklist Verifier executes these checks to produce grounded judgments, improving reward modeling reliability.
MuSS: A Large-Scale Dataset and Cinematic Narrative Benchmark for Multi-Shot Subject-to-Video Generation huggingface.co
MuSS is a large-scale dual-track dataset designed for multi-shot video generation that addresses narrative logic, spatiotemporal alignment, and copy-paste issues in subject-to-video generation through a progressive captioning pipeline and cross-shot matching mechanism.
RoboMemArena: A Comprehensive and Challenging Robotic Memory Benchmark huggingface.co
RoboMemArena presents a large-scale robotic memory benchmark with diverse tasks and real-world evaluation, while PrediMem demonstrates improved memory management through a dual-system vision-language architecture with predictive coding.
PlantMarkerBench: A Multi-Species Benchmark for Evidence-Grounded Plant Marker Reasoning huggingface.co
PlantMarkerBench presents a multi-species benchmark for evaluating literature-based plant marker evidence interpretation, assessing models on identifying valid marker evidence and categorizing evidence types across four plant species.
Training-Free Dense Hand Contact Estimation with Multi-Modal Large Language Models huggingface.co
ContactPrompt enables zero-shot dense hand contact estimation using multi-modal large language models through structured 3D hand geometry encoding and multi-stage contact reasoning.
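For readers who want a feel for the SlimSpec entry above, here is a minimal sketch of what a low-rank language model head can look like: the hidden state is projected down to a small rank and then up to the full vocabulary, so every token still gets a logit. The factorization shape, the function names, and the sizes in the usage note are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch only: a generic low-rank head, not SlimSpec's actual design.

def param_counts(d_model, vocab, rank):
    """Return (full_head_params, low_rank_head_params), ignoring biases."""
    full = d_model * vocab                # one dense d_model x vocab matrix
    low_rank = rank * (d_model + vocab)   # down-projection plus up-projection
    return full, low_rank


def low_rank_logits(hidden, down, up):
    """Compute logits as up @ (down @ hidden).

    hidden: list of d_model floats
    down:   rank x d_model matrix (list of rows)
    up:     vocab x rank matrix (list of rows)
    """
    z = [sum(w * h for w, h in zip(row, hidden)) for row in down]
    return [sum(w * zi for w, zi in zip(row, z)) for row in up]
```

With hypothetical sizes such as `param_counts(4096, 150000, 256)`, the dense head holds 614,400,000 weights and the factorized head 39,448,576, while `low_rank_logits` still returns one score per vocabulary entry.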
References
Maxime Labonne, ‘Uncensor any LLM with abliteration’ (Medium) medium.com
By calculating the mean difference in activations between harmful and harmless prompts, researchers identified a vector that, when ablated through weight orthogonalization, prevents the model from refusing harmful requests while preserving general capabilities.
Wang et al., ‘Concept cones’ multi-direction refusal analysis (arXiv 2505.17306) arxiv.org
Recent studies have identified multi-dimensional ‘concept cones’ with dimensions as high as five across various models… multi-directional suppression consistently achieves higher Attack Success Rates than single-direction baselines on frontier models like Qwen and Llama.
Prakash et al., ‘Beyond I’m Sorry, I Can’t’ (arXiv 2512.13655) arxiv.org
Models maintain back-up safety features… these redundant features often remain dormant until primary refusal features are suppressed, suggesting that LLM safety is not just a single ‘off-switch’ but a layered defensive circuit.
Pruning neurons highly responsive to unsafe prompts effectively ‘jailbreaks’ the model, leading to over 90% attack success rates with minimal impact on general utility.
Qi et al. / ‘Shallow Safety Alignment’ summary (thedatapraxis.com) thedatapraxis.com
Current alignment techniques predominantly adjust the probability distribution of only the first few output tokens to produce standard refusal phrases like ‘I cannot assist with that’… once a model is forced past this initial ‘refusal prefix,’ the underlying capability to generate harmful content remains largely intact in deeper layers.
Chen et al., ‘RepIt: Representing Isolated Targets to Steer Language Models’ (ResearchGate) researchgate.net
Techniques like ‘RepIt’ can selectively suppress safety filters on specific harmful concepts while keeping them active elsewhere. This allows for the creation of ‘model organisms’ that answer questions about weapons of mass destruction or self-harm while appearing safe on standard benchmarks.
Medium (Adithya Giridharan) medium.com
Continuous diffusion language models were held back by a habit, not a limitation… ELF maintains a continuous trajectory until the very last step, using approximately 80% of training on a Mean Squared Error (MSE) objective and 20% on Cross-Entropy (CE) loss to refine the final discretization.
Caradryanl blog – LangFlow analysis caradryanl.github.io
LangFlow connects embedding-space diffusion to Flow Matching through Bregman divergence… predicting clean token probabilities from noisy embeddings at every step is essential for reaching the generative quality of discrete models — precisely what ELF identifies as a ‘handbrake’ on performance.
ArxivIQ Substack arxiviq.substack.com
ELF claims a superior quality-efficiency trade-off, reportedly achieving lower generative perplexity with 10x fewer training tokens and significantly fewer sampling steps (32 vs 1024) than concurrent LangFlow.
JetBrains AI blog blog.jetbrains.com
DLMs are now 5–10x faster than autoregressive models due to their parallel decoding architecture… however each diffusion step involves a full bidirectional attention pass, which is more compute-intensive than a single causal AR step, and standard KV caching is fundamentally incompatible with bidirectional denoising.
GitHub: libo-huang/kaiming-he-arxiv-papers github.com
ELF marks Kaiming He’s first major foray into NLP; the work is co-first-authored by Keya Hu and Linlu Qiu, with co-authors Yoon Kim and Jacob Andreas (MIT CSAIL).
36Kr (Cola DLM coverage) eu.36kr.com
Perplexity is becoming obsolete for diffusion paradigms — PPL over-penalizes semantic synonyms; if a model generates a perfect synonym that does not match the exact ground-truth token, the PPL skyrockets despite the generation’s high quality.
r/StableDiffusion thread ‘Qwen Image 2 is amazing, any idea when 7B is coming?’ reddit.com
Community tracking noted a shift in the official Qwen website’s tagging from ‘Open-Source’ to simply ‘Release’ shortly after launch, causing pessimism among developers regarding a local deployment release.
The Decoder the-decoder.com
Alibaba’s Qwen-Image-2.0 doubles compression and cuts generation steps from 40 to 4… final tuning is governed by five distinct reward models designed to balance creative fidelity with safety constraints.
BudgetPixel review budgetpixel.com
Testers have reported failures in Polish, including missing diacritics and the literal rendering of newline characters (e.g., writing ‘/n’ instead of creating a break)… text often appears ‘photoshopped’ into scenes rather than naturally integrated.
WaveSpeed AI engineering blog (‘Fix Qwen-Image text artifacts’) wavespeed.ai
Periodic high-frequency grid artifacts and a ‘screen door effect’ visible upon close inspection… often attributed to the model’s VAE scale factor and most pronounced in the center of generated images.
WaveSpeed AI (Qwen-Image-Edit-2509 deep-dive) wavespeed.ai
2509 was specifically trained on image concatenation for ‘person + product’ and ‘person + scene’ combinations… often cited in community forums as the ‘absolute beast’ for maintaining strict subject identity during complex pose changes.
YouTube workflow walkthrough (ComfyUI Qwen-Image-2.0) youtube.com
GGUF quantized versions (Q4 or Q8) and Lightning LoRAs can reduce inference from the standard 20-30 steps down to just 4-8 steps without substantial quality loss; ImageScaleToTotalPixels node manages VRAM by scaling inputs to ~1M pixels before VAE encoding.
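The Labonne entry above summarizes two linear-algebra steps: take the difference between mean activations on two prompt sets, then orthogonalize a weight matrix against that direction. The toy sketch below shows only that arithmetic on made-up numbers. The function names and data are hypothetical, and it is not the code from the cited post or any library.

```python
# Toy illustration on synthetic vectors; no model is involved.

def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def difference_direction(acts_a, acts_b):
    """Unit vector along mean(acts_a) - mean(acts_b)."""
    diff = [a - b for a, b in zip(mean_vector(acts_a), mean_vector(acts_b))]
    norm = sum(x * x for x in diff) ** 0.5
    return [x / norm for x in diff]


def orthogonalize(weight, direction):
    """Return (I - r r^T) W for a d_out x d_in matrix W and unit vector r.

    After this, no output of W has any component along r.
    """
    d_out, d_in = len(weight), len(weight[0])
    # coeffs[j] is the component of column j of W along the direction
    coeffs = [sum(direction[i] * weight[i][j] for i in range(d_out))
              for j in range(d_in)]
    return [[weight[i][j] - direction[i] * coeffs[j] for j in range(d_in)]
            for i in range(d_out)]


def matvec(weight, x):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]
```

A quick check of the property: for any input `x`, the dot product between `direction` and `matvec(orthogonalize(W, direction), x)` is zero up to floating-point error, which is what "ablated through weight orthogonalization" means geometrically.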