Sources

Project Glasswing: An initial update anthropic.com

Towards Speed-of-Light Text Generation with Nemotron-Labs Diffusion Language Models huggingface.co

Specialization Beats Scale: A Strategic Variable Most AI Procurement Decisions Overlook huggingface.co

(AINews) OpenAI GPT-next disproves 80 year old Erdős planar unit distance problem for under $1000 latent.space

GPT-next disproved Erdős’s planar unit-distance conjecture, a problem open since the 1940s, using less than $1,000 in compute. The result is a quiet but pointed data point for AI-assisted mathematics: frontier models are chipping away at long-standing combinatorial questions, not just competition-style proofs.

References

Daniel Stenberg (curl maintainer) blog daniel.haxx.se

I think using the term confirmed is a little amusing when the AI says it confidently by itself. Yes, the AI thinks they are confirmed, but the curl security team has a slightly different take… My personal conclusion can however not end up with anything else than that the big hype around this model so far was primarily marketing.

Mashable (Heidy Khlaaf critique) mashable.com

Khlaaf argues that the name ‘Mythos’ and its associated marketing create a ‘mythology’ of unprecedented danger that serves to justify withholding the model from independent evaluation… forming a coalition with tech giants… may be establishing a de facto standard for ‘safe’ AI deployment that only the most well-funded entities can meet.

Forbes — Microsoft MDASH benchmark forbes.com

Microsoft’s MDASH system recently outperformed Mythos on the same [CyberGym] benchmark, scoring 88.4% by utilizing a multi-agent ‘debate’ architecture that filters false positives.

UK AI Security Institute evaluation aisi.gov.uk

Mythos Preview became the first AI system to fully solve ‘The Last Ones,’ a 32-step simulated corporate network attack that typically requires 20 hours of human expert labor… while the model excels in ‘weakly defended’ environments, its efficacy against active human defenders remains unproven.

Tom’s Hardware — Linus Torvalds on AI reports tomshardware.com

Linus Torvalds recently warned that the Linux kernel’s security mailing list has become ‘almost entirely unmanageable’ due to redundant, automated findings that offer no path to a fix.

The Decoder — ExploitBench cost analysis the-decoder.com

ExploitBench co-author Seunghyun Lee characterized Mythos as a ‘fairly competent’ researcher but noted its high operational cost; one full benchmark run cost approximately $36,000 compared to $3,000 for GPT-5.5.

NVIDIA Developer Blog — ‘How Small Language Models Are Key to Scalable Agentic AI’ developer.nvidia.com

SLMs (defined as models <10B parameters capable of running on consumer-grade hardware) act as the primary workhorses… 10–30x cheaper to run per token in real-world agent systems

InfoWorld — ‘Small language models: Rethinking enterprise AI architecture’ infoworld.com

critics point to ‘complete accuracy collapse’ when SLMs encounter tasks requiring multi-step reasoning or novel queries outside their training distribution… Amazon’s Rufus shopping assistant achieving only 32% recommendation accuracy and Air Canada’s chatbot inventing a non-existent refund policy

Unsiloed.ai — ‘Unsiloed AI Achieves #1 Rank on olmOCR-Bench-2’ unsiloed.ai

Unsiloed Parser recently claimed the #1 spot with a deterministic pass-rate of 88.0, narrowly edging out Nanonets OCR-3 (87.4) and significantly outperforming frontier models like GPT-5.5 (84.6) and Claude 4.7

Hacker News discussion on DharmaOCR / LLM-OCR news.ycombinator.com

while traditional OCR (like Tesseract) fails predictably by outputting ‘gibberish’ when unconfident, LLM-based OCR can ‘confidently hallucinate,’ creating risks for sensitive documents like financial statements

sumpdibesus.blog — Dharma AI company profile sumpdibesus.blog

Founded as a strategic spin-off from EloGroup—one of Brazil’s leading management consulting firms… secured one of Brazil’s largest seed funding rounds, led by the Lorinvest fund

OpenReview methodology discussion of DharmaOCR-Benchmark openreview.net

the use of BLEU in an OCR context is a point of methodological debate… Open questions remain regarding the ‘reproducibility’ of the human-in-the-loop labeling strategy used to create the ground truth

Sean Goedecke — ‘Limitations of text diffusion models’ seangoedecke.com

A diffusion model has less space for the model to spend ‘thinking’ per token… it edits the entire output block during every pass, so the attention scores for every token must be recalculated against the entire context window every single time.

LMSYS blog — Diffusion LLM in SGLang lmsys.org

LLaDA 2.0-flash can reach throughputs of up to 935 tokens per second—nearly 3.5x faster than standard AR models like gpt-oss-120B on comparable tasks; the block-wise attention is bidirectional within a single block yet remains strictly causal across blocks, allowing reuse of RadixAttention and paged KV-caching.
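
The attention pattern described in the excerpt above (bidirectional inside a block, strictly causal across blocks) can be illustrated with a small mask builder. This is a minimal sketch, not SGLang's or LLaDA's actual implementation; the function name block_causal_mask and the block_size parameter are hypothetical names chosen for illustration.

```python
def block_causal_mask(seq_len, block_size):
    """Return mask[i][j] == True when query position i may attend to key position j.

    Assumption for illustration: positions are grouped into fixed-size blocks.
    A position sees every position in its own block (bidirectional) and every
    position in earlier blocks (causal across blocks), but nothing in later blocks.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    mask = []
    for i in range(seq_len):
        query_block = i // block_size
        # Allowed when the key's block index is not after the query's block index.
        row = [(j // block_size) <= query_block for j in range(seq_len)]
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Print the mask for 6 positions in blocks of 2 ("x" = may attend).
    for row in block_causal_mask(6, 2):
        print("".join("x" if allowed else "." for allowed in row))
```

Because no position in an earlier block ever attends to a later block, the keys and values of a finished block do not change once the next block starts, which is consistent with the excerpt's point that prefix-style KV caching can be reused.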

HuggingFace blog (ProCreations) — DLM landscape huggingface.co

Google’s Gemini Diffusion leads the experimental pack at a reported 1,479 tokens per second… Mercury Coder demonstrates stable performance between 700 and 1,100 tokens per second on NVIDIA H100 GPUs.

Medium / ML-today — converting AR to diffusion LMs medium.com

DiffuGPT and DiffuLLaMA were converted with relatively low data budgets—often fewer than 200 billion tokens—while maintaining the fluency and in-context learning capabilities of their predecessors.

Liner review — Efficient-DLM liner.com

Efficient-DLM 8B achieved a 4.5x higher throughput compared to contemporary models like Dream 7B, while maintaining higher accuracy by optimizing the attention mechanism for KV-cache compatibility.

HuggingFace — Nemotron-Labs-Diffusion-VLM-8B model card huggingface.co

The VLM variant is released under the more restrictive NVIDIA Source Code License (NSCLv1)… incurs only a 0.1% accuracy drop when running in linear self-speculation mode compared to its standard AR mode.

Jack Sun, writing.