Claude Fable spins up a debug harness, torch.compile fuses 5 kernels into 1
Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.
Sources
Claude Fable is relentlessly proactive simonwillison.net
After two days of experience with Claude Fable 5 I think the best way to describe it is relentlessly proactive. It knows a whole lot of tricks and it will deploy pretty much any of them to get to its goal. I’ll illustrate this with an example. I was hacking on Datasette Agent today when I noticed a glitch: a horizontal scrollbar that shouldn’t be there in the jump menu chat prompt. I snapped this screenshot: Then I started a fresh claude session in my datasette-agent checkout, dragged in the s…
datasette 1.0a33 simonwillison.net
Release: datasette 1.0a33 This alpha is a significant step on the road to a stable 1.0, finally extending the ?_extra= pattern I introduced in Datasette 1.0a3 to cover queries and rows in addition to tables. That pattern is also now documented! I wrote a whole lot more about the new release on the Datasette project blog: Datasette 1.0a33 with JSON extras in the API. Because API explorer tools are almost free to build now I had Claude Fable 5 in Claude Code (for the plan) and GPT-5.5 xhigh in…
asyncinject 0.7 simonwillison.net
Release: asyncinject 0.7 I built this utility library to support an asyncio dependency injection pattern a few years ago. I was using it with Datasette and Claude Fable 5 spotted some bugs in the dependency which it then fixed for me. It’s a very proactive model! Tags: async, projects, python, claude-mythos
Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP huggingface.co
References
The Decoder the-decoder.com
Claude Fable 5, the first Mythos model, is powerful, expensive, and heavily filtered.
VentureBeat venturebeat.com
Surprise upset: GPT-5.5 beats Claude Fable 5 on brutal new Agents’ Last Exam benchmark.
TrueFoundry benchmark comparison truefoundry.com
Fable 5 leads SWE-Bench Pro at 80.3% but costs roughly 2x Opus 4.8 at $10/$50 per million tokens, making it cost-prohibitive for routine repetitive tasks.
TrueFoundry — Claude Code prompt injection writeup truefoundry.com
Microsoft Threat Intelligence disclosed a vulnerability where the Claude Code GitHub Action could be manipulated into reading /proc/self/environ, exfiltrating ANTHROPIC_API_KEY and other CI/CD secrets.
Hacker News discussion news.ycombinator.com
High INT, zero WIS — refusals come from hard-coded guardrails rather than intent understanding; running it without a sandbox is ‘feet on the dashboard, trusting the airbags.’
datasette.io blog — API extras datasette.io
1.0a33 extends the ?_extra= parameter mechanism to row and query pages, enabling minimalist default responses — a long-standing 1.0 blocker now being closed out.
Liger-Kernel paper (arXiv 2410.10989) arxiv.org
Liger-Kernel can achieve an average 20% increase in training throughput and a 60% reduction in GPU memory usage compared to standard Hugging Face implementations… LLaMA 3-8B training can scale from a 4K to a 16K context window on the same hardware.
PyTorch GitHub issue #139908 (addmm vs mm+add) github.com
For extremely large or specifically shaped matrices (e.g., [32768, 12288]), a separate mm + add approach can be ~30% faster than a fused addmm call, with MFU dropping significantly in the fused case (0.61 vs 0.91).
Ashish Malik blog — torch.compile benchmarks ashishmalik.in
Simple linear models can actually experience a performance degradation of approximately 8.28% due to compilation overhead, while complex models like ConvNets or mixed architectures see speedups exceeding 5x.
OpenReview discussion of Liger-Kernel openreview.net
Memory savings are drastic for 7B models (~36%), but drop significantly for larger models like 14B (~6%), where other bottlenecks (e.g., optimizer states) dominate peak memory… dtype mismatches and conflicts when combining Liger with torch.compile or specific PEFT/BitsAndBytes configurations.
Medium — ‘LLMs can now write GPU kernels that beat torch.compile’ medium.com
Compilers cannot ‘invent’ algorithmic breakthroughs like the specialized memory-saving patterns found in Liger’s custom fusions… Inductor regresses against Liger on specific operators like RoPE with speedup ratios as low as 0.41–0.58.
PyTorch 2026 talk — CuteDSL Inductor backend youtube.com
CuteDSL is integrated as a first-class autotuning backend for TorchInductor… epilogue visitor trees for CUTLASS and AMD’s Composable Kernel enable flexible multi-op C++ epilogues that maintain parity with hand-written Triton kernels.
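A note on the two numbers in the PyTorch issue #139908 entry above: the MFU pair (0.61 fused vs 0.91 separate) and the "~30% faster" figure can be cross-checked with a little arithmetic. The sketch below assumes both code paths are credited with the same FLOP count on the same hardware, so runtime scales as 1/MFU; that assumption is ours, not something the issue summary states, and the helper name relative_time is purely illustrative.

```python
def relative_time(mfu_a: float, mfu_b: float) -> float:
    """Runtime of path A divided by runtime of path B.

    Assumes (our assumption) that both paths perform the same number of
    model FLOPs on the same hardware. Then time = flops / (mfu * peak),
    and the flops and peak terms cancel, leaving mfu_b / mfu_a.
    """
    return mfu_b / mfu_a


# MFU values as quoted in the entry for the [32768, 12288] case.
fused_mfu = 0.61      # single addmm call
separate_mfu = 0.91   # mm followed by add

# Runtime of the separate path relative to the fused path.
ratio = relative_time(separate_mfu, fused_mfu)

print(f"separate / fused runtime: {ratio:.3f}")
print(f"runtime saved by separate path: {1 - ratio:.1%}")
print(f"throughput gain of separate path: {separate_mfu / fused_mfu - 1:.1%}")
```

Under that assumption the separate path needs about two thirds of the fused runtime, which is in the same range as the quoted "~30% faster" if "faster" is read as time saved; read as throughput, the same MFU pair implies a larger gain, so the two figures in the summary are only loosely comparable.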