Wei (Jack) Sun

Claude Fable spins up a debug harness, torch.compile fuses 5 kernels into 1

Every URL the pipeline pulled into ranking for this issue: the primary sources, plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.

Sources

Claude Fable is relentlessly proactive simonwillison.net

After two days of experience with Claude Fable 5 I think the best way to describe it is relentlessly proactive. It knows a whole lot of tricks and it will deploy pretty much any of them to get to its goal. I’ll illustrate this with an example. I was hacking on Datasette Agent today when I noticed a glitch: a horizontal scrollbar that shouldn’t be there in the jump menu chat prompt. I snapped this screenshot. Then I started a fresh claude session in my datasette-agent checkout, dragged in the s…

datasette 1.0a33 simonwillison.net

Release: datasette 1.0a33. This alpha is a significant step on the road to a stable 1.0, finally extending the ?_extra= pattern I introduced in Datasette 1.0a3 to cover queries and rows in addition to tables. That pattern is also now documented! I wrote a whole lot more about the new release on the Datasette project blog: Datasette 1.0a33 with JSON extras in the API. Because API explorer tools are almost free to build now I had Claude Fable 5 in Claude Code (for the plan) and GPT-5.5 xhigh in…

asyncinject 0.7 simonwillison.net

Release: asyncinject 0.7. I built this utility library to support an asyncio dependency injection pattern a few years ago. I was using it with Datasette and Claude Fable 5 spotted some bugs in the dependency which it then fixed for me. It’s a very proactive model! Tags: async, projects, python, claude-mythos
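One common shape for an asyncio dependency injection pattern: each async function names its dependencies as parameters, and a registry runs those dependencies first, concurrently where it can, then passes their results in. The sketch below assumes that shape; it is not asyncinject's actual API, and `MiniRegistry` plus the example functions are hypothetical names.

```python
import asyncio
import inspect


class MiniRegistry:
    """Hypothetical resolver (not asyncinject's API): a parameter name refers
    to another registered async function whose result is injected."""

    def __init__(self, *fns):
        self.fns = {fn.__name__: fn for fn in fns}

    async def resolve(self, name):
        # Fresh task cache per resolve, so a shared dependency runs only once.
        return await self._task(name, {})

    def _task(self, name, tasks):
        if name not in tasks:
            tasks[name] = asyncio.create_task(self._call(name, tasks))
        return tasks[name]

    async def _call(self, name, tasks):
        fn = self.fns[name]
        params = list(inspect.signature(fn).parameters)
        # Independent dependencies are awaited concurrently. No cycle detection.
        values = await asyncio.gather(*(self._task(p, tasks) for p in params))
        return await fn(**dict(zip(params, values)))


calls = []  # records how often config() actually runs


async def config():
    calls.append("config")
    return {"db": "example.db"}


async def db(config):
    return f"connected:{config['db']}"


async def user(config):
    return "alice"


async def page(db, user):
    return f"{user}@{db}"
```

Calling `asyncio.run(MiniRegistry(config, db, user, page).resolve("page"))` returns `"alice@connected:example.db"`; `db` and `user` both depend on `config`, but the shared task cache means `config` runs once.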

Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP huggingface.co
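Fusing several kernels into one mostly pays off in memory traffic. A back-of-the-envelope sketch, assuming every unfused kernel is an elementwise pass that reads and writes the full tensor once, while the fused kernel reads the input and writes the output once; the actual kernels profiled in the article are not modeled, and the tensor size below is hypothetical.

```python
def elementwise_traffic_bytes(numel: int, num_kernels: int,
                              bytes_per_elem: int, fused: bool) -> int:
    """Rough HBM traffic for a chain of elementwise ops:
    each pass reads numel elements and writes numel elements."""
    passes = 1 if fused else num_kernels
    return passes * 2 * numel * bytes_per_elem


# Hypothetical 4096 x 4096 bf16 activation (2 bytes per element), 5 kernels.
NUMEL = 4096 * 4096
unfused = elementwise_traffic_bytes(NUMEL, 5, 2, fused=False)  # 335,544,320 bytes
fused = elementwise_traffic_bytes(NUMEL, 5, 2, fused=True)     # 67,108,864 bytes
```

Under these assumptions the fused version moves one fifth of the bytes, because the four intermediate tensors never round-trip through HBM.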

References

The Decoder the-decoder.com

Claude Fable 5, the first Mythos model, is powerful, expensive, and heavily filtered.

VentureBeat venturebeat.com

Surprise upset: GPT-5.5 beats Claude Fable 5 on brutal new Agents’ Last Exam benchmark.

TrueFoundry benchmark comparison truefoundry.com

Fable 5 leads SWE-Bench Pro at 80.3% but costs roughly 2x Opus 4.8, at $10 input / $50 output per million tokens, making it cost-prohibitive for routine, repetitive tasks.
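To make the $10/$50 pricing concrete, a small cost helper, assuming the first figure is the input price and the second the output price, both per million tokens. The token counts in the usage line are hypothetical.

```python
def job_cost_usd(input_tokens: int, output_tokens: int,
                 input_per_m: float = 10.0, output_per_m: float = 50.0) -> float:
    """Cost of one job at per-million-token prices (defaults: $10 in / $50 out)."""
    return (input_tokens / 1_000_000 * input_per_m
            + output_tokens / 1_000_000 * output_per_m)


# Hypothetical agent run: 2M input tokens, 500K output tokens.
cost = job_cost_usd(2_000_000, 500_000)  # 20.0 + 25.0 = 45.0
```

Because output tokens cost five times as much as input tokens, output-heavy agent loops dominate the bill even when they read far more than they write.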

TrueFoundry — Claude Code prompt injection writeup truefoundry.com

Microsoft Threat Intelligence disclosed a vulnerability where the Claude Code GitHub Action could be manipulated into reading /proc/self/environ, exfiltrating ANTHROPIC_API_KEY and other CI/CD secrets.

Hacker News discussion news.ycombinator.com

High INT, zero WIS: refusals come from hard-coded guardrails rather than an understanding of intent; running it without a sandbox is ‘feet on the dashboard, trusting the airbags.’

datasette.io blog — API extras datasette.io

1.0a33 extends the ?_extra= parameter mechanism to row and query pages, enabling minimalist default responses — a long-standing 1.0 blocker now being closed out.
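A client-side sketch of the ?_extra= pattern: the default JSON stays minimal, and the caller opts in to each extra section by name. The host, database, and table names are hypothetical, and so are the extra names `count` and `columns` used here; the repeated-parameter form is also an assumption, so check Datasette's documentation for which extras each page type (table, row, query) accepts.

```python
from urllib.parse import urlencode


def with_extras(base_url: str, extras: list[str]) -> str:
    """Append one _extra= parameter per requested extra (repeated-key form).
    Assumes base_url has no existing query string."""
    if not extras:
        return base_url  # minimal default response
    return base_url + "?" + urlencode([("_extra", name) for name in extras])


url = with_extras("https://example.com/mydb/mytable.json", ["count", "columns"])
# https://example.com/mydb/mytable.json?_extra=count&_extra=columns
```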

Liger-Kernel paper (arXiv 2410.10989) arxiv.org

Liger-Kernel can achieve an average 20% increase in training throughput and a 60% reduction in GPU memory usage compared to standard Hugging Face implementations… LLaMA 3-8B training can scale from a 4K to a 16K context window on the same hardware.
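One well-known contributor to the memory headroom at long context is the output layer: a standard cross-entropy path materializes a full [batch, seq_len, vocab] logits tensor (plus its gradient), which grows linearly with sequence length. Liger's fused linear cross-entropy computes the loss in chunks so that tensor is never materialized in full. A rough sizing sketch, assuming batch size 1, fp32 logits, and the Llama 3 vocabulary of 128,256 tokens; it does not attempt to reproduce the paper's 60% figure.

```python
LLAMA3_VOCAB = 128_256  # Llama 3 tokenizer vocabulary size


def logits_bytes(seq_len: int, vocab: int = LLAMA3_VOCAB,
                 bytes_per_elem: int = 4, batch: int = 1) -> int:
    """Size of one fully materialized [batch, seq_len, vocab] logits tensor."""
    return batch * seq_len * vocab * bytes_per_elem


at_4k = logits_bytes(4_096)    # 2,101,346,304 bytes, about 2.1 GB
at_16k = logits_bytes(16_384)  # 8,405,385,216 bytes, about 8.4 GB
```

Quadrupling the context quadruples this one tensor, before counting its gradient, which is why avoiding it matters when stretching from 4K to 16K on the same hardware.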

PyTorch GitHub issue #139908 (addmm vs mm+add) github.com

For extremely large or specifically shaped matrices (e.g., [32768, 12288]), a separate mm + add approach can be ~30% faster than a fused addmm call, with MFU dropping significantly in the fused case (0.61 vs 0.91).
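The MFU figures and the speed gap are two views of the same measurement. Assuming both paths do essentially the same FLOPs on the same GPU (the bias add is negligible next to the matmul), wall time is inversely proportional to MFU, where P_peak is the hardware's peak throughput:

```latex
t = \frac{\text{FLOPs}}{\text{MFU} \cdot P_{\text{peak}}}
\quad\Longrightarrow\quad
\frac{t_{\text{mm+add}}}{t_{\text{addmm}}}
  = \frac{\text{MFU}_{\text{addmm}}}{\text{MFU}_{\text{mm+add}}}
  = \frac{0.61}{0.91} \approx 0.67
```

So the separate mm + add path finishes in about two thirds of the fused call's time, roughly 33% less, in line with the ~30% figure.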

Ashish Malik blog — torch.compile benchmarks ashishmalik.in

Simple linear models can actually experience a performance degradation of approximately 8.28% due to compilation overhead, while complex models like ConvNets or mixed architectures see speedups exceeding 5x.
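Compilation overhead is a one-time cost that has to be repaid by faster steps. A minimal break-even sketch under that simple model; all timings are hypothetical, in integer milliseconds to avoid float rounding, and the model ignores recompilations and warm-up effects.

```python
import math
from typing import Optional


def break_even_steps(compile_ms: int, eager_step_ms: int,
                     compiled_step_ms: int) -> Optional[int]:
    """Steps needed before the one-time compile cost is repaid.
    Returns None when compiled steps are not faster, so compiling never pays off."""
    saved_per_step = eager_step_ms - compiled_step_ms
    if saved_per_step <= 0:
        return None
    return math.ceil(compile_ms / saved_per_step)


# Hypothetical: 30 s compile, 10 ms eager step, 8 ms compiled step.
steps = break_even_steps(30_000, 10, 8)  # 15000
```

A small linear model offers little to fuse, so its per-step saving is tiny or negative, and the compile cost can dominate; larger models with many fusable ops cross break-even quickly.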

OpenReview discussion of Liger-Kernel openreview.net

Memory savings are drastic for 7B models (~36%), but drop significantly for larger models like 14B (~6%), where other bottlenecks (e.g., optimizer states) dominate peak memory… dtype mismatches and conflicts when combining Liger with torch.compile or specific PEFT/BitsAndBytes configurations.
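One reason the savings shrink with model size: fused kernels mostly cut activation and logits memory, while static training state grows with parameter count. A sketch using the common rule of thumb for mixed-precision Adam of about 16 bytes per parameter (2 for half-precision weights, 2 for gradients, 12 for fp32 master weights and the two Adam moments), assuming no sharding; ZeRO or FSDP would split this across GPUs.

```python
# 2 (bf16/fp16 weights) + 2 (grads) + 4 (fp32 master) + 4 (Adam m) + 4 (Adam v)
BYTES_PER_PARAM_MIXED_ADAM = 16


def static_training_gb(num_params: int,
                       bytes_per_param: int = BYTES_PER_PARAM_MIXED_ADAM) -> float:
    """Weights + grads + optimizer state, in decimal GB, before any activations."""
    return num_params * bytes_per_param / 1e9


gb_7b = static_training_gb(7_000_000_000)    # 112.0
gb_14b = static_training_gb(14_000_000_000)  # 224.0
```

A fixed activation saving is a much smaller share of a peak that starts near 224 GB than of one that starts near 112 GB.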

Medium — ‘LLMs can now write GPU kernels that beat torch.compile’ medium.com

Compilers cannot ‘invent’ algorithmic breakthroughs like the specialized memory-saving patterns found in Liger’s custom fusions… Inductor regresses against Liger on specific operators like RoPE with speedup ratios as low as 0.41–0.58.
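For context on the operator itself, RoPE rotates each pair of query/key features by a position-dependent angle. A minimal pure-Python sketch of the interleaved-pair formulation from the original RoFormer paper; Hugging Face's Llama implementation uses a split-half layout instead (feature i paired with i + d/2), which is equivalent up to a permutation of features. This is for illustration only, not Liger's or Inductor's kernel.

```python
import math


def rope(x: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Rotate consecutive pairs (x[2i], x[2i+1]) by angle pos * base**(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even head dimension")
    out: list[float] = []
    for i in range(d // 2):
        theta = pos * base ** (-2 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        x0, x1 = x[2 * i], x[2 * i + 1]
        out += [x0 * c - x1 * s, x0 * s + x1 * c]  # 2-D rotation of the pair
    return out
```

Position 0 leaves a vector unchanged, and every position preserves its norm; the per-feature trig and pairwise shuffling are the kind of pattern a hand-written fused kernel can specialize for.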

PyTorch 2026 talk — CuteDSL Inductor backend youtube.com

CuteDSL is integrated as a first-class autotuning backend for TorchInductor… epilogue visitor trees for CUTLASS and AMD’s Composable Kernel enable flexible multi-op C++ epilogues that maintain parity with hand-written Triton kernels.
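An epilogue is the work applied to each GEMM output element after accumulation and before the store, such as a bias add and an activation. A conceptual sketch of the visitor-tree idea: the epilogue is a small expression tree evaluated per output element, so the bias add and ReLU never get their own kernels or intermediate buffers. This is plain Python for illustration, not the CUTLASS, Composable Kernel, or CuteDSL API; the tuple node format and function names are hypothetical.

```python
def eval_node(node, env):
    """Evaluate a tiny epilogue tree: leaves are names in env,
    inner nodes are (op, *children)."""
    if isinstance(node, str):
        return env[node]
    op, *children = node
    args = [eval_node(child, env) for child in children]
    if op == "add":
        return args[0] + args[1]
    if op == "mul":
        return args[0] * args[1]
    if op == "relu":
        return max(args[0], 0.0)
    raise ValueError(f"unknown op {op!r}")


def gemm_with_epilogue(a, b, bias, epilogue):
    """C[i][j] = epilogue(acc=sum_k A[i][k] * B[k][j], bias=bias[j]), in one pass."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = []
    for i in range(rows):
        row = []
        for j in range(cols):
            acc = sum(a[i][k] * b[k][j] for k in range(inner))
            row.append(eval_node(epilogue, {"acc": acc, "bias": bias[j]}))
        out.append(row)
    return out


RELU_BIAS = ("relu", ("add", "acc", "bias"))  # relu(acc + bias)
```

With `a = [[1.0, 2.0], [3.0, 4.0]]`, an identity `b`, and `bias = [-2.0, 0.5]`, `gemm_with_epilogue(a, b, bias, RELU_BIAS)` gives `[[0.0, 2.5], [1.0, 4.5]]`.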

Jack Sun

Jack Sun, writing.

Engineer · Bay Area

Hands-on with agentic AI all day — building frameworks, reading what industry ships, occasionally writing them down.

© 2026 Wei (Jack) Sun · jacksunwei.me Built on Astro · hosted on Cloudflare