Sources

datasette-agent-edit 0.1a0 simonwillison.net

Release: datasette-agent-edit 0.1a0. I’m planning several plugins for Datasette Agent which can make edits to existing pieces of text - things like collaborative Markdown editing, updating large SQL queries, and editing SVG files. Agentic editing of text is a little tricky to get right. My favorite published design for this is for the Claude text editor, which implements the following tools: view - view sections of a file, with line numbers added to every line. str_replace - find an exact old_s…

Amazing Digital Dentures (a failed project) huggingface.co

References

dev.to — Alex Chen, ‘The Harness Problem is Real’ dev.to

str_replace requires perfect reproduction of indentation… a single tab-vs-space difference causes a ‘String to replace not found’ error, especially when auto-formatters modify the file between Read and Edit calls.

aider.chat — Unified Diffs documentation aider.chat

Switching from search/replace blocks to a unified diff format reduced GPT-4 Turbo’s ‘lazy coding’ by 3x in Aider’s benchmark.

dev.to — ‘I Benchmarked 5 File Editing Strategies for AI Coding Agents’ dev.to

For 10 changes to a 1,000-line file, script- and diff-based patching were 3.5x cheaper and 6.5x faster than sequential str_replace-style line edits; whole-file rewrites suffered from ‘lost-in-the-middle’ omissions.

Ben’s Bites — coverage of Simon Willison’s ‘Hashline’ format bensbites.com

Grok Code Fast jumped from 6.7% to 68.3% success on coding tasks simply by switching to the Hashline tool interface — no model retraining — while Gemini gained a stable 8%.

OpenClaw Playbook — ‘Diff Artifacts for Agent Review’ openclawplaybook.ai

A chat summary of an edit is insufficient for security; agents should emit a read-only diff artifact a human can inspect before commit, using familiar version-control tools to catch unintended side effects.

Morph LLM — ‘Diff Format Explained’ morphllm.com

Unified diffs let the agent send only the necessary changes plus context lines, bypassing the need to process or regenerate surrounding code — the dominant cost driver for str_replace-style tools on large files.

Medium technical review of Nemotron-3 Nano 30B A3B medium.com

while the model could build a physics simulator from scratch, it required constant ‘babysitting’ as it occasionally lost focus or executed bizarre, non-functional commands

MindStudio model card for Nemotron-3 Nano 30B mindstudio.ai

68.3% score on LiveCodeBench and a 38.8% achievement on the rigorous SWE-bench… activates roughly 3.2 to 3.6 billion [parameters] per token

Anastasios Vanis, ‘The Sweet Spot of Agentic Coding’ (Medium) medium.com

single-shot generation remains an ‘illusion’ for production-ready apps; true reliability requires a persistent ‘harness’ that includes automated testing and iterative human oversight

GenAI Secret Sauce Daily Digest, 2026-06-07 genaisecretsauce.com

‘Her’ utilized the Nemotron-Mini-4B-Instruct model on ZeroGPU exclusively for handling prose and ‘softer suggestions,’ while leaving the heavy lifting of code forensics to a deterministic engine

r/LocalLLaMA thread on Qwen 3.5 4B reddit.com

Qwen 3.5 4B is cited as being able to create ‘Ray caster games in one shot’

PrintingPressAI write-up of ‘Amazing Digital Dentures’ printingpressai.com

long system prompts explaining game mechanics and Three.js syntax often ‘blew past’ the effective context window of the Nemotron 30B model, leading to degraded performance and hallucinated APIs

Jack Sun, writing.