Willison ships str_replace over Hashline, Nemotron skips the verify harness
Every URL the pipeline pulled into ranking for this issue — primary sources plus the supporting and contradicting findings each Researcher returned. Inline citations in the issue point back here.
Sources
datasette-agent-edit 0.1a0 simonwillison.net
Release: datasette-agent-edit 0.1a0. I’m planning several plugins for Datasette Agent which can make edits to existing pieces of text - things like collaborative Markdown editing, updating large SQL queries, and editing SVG files. Agentic editing of text is a little tricky to get right. My favorite published design for this is for the Claude text editor, which implements the following tools: view - view sections of a file, with line numbers added to every line. str_replace - find an exact old_s…
Amazing Digital Dentures (a failed project) huggingface.co
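The view / str_replace pair described in the datasette-agent-edit entry can be sketched in a few lines. This is a minimal Python illustration, not the plugin's or Anthropic's implementation. It assumes that old_str must occur exactly once in the text, and it assumes a tab-separated line-number prefix for view; both are assumptions made for this sketch. Because the match is exact, the sketch also reproduces the tab-vs-space "String to replace not found" failure quoted in the Alex Chen reference.

```python
from typing import Optional


def view(text: str, start: int = 1, end: Optional[int] = None) -> str:
    """Return lines start..end (1-indexed, inclusive), each prefixed with its number.

    Hypothetical sketch: the tab after the line number is an assumed format.
    """
    lines = text.splitlines()
    stop = len(lines) if end is None else min(end, len(lines))
    return "\n".join(f"{n}\t{lines[n - 1]}" for n in range(start, stop + 1))


def str_replace(text: str, old_str: str, new_str: str) -> str:
    """Swap old_str for new_str, requiring exactly one exact occurrence.

    Hypothetical sketch: the exactly-once rule is an assumption, not a quoted spec.
    """
    if not old_str:
        raise ValueError("old_str must be non-empty")
    count = text.count(old_str)  # exact, whitespace-sensitive match
    if count == 0:
        raise ValueError("String to replace not found")
    if count > 1:
        raise ValueError(f"old_str matches {count} times; add context to make it unique")
    return text.replace(old_str, new_str, 1)


if __name__ == "__main__":
    doc = "def f():\n\treturn 1\n"
    print(view(doc))
    print(str_replace(doc, "\treturn 1", "\treturn 2"), end="")
    try:
        # Spaces where the file has a tab: the exact match fails.
        str_replace(doc, "    return 1", "    return 2")
    except ValueError as err:
        print("error:", err)  # error: String to replace not found
```

Refusing an ambiguous match keeps each edit deterministic: if old_str appears twice, the sketch raises instead of guessing which occurrence the model meant, and the model has to resend a longer, unique snippet.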
References
dev.to — Alex Chen, ‘The Harness Problem is Real’ dev.to
str_replace requires perfect reproduction of indentation… a single tab-vs-space difference causes a ‘String to replace not found’ error, especially when auto-formatters modify the file between Read and Edit calls.
aider.chat — Unified Diffs documentation aider.chat
Switching from search/replace blocks to a unified diff format reduced GPT-4 Turbo’s ‘lazy coding’ by 3x in Aider’s benchmark.
dev.to — ‘I Benchmarked 5 File Editing Strategies for AI Coding Agents’ dev.to
For 10 changes to a 1,000-line file, script- and diff-based patching were 3.5x cheaper and 6.5x faster than sequential str_replace-style line edits; whole-file rewrites suffered from ‘lost-in-the-middle’ omissions.
Ben’s Bites — coverage of Simon Willison’s ‘Hashline’ format bensbites.com
Grok Code Fast jumped from 6.7% to 68.3% success on coding tasks simply by switching to the Hashline tool interface — no model retraining — while Gemini gained a stable 8%.
OpenClaw Playbook — ‘Diff Artifacts for Agent Review’ openclawplaybook.ai
A chat summary of an edit is insufficient for security; agents should emit a read-only diff artifact a human can inspect before commit, using familiar version-control tools to catch unintended side effects.
Morph LLM — ‘Diff Format Explained’ morphllm.com
Unified diffs let the agent send only the necessary changes plus context lines, bypassing the need to process or regenerate surrounding code — the dominant cost driver for str_replace-style tools on large files.
Medium technical review of Nemotron-3 Nano 30B A3B medium.com
while the model could build a physics simulator from scratch, it required constant ‘babysitting’ as it occasionally lost focus or executed bizarre, non-functional commands
MindStudio model card for Nemotron-3 Nano 30B mindstudio.ai
68.3% score on LiveCodeBench and a 38.8% achievement on the rigorous SWE-bench… activates roughly 3.2 to 3.6 billion [parameters] per token
Anastasios Vanis, ‘The Sweet Spot of Agentic Coding’ (Medium) medium.com
single-shot generation remains an ‘illusion’ for production-ready apps; true reliability requires a persistent ‘harness’ that includes automated testing and iterative human oversight
GenAI Secret Sauce Daily Digest, 2026-06-07 genaisecretsauce.com
‘Her’ utilized the Nemotron-Mini-4B-Instruct model on ZeroGPU exclusively for handling prose and ‘softer suggestions,’ while leaving the heavy lifting of code forensics to a deterministic engine
r/LocalLLaMA thread on Qwen 3.5 4B reddit.com
Qwen 3.5 4B is cited as being able to create ‘Ray caster games in one shot’
PrintingPressAI write-up of ‘Amazing Digital Dentures’ printingpressai.com
long system prompts explaining game mechanics and Three.js syntax often ‘blew past’ the effective context window of the Nemotron 30B model, leading to degraded performance and hallucinated APIs
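The OpenClaw Playbook and Morph LLM entries both center on unified diffs, one as a human-reviewable artifact and the other as a compact change format. Below is a minimal sketch using Python's standard-library difflib. It assumes the agent holds the file's before and after text in memory. The function `diff_artifact` and the `query.sql` path are hypothetical names for illustration.

```python
import difflib


def diff_artifact(before: str, after: str, path: str, context: int = 3) -> str:
    """Render a read-only unified diff of an agent edit for human review.

    Hypothetical helper: the name and the a/ b/ path prefixes are illustrative.
    """
    return "".join(
        difflib.unified_diff(
            before.splitlines(keepends=True),
            after.splitlines(keepends=True),
            fromfile=f"a/{path}",
            tofile=f"b/{path}",
            n=context,  # unchanged lines shown around each change
        )
    )


if __name__ == "__main__":
    before = "SELECT id\nFROM users\nWHERE active = 1\n"
    after = "SELECT id\nFROM users\nWHERE active = 0\n"
    print(diff_artifact(before, after, "query.sql"), end="")
```

The demo prints a single `@@ -1,3 +1,3 @@` hunk in which the WHERE line is removed and re-added, with the two unchanged lines kept as context. An empty result means the edit changed nothing. The `n` parameter sets how many context lines surround each change, which corresponds to the "context lines" overhead the Morph LLM entry mentions.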