Your agent edited the wrong line and there is no undo

You asked the agent to fix the bug on line 42 of auth.py. It ran edit_file with old="validate(token)" and new="validate_v2(token)". Unfortunately, validate(token) appears 4 times in auth.py. The edit replaced all 4. Your tests now fail in 3 unrelated places. The agent does not know why. Neither do you.

This is what a naive edit tool does. It trusts the string the model sent. It does not verify that the string is unique. It does not preview the change. It does not keep a backup. It does not make the write atomic. And when anything goes wrong, the original file is gone.

This post is the edit tool design I use in coding agents: uniqueness checks on the old string, atomic writes, dry-run previews, and the 5 rules that prevent the model from wrecking your working tree.

Why do naive edit tools corrupt files?

Because they trust the model's string match. The naive tool looks like this:

# filename: naive_edit.py
# description: The edit tool you should not ship. Replaces all matches,
# no uniqueness check, no backup, no atomic write.
def edit_file(path: str, old: str, new: str) -> str:
    with open(path) as fh:
        content = fh.read()
    with open(path, 'w') as fh:
        fh.write(content.replace(old, new))
    return f'edited {path}'

4 things wrong, each a real bug I have shipped:

  1. .replace(old, new) with no count limit replaces every occurrence. The agent wanted to change one call site and changed them all.
  2. No check that old appears in the file at all. A typo in the model's match string silently writes an unchanged file and reports success.
  3. No check that old is unique. If it appears twice, the model does not know which one got edited.
  4. Writing over the original file with no backup. If the process dies mid-write, you lose the whole file.

Fix all 4 and the edit tool becomes trustworthy. Skip any one and the agent will eventually corrupt something important.
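Failure mode 1 in action, as a self-contained snippet (the file content here is invented for illustration):

```python
# Demonstrate failure mode 1: str.replace with no count limit
# rewrites every occurrence, not just the one the model meant.
content = (
    "def login(token):\n"
    "    return validate(token)\n"
    "\n"
    "def refresh(token):\n"
    "    return validate(token)\n"
)

edited = content.replace("validate(token)", "validate_v2(token)")

# Both call sites changed, though the agent only meant to fix login().
assert edited.count("validate_v2(token)") == 2
assert "validate(token)" not in edited
```

The agent asked for one change and silently got two. Nothing in the return value hints that anything beyond the intended call site was touched.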

graph TD
    Call[edit_file called] --> Read[Read file]
    Read --> Match[Find old string]
    Match -->|not found| Err1[Return error: not found]
    Match -->|multiple matches| Err2[Return error: ambiguous]
    Match -->|exactly one| Preview[Build preview]
    Preview --> Atomic[Write to temp file]
    Atomic --> Rename[Atomic rename]
    Rename --> Ok[Return diff]

    style Err1 fill:#fee2e2,stroke:#b91c1c
    style Err2 fill:#fee2e2,stroke:#b91c1c
    style Ok fill:#dcfce7,stroke:#15803d

5 steps, 2 early-exit errors, 1 atomic write. That is the whole safe pattern.

What is the string-match uniqueness rule?

The model must supply an old string that appears exactly once in the file. Not zero times (nothing to replace, probably a hallucinated match). Not more than once (ambiguous, the tool cannot know which occurrence the model meant). Exactly once.

When the model hands over old="return validate(token)", the tool counts how many times that string appears in the file. If it is zero, return an error asking the model to re-read the file and try again. If it is more than one, return an error asking the model to provide more context (more surrounding lines) to make the match unique.

This rule is the single biggest safety win in edit tool design. It turns "trust the model's match" into "verify the match is unambiguous, reject otherwise." The error messages train the model to provide unique snippets in the next attempt.

# filename: safe_edit.py
# description: A safe edit tool with uniqueness check, atomic write,
# and a diff preview. 50 lines total.
import os
import tempfile
from pathlib import Path


def edit_file(path: str, old: str, new: str) -> dict:
    p = Path(path)
    if not p.exists():
        return {'ok': False, 'error': f'file not found: {path}'}

    content = p.read_text()

    if old not in content:
        return {'ok': False, 'error': f'old string not found in {path}'}

    occurrences = content.count(old)
    if occurrences > 1:
        return {
            'ok': False,
            'error': f'old string appears {occurrences} times in {path}; '
                     'provide more surrounding context so the match is unique',
        }

    new_content = content.replace(old, new, 1)

    tmp_fd, tmp_path = tempfile.mkstemp(dir=p.parent, prefix=f'.{p.name}.', suffix='.tmp')
    try:
        with os.fdopen(tmp_fd, 'w') as fh:
            fh.write(new_content)
        os.replace(tmp_path, p)
    except Exception:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise

    return {
        'ok': True,
        'path': str(p),
        'bytes_before': len(content),
        'bytes_after': len(new_content),
    }

Look at the write section. tempfile.mkstemp creates a temp file in the same directory as the target, so the rename stays on one filesystem and is atomic. os.replace performs the atomic rename. If the write fails partway through, the except block removes the temp file and the original file is untouched.

Why does atomic write matter for an agent?

Because agents crash. The LLM provider returns an error mid-generation. The process is killed by a deploy. The disk fills up. Without atomic writes, any of these leaves a half-written file on disk. The agent comes back up, reads the corrupted file, and decides to fix it, usually in a way that compounds the damage.

Atomic writes use the "write to temp, then rename" pattern. The temp file is written fully (or not at all). The rename is a single filesystem operation that either succeeds (new content in place) or fails (original content intact). There is no intermediate state where the file exists but is half-written.

The critical detail: the temp file must be on the same filesystem as the target. os.replace is atomic within a single filesystem; across filesystems the underlying rename fails outright (EXDEV), and higher-level helpers like shutil.move fall back to copy-and-delete, which is not atomic. tempfile.mkstemp(dir=p.parent) guarantees the temp file sits next to the target.
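The temp-then-rename pattern can be factored into a small reusable helper. A minimal sketch (the name atomic_write is my own, not from a library):

```python
import os
import tempfile
from pathlib import Path


def atomic_write(path: Path, content: str) -> None:
    """Write content to path atomically: full new file, or original untouched."""
    path = Path(path)
    # dir=path.parent keeps the temp file on the same filesystem as the
    # target, so os.replace is a true atomic rename.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=f".{path.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            fh.write(content)
        os.replace(tmp, path)  # atomic rename within one filesystem
    except Exception:
        # The write or rename failed; remove the temp file and leave
        # the original exactly as it was.
        if os.path.exists(tmp):
            os.remove(tmp)
        raise
```

With this helper, every tool that writes files (edit, create, patch) shares one correct write path instead of each reimplementing it.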

How do you give the model a dry-run preview?

Pair edit_file with an edit_preview tool that returns the diff without writing. The model can call preview first to sanity-check its own change, then call edit to commit.

# filename: edit_preview.py
# description: Dry-run preview that returns a unified diff without writing.
# Same uniqueness check as the real edit, same error messages.
import difflib
from pathlib import Path


def edit_preview(path: str, old: str, new: str) -> dict:
    p = Path(path)
    if not p.exists():
        return {'ok': False, 'error': f'file not found: {path}'}

    content = p.read_text()
    if old not in content:
        return {'ok': False, 'error': f'old string not found in {path}'}
    if content.count(old) > 1:
        return {'ok': False, 'error': f'old string not unique in {path}'}

    new_content = content.replace(old, new, 1)
    diff = list(difflib.unified_diff(
        content.splitlines(keepends=True),
        new_content.splitlines(keepends=True),
        fromfile=f'{path} (before)',
        tofile=f'{path} (after)',
        n=3,
    ))
    return {'ok': True, 'diff': ''.join(diff)}

The unified diff is what the model sees. It can inspect the surrounding context, verify the change is what it intended, and only then call the real edit. This is especially useful for big edits where the model might not remember the surrounding code exactly.

Whether preview is worth the extra tool call depends on the workload. For high-stakes edits (production code, config files), yes. For quick fixes and test scaffolding, no. Most agents include both and let the system prompt decide which to prefer.

How should the edit tool handle multi-line strings?

Treat them as opaque text. The old string can be one line, 10 lines, or 100 lines. The uniqueness check still applies: exactly one match in the file. Multi-line snippets are usually more unique than single-line ones, so the model tends to use them for tricky edits.

The one gotcha is line endings. A multi-line old string with \n line endings will not match a file with \r\n line endings. The safe pattern is to normalize line endings before matching:

# filename: normalize.py
# description: Normalize line endings to LF before matching and writing.
def normalize_le(text: str) -> str:
    return text.replace('\r\n', '\n').replace('\r', '\n')

Apply this to both the file content and the old string before counting matches. On write, use whatever line ending convention the project uses (check a sample file first or always write LF and let git handle it).
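For example, matching a multi-line LF snippet against a CRLF file fails on a raw substring check but succeeds once both sides are normalized (the file content here is invented for illustration):

```python
def normalize_le(text: str) -> str:
    # Normalize CRLF and bare CR to LF before matching.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# A CRLF file and an LF match string from the model.
file_content = "def f():\r\n    return validate(token)\r\n"
old = "def f():\n    return validate(token)"

assert old not in file_content                                   # raw match fails
assert normalize_le(file_content).count(normalize_le(old)) == 1  # normalized match is unique
```

Without this, the tool returns "old string not found" for edits that are perfectly correct, and the model burns turns re-reading the file.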

When should you use line-range edits instead of string matches?

When the file has no stable unique substring around the target. Example: a JSON config with many similar "enabled": true lines. A string match is ambiguous. A line range (edit_file path=config.json start_line=42 end_line=42 new_content="...") is precise.

The trade-off: line-range edits are fragile. If an earlier edit adds or removes lines, every line number after it shifts, and any range the model computed before that edit is stale. String-match edits are resilient to line-number changes because they anchor on surrounding context, not on position.

My rule: default to string-match edits. Fall back to line-range only when the file structure makes string matches impossible. In practice, string matches cover 95 percent of edits in a real codebase.
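A minimal line-range variant might look like this (a sketch; the 1-indexed inclusive range and the plain write are assumptions, and a production version should reuse the atomic write pattern from earlier):

```python
from pathlib import Path


def edit_lines(path: str, start_line: int, end_line: int, new_content: str) -> dict:
    """Replace lines start_line..end_line (1-indexed, inclusive) with new_content."""
    p = Path(path)
    lines = p.read_text().splitlines(keepends=True)
    if not (1 <= start_line <= end_line <= len(lines)):
        return {"ok": False, "error": f"line range {start_line}-{end_line} out of bounds"}
    replacement = new_content.splitlines(keepends=True)
    # Keep the file well-formed: the replacement must end with a newline.
    if replacement and not replacement[-1].endswith("\n"):
        replacement[-1] += "\n"
    new_lines = lines[:start_line - 1] + replacement + lines[end_line:]
    p.write_text("".join(new_lines))
    return {"ok": True, "path": str(p)}
```

Note that the range check itself can fail silently in the bad direction: the range can be in bounds but pointing at the wrong lines after an earlier edit, which is exactly the fragility described above.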

For the broader picture of how the edit tool fits into a coding agent's tool set, see the Build a Coding Agent with Claude: A Step-by-Step Guide post. For the modular registry pattern that makes adding edit variants clean, see Designing Modular Tool Integrations for Coding Agents.

What to do Monday morning

  1. Open your current edit tool. If it uses .replace(old, new) without a count limit, that is your biggest bug. Add the uniqueness check before anything else.
  2. Add the atomic write pattern (tempfile plus os.replace). It is 5 lines and it eliminates the half-written-file failure mode forever.
  3. Add clear error messages for "not found" and "ambiguous" cases. The messages train the model to supply unique snippets on the next attempt.
  4. Consider adding edit_preview as a separate tool. Useful for high-stakes edits; the system prompt decides when to prefer it.
  5. Add a test that writes a file with validate(token) appearing twice and confirms your edit tool refuses to edit without more context. Run it in CI forever.
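Step 5 as a concrete test (a sketch: it inlines a minimal edit_file with the uniqueness check so it runs standalone; in CI you would import your real tool instead):

```python
import tempfile
from pathlib import Path


def edit_file(path: str, old: str, new: str) -> dict:
    # Minimal stand-in with the uniqueness check; swap in your real tool.
    content = Path(path).read_text()
    if content.count(old) != 1:
        return {"ok": False, "error": "old string not found or not unique"}
    Path(path).write_text(content.replace(old, new, 1))
    return {"ok": True}


def test_refuses_ambiguous_edit():
    with tempfile.TemporaryDirectory() as d:
        f = Path(d) / "auth.py"
        f.write_text("validate(token)\nvalidate(token)\n")
        result = edit_file(str(f), "validate(token)", "validate_v2(token)")
        assert result["ok"] is False                         # refused: ambiguous match
        assert f.read_text().count("validate(token)") == 2   # file untouched


test_refuses_ambiguous_edit()
```

The test is cheap, deterministic, and guards the single most important invariant of the tool: an ambiguous match never writes.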

The headline: a safe edit tool is 50 lines of uniqueness checks, atomic writes, and clear errors. Every line of that 50 catches a real bug. Cut any of them and you ship the bug.

Frequently asked questions

How should a coding agent's edit tool prevent file corruption?

By checking that the old string is present and unique before editing, writing to a temp file, and using an atomic rename to replace the original. Together these prevent the 4 common corruption modes: silent no-ops, ambiguous edits, half-written files, and wrong-target replacements. The whole pattern fits in about 50 lines of Python.

Why is a string uniqueness check important in edit tools?

Because str.replace with no count limit will replace every occurrence, and the model may not realize the match is ambiguous. Without a uniqueness check, a single-word fix can rewrite 20 call sites. Enforcing exactly-one-match turns ambiguous edits into errors the model can recover from, rather than silent multi-file corruption.

How does an atomic write protect edit operations?

By writing the new content to a temp file on the same filesystem and then renaming the temp file over the original. The rename is a single filesystem operation that either succeeds completely or fails with the original file untouched. If the write is interrupted, the original file is not corrupted and the temp file can be cleaned up safely.

Should I include a preview or dry-run in the edit tool?

For high-stakes edits, yes. An edit_preview tool that returns a unified diff without writing lets the model verify its own change before committing. For quick scaffolding and low-risk edits, the extra tool call adds latency without much benefit. Most production agents include both and let the system prompt decide which to use.

When should I use line-range edits instead of string-match edits?

Only when the file has no stable unique substring around the target, like a JSON config with many similar entries. String-match edits are more reliable because they anchor on surrounding context that survives other edits. Line-range edits break the moment earlier edits shift line numbers. Default to string match, fall back to line range only when string match is ambiguous.

Key takeaways

  1. The naive .replace(old, new) edit tool corrupts files 4 different ways. All 4 are easy to fix with a 50-line safe pattern.
  2. Enforce uniqueness: the old string must appear exactly once in the file. Zero matches is a hallucinated change; multiple matches is ambiguous. Both are errors.
  3. Write atomically: temp file on the same filesystem, then os.replace. Half-written files are a failure mode that disappears with this one pattern.
  4. Return clear error messages that train the model to supply unique snippets on retry. The error messages are part of the API.
  5. Default to string-match edits. Fall back to line-range only when string-match is genuinely ambiguous. String matches survive shifting line numbers.
  6. To see this edit tool wired into a full coding agent with the event loop and the registry pattern, walk through the Build Your Own Coding Agent course, or start with the AI Agents Fundamentals primer.

For the Python atomic-write pattern and the underlying os.replace semantics, see the Python os.replace documentation. The filesystem-level atomicity guarantees explained there are what make the safe edit pattern in this post work.
