The first coding agent you build should fit in 100 lines

Most people learn coding agents the wrong way around. They install a framework, run a demo, and end up with something that "works" but they cannot debug. Then they hit a wall the first time they want to change the loop, the tools, or the prompt, because all of those things are buried under abstractions.

The right way to learn is to write the whole thing yourself first. The total surface area is small. A coding agent is one LLM client, 2 or 3 tool functions, a while loop, and a system prompt. That fits in 100 lines of Python and handles real tasks. Once you have built that, every framework starts looking like glue you no longer need.

This post is a step-by-step build of a Claude coding agent from scratch. By the end you will have a script that can read your repo, edit files, run shell commands, and stop cleanly when the task is done. We will use the Anthropic SDK directly. No LangChain, no LlamaIndex, no agent framework. Just Python, the loop, and the model.

Why build a coding agent from scratch instead of using a framework?

3 reasons, in increasing order of importance.

The first is debuggability. When your agent does something weird (and it will), you need to be able to set a breakpoint inside the loop, inspect the message history, see the exact tool call the model made, and figure out why. Frameworks add layers between you and that visibility. From-scratch agents are one stack frame deep.

The second is customization. The interesting work in coding agents is in the system prompt and the tool design, not the loop. Frameworks abstract the loop and force opinions on the prompt and tools. Building from scratch lets you change either without fighting an inheritance hierarchy.

The third is mental model. After you write the loop once, every coding agent paper, blog post, and codebase becomes legible. You can read the Aider source in an hour, the Cursor agent mode posts make sense, the OpenDevin design doc clicks. The 100 lines of code are the Rosetta Stone for the entire field.

What will the agent be able to do?

By the end of this post the agent can do 4 things:

graph TD
    User["User: fix the failing test"] --> Loop[Agent Loop]
    Loop -->|read_file| Read[Inspect source]
    Loop -->|edit_file| Edit[Apply a patch]
    Loop -->|run_bash| Bash[Run pytest, ls, grep]
    Loop -->|reply| Done[Final answer]

    Read --> Loop
    Edit --> Loop
    Bash --> Loop

    style Loop fill:#dbeafe,stroke:#1e40af
    style Done fill:#dcfce7,stroke:#15803d

3 tools, a final reply, one loop, and a system prompt. That is the entire surface area. We will build it in 5 steps.

Step 1: the skeleton (LLM call, no tools)

Start with the dumbest possible version: send a message, get a reply, print it.

# filename: agent_step1.py
# description: A chatbot, not yet an agent. One call, one reply.
from anthropic import Anthropic

client = Anthropic()

def run(user_message: str) -> str:
    reply = client.messages.create(
        model='claude-sonnet-4-6',
        max_tokens=2048,
        messages=[{'role': 'user', 'content': user_message}],
    )
    return reply.content[0].text


if __name__ == '__main__':
    print(run('Explain what a closure is in Python in 2 sentences.'))

This is not yet an agent. It is the floor. If you cannot get this working, fix your API key before moving on.

Step 2: add the first tool

Add a read_file tool, declare it in the tools list, and handle the case where the model asks to use it.

# filename: agent_step2.py
# description: First real agent step. Model can ask to read a file,
# the runtime executes it, results go back into the next call.
from anthropic import Anthropic

client = Anthropic()

TOOLS = [{
    'name': 'read_file',
    'description': 'Read a text file from the current working directory and return its contents.',
    'input_schema': {
        'type': 'object',
        'properties': {'path': {'type': 'string'}},
        'required': ['path'],
    },
}]


def read_file(path: str) -> str:
    try:
        with open(path) as fh:
            return fh.read()
    except OSError as exc:
        return f'ERROR: {exc}'


def run(user_message: str) -> str:
    messages = [{'role': 'user', 'content': user_message}]

    while True:
        reply = client.messages.create(
            model='claude-sonnet-4-6',
            max_tokens=2048,
            tools=TOOLS,
            messages=messages,
        )

        if reply.stop_reason == 'end_turn':
            return next((b.text for b in reply.content if b.type == 'text'), '')

        messages.append({'role': 'assistant', 'content': reply.content})

        results = []
        for block in reply.content:
            if block.type == 'tool_use' and block.name == 'read_file':
                output = read_file(**block.input)
                results.append({
                    'type': 'tool_result',
                    'tool_use_id': block.id,
                    'content': output,
                })

        messages.append({'role': 'user', 'content': results})

You now have an agent. Run it with run('Read README.md and tell me what this project does'). Watch it call the tool, get the file back, and produce a summary. The loop is the same one we walked through in The Event Loop Inside a Coding Agent post; this is that mental model in working code.
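It helps to see the shape of the conversation this loop builds. After one round trip, the history looks roughly like this (the tool_use id here is invented for illustration; real ids are opaque strings generated by the API):

```python
# Shape of the message history after one tool round.
# The id 'toolu_abc123' is invented for illustration.
messages = [
    {'role': 'user', 'content': 'Read README.md and tell me what this project does'},
    {'role': 'assistant', 'content': [
        {'type': 'tool_use', 'id': 'toolu_abc123',
         'name': 'read_file', 'input': {'path': 'README.md'}},
    ]},
    {'role': 'user', 'content': [
        {'type': 'tool_result', 'tool_use_id': 'toolu_abc123',
         'content': '# My Project\nA small demo package.'},
    ]},
]
```

Note the pairing: every tool_result points back at a tool_use id, and tool results travel back to the model inside a user message. Getting that pairing wrong is the most common first bug.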

Step 3: add the edit and bash tools

A real coding agent needs to write, not just read. Add edit_file and run_bash to the tool list. Keep the loop unchanged; just add more dispatch branches.

# filename: agent_step3.py
# description: Add edit_file and run_bash. The loop does not change.
import subprocess

TOOLS = [
    # ... read_file from step 2 ...
    {
        'name': 'edit_file',
        'description': 'Replace a substring in a file with a new substring. Use for targeted edits.',
        'input_schema': {
            'type': 'object',
            'properties': {
                'path': {'type': 'string'},
                'old': {'type': 'string'},
                'new': {'type': 'string'},
            },
            'required': ['path', 'old', 'new'],
        },
    },
    {
        'name': 'run_bash',
        'description': 'Run a shell command and return stdout and stderr. Use for tests, listing files, grep.',
        'input_schema': {
            'type': 'object',
            'properties': {'command': {'type': 'string'}},
            'required': ['command'],
        },
    },
]


def edit_file(path: str, old: str, new: str) -> str:
    with open(path) as fh:
        contents = fh.read()
    if old not in contents:
        return f'ERROR: substring not found in {path}'
    with open(path, 'w') as fh:
        fh.write(contents.replace(old, new, 1))
    return f'Edited {path}'


def run_bash(command: str) -> str:
    try:
        result = subprocess.run(
            command, shell=True, capture_output=True, text=True, timeout=30,
        )
    except subprocess.TimeoutExpired:
        return 'ERROR: command timed out after 30 seconds'
    return f'STDOUT:\n{result.stdout}\nSTDERR:\n{result.stderr}'

The dispatch in the loop becomes a small if/elif. That is the only change.

# filename: agent_step3_dispatch.py
# description: The dispatch block inside the loop, extended for 3 tools.
TOOL_HANDLERS = {
    'read_file': read_file,
    'edit_file': edit_file,
    'run_bash': run_bash,
}

for block in reply.content:
    if block.type == 'tool_use':
        handler = TOOL_HANDLERS[block.name]
        output = handler(**block.input)
        results.append({
            'type': 'tool_result',
            'tool_use_id': block.id,
            'content': output,
        })

3 tools, one dispatch table. Adding a fourth (glob, grep, git_diff) is the same pattern.
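As a sketch of that pattern, here is what a hypothetical glob tool could look like. The name glob_files and the exact description text are my own choices for illustration, not part of the post's script:

```python
# filename: agent_glob_tool.py (hypothetical extension)
# description: A fourth tool following the exact same pattern as the first 3.
import glob as globlib

GLOB_TOOL = {
    'name': 'glob',
    'description': 'Find files matching a glob pattern, e.g. "src/**/*.py".',
    'input_schema': {
        'type': 'object',
        'properties': {'pattern': {'type': 'string'}},
        'required': ['pattern'],
    },
}


def glob_files(pattern: str) -> str:
    # recursive=True lets ** match nested directories
    matches = sorted(globlib.glob(pattern, recursive=True))
    return '\n'.join(matches) or 'No files matched.'

# Wiring it in is two lines; the loop itself never changes:
# TOOLS.append(GLOB_TOOL)
# TOOL_HANDLERS['glob'] = glob_files
```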

Step 4: how do you teach the agent to plan with a system prompt?

Without a system prompt, the model will jump straight into editing files when you ask it to fix a bug. With one, it will read the relevant files first, form a plan, and only then edit. The system prompt is where most of the agent's apparent intelligence lives.

# filename: system_prompt.py
# description: A short system prompt that nudges the model toward
# read-first, edit-second, test-after behavior.
SYSTEM_PROMPT = '''
You are a coding agent. You have 3 tools: read_file, edit_file, and run_bash. When the task is finished, reply to the user directly instead of calling a tool.

Rules you follow on every task:
1. Before editing any file, read it with read_file. Never edit blind.
2. Before claiming a fix works, run the tests with run_bash. If they fail, read the error and try again.
3. Make one logical change per edit_file call. Many small edits beat one big one.
4. When the task is done, reply to the user with a one-paragraph summary of what changed and why. Do not call any more tools.

If you are unsure what file to edit, use run_bash to grep or list files first.
'''

Pass it as the system argument to client.messages.create. That is the entire change. Re-run the agent with a real task like "the test in tests/test_auth.py is failing, please fix it" and watch the difference. With the prompt, it reads the test first, then the source, then edits, then runs the test. Without it, it guesses.

For a deeper walk-through of how prompts shape agent behavior, the Build Your Own Coding Agent course covers prompt iteration, planning prompts, and reflection patterns module by module. The free AI Agents Fundamentals resource is a good prerequisite if the loop concept is still new.

Step 5: how do you stop a coding agent from running forever?

The final piece is preventing the agent from running forever or doing something destructive. 3 rails, each one a few lines.

# filename: rails.py
# description: Step ceiling, repeat detection, and a denylist for the bash tool.
MAX_STEPS = 25
DANGEROUS_PATTERNS = ['rm -rf /', ':(){:|:&};:', '> /dev/sda', 'mkfs', 'dd if=']


def is_safe(command: str) -> bool:
    return not any(pat in command for pat in DANGEROUS_PATTERNS)


def run(user_message: str) -> str:
    messages = [{'role': 'user', 'content': user_message}]
    last_observation = None

    for step in range(MAX_STEPS):
        reply = client.messages.create(
            model='claude-sonnet-4-6',
            max_tokens=2048,
            system=SYSTEM_PROMPT,
            tools=TOOLS,
            messages=messages,
        )

        if reply.stop_reason == 'end_turn':
            return next((b.text for b in reply.content if b.type == 'text'), '')

        messages.append({'role': 'assistant', 'content': reply.content})

        results = []
        for block in reply.content:
            if block.type == 'tool_use':
                if block.name == 'run_bash' and not is_safe(block.input['command']):
                    output = 'ERROR: command blocked by safety filter'
                else:
                    output = TOOL_HANDLERS[block.name](**block.input)
                results.append({'type': 'tool_result', 'tool_use_id': block.id, 'content': output})

        # Compare tool outputs only: tool_use ids differ every round,
        # so comparing the full result dicts would never match.
        observation = [r['content'] for r in results]
        if observation == last_observation:
            return 'Loop detected. Stopping early.'
        last_observation = observation
        messages.append({'role': 'user', 'content': results})

    return f'Hit max steps ({MAX_STEPS}). Aborting.'

That is the complete agent. Read it end to end. It is roughly 100 lines including all the tool definitions. It can read, edit, and execute. It plans before acting. It stops when finished or when something is wrong.

You are done. You built a coding agent.

What to do Monday morning

  1. Type out the script from this post by hand. Do not paste. The point is to feel how few lines there are.
  2. Run it on a real task in a throwaway repo. "Add a --verbose flag to the CLI" or "the test on line 42 is failing, fix it." Watch the loop print every tool call.
  3. Add a fourth tool: glob, grep, or git_diff. The dispatch table stays the same. Notice that the loop never changes when you add tools.
  4. Read the system prompt out loud. If you do not believe it would change the model's behavior, run the agent twice (with and without it) on the same task. The difference is the entire game.
  5. Once it works, then go look at LangGraph or your favorite framework. It will make sense in a way it never did before, because you now know exactly what they are abstracting.

You do not need a framework to ship a coding agent. You need a clear mental model and a small, honest loop. Everything else is decoration on those 100 lines.

Frequently asked questions

How do you build a coding agent with the Claude API?

Use the Anthropic Python SDK, declare your tools in the tools argument of messages.create, and wrap the call in a while loop. When the model returns a tool_use block, execute the corresponding Python function and append the result as a tool_result to the message history. Keep looping until the model returns a response with no tool calls. The whole pattern fits in about 50 lines.

What tools should a coding agent have?

At minimum: read_file, edit_file, and run_bash. These 3 cover the majority of real coding tasks because they let the agent inspect, modify, and verify. A fourth useful tool is glob or grep for navigating large repos. More tools beyond that add power but also raise the chance the model picks the wrong one. Start with 3.

Do I need LangChain or LlamaIndex to build a coding agent?

No. Frameworks add convenience but they hide the loop, which is the part you most need to understand. A coding agent built directly against the Anthropic SDK is roughly 100 lines, fully debuggable, and easy to customize. Build the from-scratch version first. Add a framework only when you have a concrete need it solves and you know exactly what you are giving up.

How does Claude know which tool to call?

The model receives a list of tools with names, descriptions, and JSON input schemas in every API call. When it produces a response, it can include tool_use blocks that specify which tool to call and what arguments to pass. The model decides based on the tool descriptions, the system prompt, and the user message. This is why writing clear tool descriptions matters more than people expect.

How do I stop a coding agent from running forever?

3 checks: a maximum step count (start at 25), a comparison of the latest tool outputs with the previous round's to detect repeat loops, and the model's own stop_reason of end_turn for the happy path. Without all 3, an agent can absolutely chew through your token budget by reading the same file in 100 different ways. The rails are about 10 lines of code and they are not optional.

Key takeaways

  1. A coding agent is one LLM client, 3 tools, a while loop, and a system prompt. That fits in 100 lines of Python.
  2. Build the from-scratch version before reaching for a framework. Once you understand the loop, every framework looks like glue you no longer need.
  3. The intelligence is in the tool descriptions and the system prompt, not the loop. Spend your effort there.
  4. Read first, edit second, test after. A short system prompt teaches the model that pattern and changes the agent's behavior more than any code change you can make.
  5. Termination rails are not optional. Step ceiling, repeat detection, and a bash denylist together keep an agent bounded and safe.
  6. To extend this into a richer agent with planning, reflection, and longer task budgets, walk through the Build Your Own Coding Agent course, or start with the AI Agents Fundamentals primer.

The official Anthropic tool use documentation is the source of truth for the API surface this post uses. Bookmark it. Everything else (frameworks, tutorials, agent papers) builds on top of those primitives.
