The event loop inside a coding agent: how it thinks
A coding agent is just a while loop you have not written yet
Open up Claude Code, Cursor's agent mode, or Aider. Watch one work for 30 seconds. It reads a file, edits it, runs the tests, sees the failure, edits again, runs the tests again, and stops when they pass. It feels alive.
It is not. It is a while loop wrapped around an LLM call. The "magic" is the loop, not the model. Once you see the loop, every coding agent in the world stops looking like a black box and starts looking like 10 lines of code you can write yourself.
This post is the mental model I wish I had on day one. We will look at why a chatbot cannot be a coding agent, what the 4 states of an agent loop are, how termination actually works, and the smallest piece of Python that captures the whole thing. By the end you will be able to read any agent codebase and find the loop in under a minute.
Why a chatbot can never be a coding agent
A chatbot takes a message, returns a message, and forgets. One round trip. The model produces text and the conversation moves on. That works for "what is a closure in JavaScript." It does not work for "fix the failing test in auth.py."
To fix the failing test, the model needs to read the file, see the error, decide what to change, write the change, and re-run the test. None of those steps are text generation. Each one is an action against the outside world. A pure chatbot has no way to take an action and then continue thinking with the result in front of it.
The thing that closes the gap is a loop. Every turn, the model gets to either produce text for the user or ask for an action. When it asks for an action, the runtime executes it, captures the result, appends it to the conversation, and asks the model to think again with that new information in context. The model never moved. The runtime around it did.
That runtime is the agent.
The 4 states every agent loop cycles through
Strip away the framework, the prompt engineering, and the streaming UI, and every coding agent on the market reduces to 4 states. They map directly to what you see scrolling past in your terminal.
graph TD
Start([User sends task]) --> Think[State 1: Think]
Think -->|model returns text only| Done([State 4: Done])
Think -->|model returns tool call| Act[State 2: Act]
Act --> Observe[State 3: Observe]
Observe -->|append result to messages| Think
style Think fill:#dbeafe,stroke:#1e40af
style Act fill:#fef3c7,stroke:#b45309
style Observe fill:#dcfce7,stroke:#15803d
style Done fill:#e5e7eb,stroke:#374151
State 1, Think. The model receives the full message history and produces a response. The response is either a final answer for the user or a request to call a tool. Nothing else.
State 2, Act. The runtime sees the tool request, validates it against the registered tools, and runs it. This is where read_file, run_bash, edit_file, or grep actually execute.
State 3, Observe. The result of the tool call (file contents, command stdout, an error trace) is wrapped as a message and appended to the history.
State 4, Done. The model returns a turn with no tool calls. The loop exits and the user sees the final answer.
That is the entire model. Everything else (reflection prompts, planning steps, multi-agent handoffs, sub-agents) is a fancier way of organizing the same 4 states.
What does the loop actually look like in code?
Here is the smallest honest agent loop in Python. It is roughly 30 lines, it works against any LLM that supports tool calling, and it captures every behavior you saw above.
# filename: mini_agent.py
# description: The entire mental model of a coding agent in one function.
# Reads a file when asked, otherwise just answers the user.
from anthropic import Anthropic

client = Anthropic()

TOOLS = [{
    'name': 'read_file',
    'description': 'Read a text file from disk and return its contents.',
    'input_schema': {
        'type': 'object',
        'properties': {'path': {'type': 'string'}},
        'required': ['path'],
    },
}]

def read_file(path: str) -> str:
    with open(path) as fh:
        return fh.read()

def run_agent(user_message: str) -> str:
    messages = [{'role': 'user', 'content': user_message}]
    while True:
        reply = client.messages.create(
            model='claude-sonnet-4-6',
            max_tokens=2048,
            tools=TOOLS,
            messages=messages,
        )
        # State 4: Done. Model produced no tool call.
        if reply.stop_reason == 'end_turn':
            return reply.content[0].text
        # State 1 -> State 2 -> State 3
        messages.append({'role': 'assistant', 'content': reply.content})
        tool_results = []
        for block in reply.content:
            if block.type == 'tool_use':
                output = read_file(**block.input)  # State 2: Act
                tool_results.append({
                    'type': 'tool_result',
                    'tool_use_id': block.id,
                    'content': output,
                })
        messages.append({'role': 'user', 'content': tool_results})  # State 3: Observe
That is it. Replace read_file with a dictionary of real tools, add a few more entries to TOOLS, and you have something that can navigate a real codebase. The leap from this to Claude Code is engineering effort, not architectural insight.
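That dictionary-of-tools swap can be sketched in a few lines. The names here (TOOL_REGISTRY, dispatch, list_dir) are illustrative, not from any particular agent; the point is that the loop calls one dispatch function instead of hard-coding read_file:

```python
import os

def read_file(path: str) -> str:
    with open(path) as fh:
        return fh.read()

def list_dir(path: str) -> str:
    return "\n".join(sorted(os.listdir(path)))

# Illustrative registry: the loop looks tools up by name.
TOOL_REGISTRY = {
    "read_file": read_file,
    "list_dir": list_dir,
}

def dispatch(name: str, arguments: dict) -> str:
    """Run a registered tool; feed errors back to the model instead of crashing."""
    if name not in TOOL_REGISTRY:
        return f"Unknown tool: {name}"
    try:
        return TOOL_REGISTRY[name](**arguments)
    except Exception as exc:
        return f"Tool error: {exc}"
```

Returning errors as strings rather than raising is deliberate: a traceback in the tool result is something the model can read and recover from; an exception in the runtime kills the whole loop.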
If you want the full version with file editing, bash execution, error recovery, and a planning prompt, the Build Your Own Coding Agent course walks through it module by module. The free AI Agents Fundamentals resource is a good starting point if you want the conceptual model before touching the SDK.
How does Claude Code work under the hood?
The same way the snippet above works, with 3 additions: a richer tool set, a system prompt that teaches the model how to plan, and safety wrappers around dangerous tools.
The richer tool set is the obvious part. Claude Code exposes read_file, write_file, edit_file, run_bash, glob, grep, and a handful of others. Each one is a Python function on the runtime side and a JSON schema on the model side. The loop does not change. There are just more if block.type == 'tool_use' branches to dispatch to.
The system prompt is the part that surprises people. A coding agent's system prompt runs 5000 to 10000 tokens. It tells the model how to break a task into steps, when to read before writing, how to format file edits, and when to stop and ask the user. None of this is in the loop. It is in the first message.
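To make that concrete, here is a toy excerpt of the kind of instructions such a prompt carries. This is an illustrative sketch, not Claude Code's actual prompt:

```python
# Illustrative sketch only -- not any real agent's system prompt.
SYSTEM_PROMPT = """\
You are a coding agent operating inside a repository.

Rules:
1. Before editing a file, read it first with read_file.
2. Break multi-step tasks into a numbered plan before acting.
3. After any edit, re-run the relevant tests with run_bash.
4. If a command would be destructive, stop and ask the user.
"""

# It is passed once per call, outside the message history:
# client.messages.create(model=..., system=SYSTEM_PROMPT,
#                        tools=TOOLS, messages=messages)
```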
The safety wrappers are the boring but essential part. Before a run_bash tool actually executes, the runtime checks whether the command needs human approval, whether it touches a protected path, whether it matches a denylist. The model can request anything. The runtime decides what is allowed. This split is the only thing standing between an agent and a deleted home directory.
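A minimal sketch of that gate, sitting between the model's request and the actual subprocess call. The denylist and protected paths are made up for illustration; a real runtime would be far more thorough:

```python
import shlex
from pathlib import Path

# Hypothetical policy: commands that never auto-run, and paths that
# always require a human in the loop.
DENYLIST = {"rm", "dd", "mkfs", "shutdown"}
PROTECTED = ("/etc", "/usr", str(Path.home()))

def needs_approval(command: str) -> bool:
    """True if this run_bash request should pause for human approval."""
    tokens = shlex.split(command)
    if not tokens:
        return True                # empty or malformed: refuse by default
    if tokens[0] in DENYLIST:
        return True                # denylisted binary
    # Any argument touching a protected path also needs approval.
    return any(tok.startswith(PROTECTED) for tok in tokens[1:])
```

The runtime calls `needs_approval` before executing; the model never sees this code. That asymmetry is the whole safety model: the model proposes, the runtime disposes.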
For a deeper system-level walkthrough of how agents fit into a real production stack, the System Design: Building a Production-Ready AI Chatbot post shows where the loop sits relative to streaming, memory, and persistence layers.
When does the loop stop?
This is the question that breaks beginner agents. If you do not pick clear termination rules, your loop will spin forever, burn tokens, and get killed by a timeout instead of finishing cleanly.
3 termination conditions every loop should check:
- The model returned no tool call (stop_reason == 'end_turn'). This is the happy path. The model is done thinking.
- The loop has run more than N iterations. Pick a number. I usually start at 25 for simple tasks and 100 for full coding sessions. Above that, something is wrong and you want to bail loud, not loop forever.
- The tool result is identical to the previous one. If the model just read the same file twice in a row, it is stuck in a loop. Break out and ask for help.
# filename: termination.py
# description: The 3 checks that prevent runaway agent loops.
MAX_STEPS = 25

def run_with_limits(messages):
    last_observation = None
    for step in range(MAX_STEPS):
        reply = client.messages.create(...)
        if reply.stop_reason == 'end_turn':
            return reply.content[0].text  # Condition 1
        observation = run_tool(reply)
        if observation == last_observation:
            return 'Loop detected. Stopping early.'  # Condition 3
        last_observation = observation
    else:
        return f'Hit max steps ({MAX_STEPS}). Aborting.'  # Condition 2
This is the part that differentiates a toy agent from one you would let run unattended. Without these checks, the model can absolutely chew through your monthly token budget by reading the same file in 100 different ways.
How do AI coding agents work differently from ReAct agents?
ReAct (Reason + Act) is the academic ancestor of every coding agent today. The difference is mostly cosmetic at this point.
Original ReAct from the 2022 paper used a special prompt format where the model output Thought: ... Action: ... Observation: ... as parsed text. The runtime regex-extracted the action and ran it. It worked, but it was fragile. One stray colon and the parser broke.
Modern coding agents use the model's native tool-calling API instead. The model returns a structured tool_use block, the runtime dispatches it without parsing prose, and the result comes back as a structured tool_result. Same loop, sturdier transport.
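The fragility difference is easy to demonstrate. A ReAct-era runtime had to pull the action out of free text with a pattern like this (illustrative, not the paper's exact format):

```python
import re

# Hypothetical ReAct-style pattern: "Action: tool_name[argument]"
REACT_ACTION = re.compile(r"Action:\s*(\w+)\[(.*?)\]")

def parse_react(output: str):
    """Extract (tool, argument) from ReAct-style prose, or None on failure."""
    match = REACT_ACTION.search(output)
    return match.groups() if match else None

# Well-formed prose parses...
parse_react("Thought: check the file\nAction: read_file[auth.py]")
# → ('read_file', 'auth.py')

# ...but one formatting slip and the runtime gets nothing back.
parse_react("Thought: check the file\nAction - read_file[auth.py]")
# → None
```

A native tool-calling API sidesteps parsing entirely: the model returns a structured block such as `{"type": "tool_use", "name": "read_file", "input": {"path": "auth.py"}}` and the runtime dispatches on it directly. There is no prose to mis-parse.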
The other difference is that ReAct papers focused on single-step reasoning. Real coding agents extend the loop to dozens of steps and add sub-agents, planning prompts, and self-reflection. But strip those layers off and you are back at the 4 states above.
What to do Monday morning
A short, do-this-now list:
- Open mini_agent.py from this post in a new file. Run it with one tool. Watch the loop run. Print every message that goes into the model so you can see the state grow.
- Add a second tool (run_bash or list_dir) and a task that requires both. Notice how the model decides which to call and how the runtime never picks for it.
- Add the 3 termination checks. Force a runaway by giving the model an impossible task and confirm your loop bails out instead of spinning.
- Read your favorite agent's source code (Aider, Continue, OpenDevin) and find the while loop. It is always there and always shorter than you expect.
- Resist the urge to add a framework. Agents are 50 lines of glue code and 1000 lines of prompt and tools. Frameworks hide the glue, which is the part you most need to understand.
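The message-printing from the first item can be sketched like this. The helper names are mine, and the dict-or-SDK-object handling is an assumption about the two content shapes that appear in mini_agent.py's history:

```python
def block_type(block):
    """Content blocks are plain dicts or SDK objects with a .type attribute."""
    return block.get("type", "?") if isinstance(block, dict) else getattr(block, "type", "?")

def debug_messages(messages):
    """Print one summary line per message, so you can watch the history grow."""
    for i, msg in enumerate(messages):
        content = msg["content"]
        if isinstance(content, str):
            summary = content[:60]
        else:  # list of blocks: text / tool_use / tool_result
            summary = ", ".join(block_type(b) for b in content)
        print(f"[{i}] {msg['role']}: {summary}")
```

Call `debug_messages(messages)` at the top of the while loop and you will see the same user message at index 0 every turn, with an assistant/user pair appended for each tool round trip. That growth is the agent's entire working memory.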
The mental model you take away is: the model thinks, the runtime acts and observes, and the loop keeps going until the model says it is done. Anything else you read about agents is decoration on that core.
Frequently asked questions
What is an agent loop?
An agent loop is the runtime cycle that turns a single LLM call into multi-step reasoning. The model produces a response, the runtime executes any tool calls in that response, appends the results to the message history, and calls the model again with the updated history. The cycle ends when the model returns a response with no tool calls or a termination rule fires.
How does Claude Code work under the hood?
Claude Code is an agent loop wrapped around the Anthropic API with a rich tool set (read_file, edit_file, run_bash, glob, and others), a long system prompt that teaches planning behavior, and safety wrappers that gate dangerous tools behind user approval. The loop itself is roughly 50 lines of code. The intelligence comes from the tool design and the prompt, not from the loop.
What is the difference between an agent loop and a chatbot?
A chatbot performs one LLM call per user message and returns the result. An agent loop performs many LLM calls per user message, executing tools between calls and feeding the results back into the next call. The chatbot has no way to act on the world; the agent loop does. That single addition is what enables coding, browsing, debugging, and every other real task agents handle.
How do AI coding agents avoid infinite loops?
3 mechanisms. First, a maximum step count (typically 25 to 100) that hard-stops the loop. Second, repeated-observation detection that exits when the model gets the same tool result twice in a row. Third, the model's own stop_reason: end_turn signal, which fires when it produces a response with no tool calls. Without all 3, an agent will eventually spin and burn tokens.
Is a ReAct agent the same as a coding agent?
They share the same core loop (think, act, observe), but ReAct used parsed prose output (Thought:, Action:, Observation:) while modern coding agents use native tool-calling APIs that return structured blocks. Coding agents also extend the loop with planning prompts, sub-agents, and longer step budgets. ReAct is the ancestor; current coding agents are a sturdier, deeper version of the same idea.
Key takeaways
- A coding agent is a while loop wrapped around an LLM call. The model thinks, the runtime acts, the runtime observes, and the cycle repeats until the model says it is done.
- Every agent reduces to 4 states: Think, Act, Observe, Done. Once you see them you can read any agent codebase in a minute.
- The loop itself is roughly 30 lines of Python. The intelligence is in the tool set and the system prompt, not the loop.
- Termination matters more than people think. Always set a max-step ceiling, watch for repeated observations, and trust stop_reason for the happy path.
- Modern coding agents are ReAct with a sturdier transport. Tool-calling APIs replaced regex parsing of Thought: ... Action: prose, but the underlying loop is unchanged.
- To turn this mental model into a working agent with file edits, bash execution, and recovery, walk through the Build Your Own Coding Agent course, or start with the AI Agents Fundamentals primer to lock in the conceptual basics.
For the original ReAct framing that all of this builds on, see the ReAct paper from Yao et al. The vocabulary has evolved, but the loop they described is still what you are running every time you use a coding agent.