ripgrep for coding agents: fast code search at scale
Your agent's grep is reading binary files and returning node_modules
Your coding agent needs to find usages of a function across the codebase. Your grep tool runs subprocess.run(['grep', '-r', pattern, '.']). 45 seconds later, it returns 3000 results, 2700 of them from node_modules, 50 from .git, and another 100 from minified JS. The actual 15 results you cared about are buried. The model reads all 3000, gets confused, and picks the wrong file.
This is standard grep -r behavior. It reads every file under the cwd, ignores no patterns, and does not know what a source file is. It has not been the right tool for code search in a decade.
ripgrep (rg) is the tool every coding agent should ship with. It respects .gitignore by default, skips binary files, recursively searches at roughly 10x the speed of grep, and returns structured output you can parse safely. This post is the rg flags that matter for agents, the safe wrapper pattern, and the 2 gotchas that bite when you first drop it in.
Why is ripgrep the right search tool for coding agents?
Because it was built for code search, not for general pattern matching. 4 properties that matter:
- Respects .gitignore, .ignore, and .rgignore by default. node_modules, __pycache__, .git, build artifacts, compiled binaries - all skipped. You never again send 10MB of minified JS to an LLM.
- Skips binary files automatically. grep will happily search a 50MB PNG; rg skips it without reading. This alone saves minutes on a real repo.
- Faster than grep by roughly an order of magnitude. On a 2GB monorepo, a search that takes 15 seconds with grep takes under 2 with rg. Inside an agent loop that runs 20 searches in a session, the difference is minutes per task.
- Structured output. rg --json emits one JSON object per match, which you can parse directly without regex-matching file:line:text strings.
graph LR
Agent[Agent asks for matches] --> Tool[rg tool wrapper]
Tool -->|rg --json --max-count| RG[ripgrep process]
RG -->|respects gitignore| Files[Source files only]
Files --> Matches[Structured results]
Matches -->|top 50 capped| Tool
Tool --> Agent
style RG fill:#dbeafe,stroke:#1e40af
style Matches fill:#dcfce7,stroke:#15803d
The combination of gitignore awareness and structured output is what turns rg from "faster grep" into "code search designed for an LLM caller."
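Concretely, each line of rg --json output is one JSON event. Here is a representative match event, abbreviated; the field layout follows ripgrep's documented JSON format, but the values are illustrative:

```python
import json

# A single 'match' event as rg --json emits it (one JSON object per line).
# rg also emits 'begin'/'end' events per file and a final 'summary' event.
raw = ('{"type":"match","data":{"path":{"text":"src/auth.py"},'
       '"lines":{"text":"def validate_token(tok):\\n"},"line_number":42,'
       '"absolute_offset":1337,'
       '"submatches":[{"match":{"text":"validate_token"},"start":4,"end":18}]}}')
event = json.loads(raw)
assert event['type'] == 'match'
print(event['data']['path']['text'], event['data']['line_number'])
# → src/auth.py 42
```

Filtering on type == 'match' and pulling out path, line_number, and lines is all a wrapper needs to do.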
What ripgrep flags matter for agent tools?
6 flags that almost always belong in the wrapper. The rest are nice to have.
- --json: structured output that parses reliably. Use this instead of the default file:line:text lines.
- --max-count N: stop after N matches per file. Prevents one pathological file from dominating results.
- --max-columns 200: truncate long lines. A model does not need the full 5000-column minified line that matched your pattern.
- --type py (or another language): limit to one file type. Huge speedup on mixed repos.
- --glob '!tests/**' or similar: exclude directories the model does not need to see.
- -C 2: include 2 lines of context around each match. More useful to the LLM than the matching line alone.
A sensible default for an agent's search tool uses all 6 flags with sane values, and exposes only the query and an optional type filter to the model.
How do you wrap ripgrep as a safe agent tool?
Run rg as a subprocess with a fixed flag set, parse the JSON output, and return a structured list of matches. Cap the total results so a bad pattern cannot blow up the context window.
# filename: rg_tool.py
# description: A safe ripgrep wrapper for a coding agent. Fixed flags,
# capped results, structured output ready to feed back to the model.
import json
import subprocess
from pathlib import Path

MAX_RESULTS = 50
FLAGS = [
    '--json',
    '--max-count', '3',
    '--max-columns', '200',
    '-C', '2',
]

def rg_search(pattern: str, path: str = '.', file_type: str | None = None) -> dict:
    cwd = Path(path).resolve()
    cmd = ['rg', *FLAGS]
    if file_type:
        cmd.extend(['--type', file_type])
    # -e keeps patterns that start with '-' from being parsed as flags.
    cmd.extend(['-e', pattern, str(cwd)])
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=15,
        )
    except subprocess.TimeoutExpired:
        return {'ok': False, 'error': 'search timed out'}
    matches = []
    for line in result.stdout.splitlines():
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if event.get('type') == 'match':
            data = event['data']
            matches.append({
                'path': data['path']['text'],
                'line': data['line_number'],
                'text': data['lines']['text'].rstrip('\n'),
            })
            if len(matches) >= MAX_RESULTS:
                break
    return {'ok': True, 'matches': matches, 'count': len(matches)}
50 lines and it covers every case that matters. Fixed timeout. Capped results. Parsed JSON. Structured return shape. The model sees a clean list of (path, line, text) tuples and nothing else.
For the broader registry pattern that makes adding this tool clean, see Designing Modular Tool Integrations for Coding Agents. For the full coding agent build that uses this wrapper, see Build a Coding Agent with Claude: A Step-by-Step Guide.
How do you describe the tool to the model?
Short, specific, and with examples. The model will misuse rg the same way a human would if they did not know the flag set. Describe what the tool does, what the patterns support (regex), and what to do when there are too many results.
# filename: tool_spec.py
# description: The tool definition the model sees. Examples teach the
# model how to scope searches effectively.
RG_TOOL_SPEC = {
'name': 'code_search',
'description': (
'Search the codebase for a regex pattern using ripgrep. '
'Respects .gitignore. Returns up to 50 matches with 2 lines of '
'context. Use file_type to limit to a language (e.g. "py", "ts"). '
'Examples: pattern="def validate_token", file_type="py". '
'If too many matches come back, rerun with a more specific pattern.'
),
'input_schema': {
'type': 'object',
'properties': {
'pattern': {'type': 'string', 'description': 'regex pattern to search for'},
'file_type': {'type': 'string', 'description': 'optional language filter'},
},
'required': ['pattern'],
},
}
The example in the description does more work than the flag documentation. Models copy patterns from examples far more readily than they synthesize them from flag lists. Give one concrete example per tool and the model's search accuracy climbs noticeably.
What are the 2 gotchas that bite with rg in an agent?
First, the working directory. If your agent runs in a container with a different cwd from the project root, rg searches the wrong tree. Fix by always passing an explicit path argument to the wrapper and resolving it at runtime. Never rely on os.getcwd().
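A minimal sketch of that fix; resolve_search_root is a hypothetical helper name, not part of rg:

```python
from pathlib import Path

def resolve_search_root(path: str) -> Path:
    # Resolve the caller-supplied path once, independent of the process cwd,
    # and fail loudly instead of silently searching the wrong tree.
    root = Path(path).expanduser().resolve()
    if not root.is_dir():
        raise ValueError(f'search root does not exist: {root}')
    return root

print(resolve_search_root('.'))
```

The raise matters: a wrong tree returns zero matches, which the model misreads as "symbol not found," while an explicit error tells it the tool was misconfigured.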
Second, escaping. The model will sometimes pass unescaped regex metacharacters when it wanted a literal string. A search for "foo()" becomes a regex for "foo" followed by an empty group. The fix is to expose a separate literal flag that passes -F (fixed-string mode) to rg, so the model can choose whether to match as regex or as literal text:
# filename: rg_tool_v2.py
# description: Add a literal flag for non-regex searches.
def rg_search(pattern: str, path: str = '.', literal: bool = False) -> dict:
    flags = [*FLAGS]
    if literal:
        flags.append('-F')
    cmd = ['rg', *flags, pattern, path]
    # ... rest unchanged
Half of the searches a coding agent runs are literal string searches. Making literal mode a first-class flag cuts the regex-escape footguns in half.
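To see why unescaped metacharacters misfire, Python's re module shows the same behavior rg's regex engine would; this is an illustration, not rg itself:

```python
import re

# As a regex, 'foo()' is 'foo' followed by an empty capture group,
# so it matches anywhere the substring 'foo' appears - a false positive.
assert re.search(r'foo()', 'call foo_bar here') is not None

# As a literal (what rg -F does), only the exact text 'foo()' matches.
assert 'foo()' not in 'call foo_bar here'
assert 'foo()' in 'result = foo() + 1'
```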
When should you combine rg with glob-style path filters?
When a project has deeply nested directories that the model should not touch. Test fixtures, auto-generated code, third-party vendored libraries that are not in .gitignore. The fix is --glob '!tests/fixtures/**' or similar.
You can expose a fixed list of exclude globs in the wrapper and let the tool spec mention them. The model should not set globs directly because it will get the syntax wrong and waste calls debugging its own regex. Pre-set the excludes once and move on.
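A minimal sketch of baking the excludes into the wrapper; EXCLUDE_GLOBS and exclude_flags are hypothetical names, not rg API:

```python
# Fixed exclude list; the model never sets globs directly.
EXCLUDE_GLOBS = ['!tests/fixtures/**', '!vendor/**', '!**/generated/**']

def exclude_flags(globs: list[str] = EXCLUDE_GLOBS) -> list[str]:
    # Each glob becomes its own --glob argument pair on the rg command line.
    flags: list[str] = []
    for g in globs:
        flags.extend(['--glob', g])
    return flags

print(exclude_flags(['!vendor/**']))
# → ['--glob', '!vendor/**']
```

Splice the result into the fixed flag list and every search inherits the excludes for free.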
For the broader agent architecture that uses rg alongside read, edit, and bash tools, the Build Your Own Coding Agent course covers it module by module. The free AI Agents Fundamentals primer is the right starting point if the agent loop is still new.
What to do Monday morning
- Install rg on your agent's runtime. If it is a container, add RUN apt-get install -y ripgrep (or the equivalent for your base image). 10 seconds to add, 10x speedup on every search.
- Replace your grep tool with the wrapper from this post. Start with the 6 fixed flags. Do not let the model override them.
- Add a literal flag so the model can pick fixed-string searches for function names and API calls. This cuts regex escape bugs in half.
- Write a one-line example in the tool description. "Examples: pattern='def validate_token', file_type='py'". The model copies the example on its first use.
- Test with a pathological query. Search for . (matches everything) and confirm your result cap kicks in at 50. If it does not, tighten the --max-count cap or lower the MAX_RESULTS constant.
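If rg is not installed in your test environment, you can still exercise the cap by feeding the parse loop synthetic --json lines; this sketch mirrors the wrapper's loop without running a subprocess:

```python
import json

MAX_RESULTS = 50

def parse_matches(stdout: str, cap: int = MAX_RESULTS) -> list[dict]:
    # The same parse-and-cap loop as the wrapper, isolated from subprocess.
    matches = []
    for line in stdout.splitlines():
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if event.get('type') == 'match':
            data = event['data']
            matches.append({'path': data['path']['text'],
                            'line': data['line_number']})
            if len(matches) >= cap:
                break
    return matches

# Simulate a pathological '.' query: 500 match events from one file.
fake = '\n'.join(
    json.dumps({'type': 'match',
                'data': {'path': {'text': 'big.py'}, 'line_number': i,
                         'lines': {'text': 'x\n'}}})
    for i in range(500)
)
assert len(parse_matches(fake)) == 50
```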
The headline: rg is the code search tool agents should have been using since day 1. 50-line wrapper, 6 flags, immediate speedup. Drop it in today.
Frequently asked questions
Why should coding agents use ripgrep instead of grep?
Because ripgrep respects .gitignore, skips binary files, uses structured JSON output, and runs an order of magnitude faster than grep on realistic code repos. Every one of those matters inside an agent loop. A grep-based search tool wastes context on node_modules hits and wastes time scanning binaries. ripgrep is what the tool should have been from day 1.
What ripgrep flags should a coding agent's search tool use?
--json for structured output, --max-count per file to cap one file from dominating, --max-columns 200 to truncate long lines, -C 2 for context around matches, and language-specific --type filters when applicable. Fix these flags in the wrapper and expose only pattern and file_type to the model.
How do you prevent a coding agent from returning too many search results?
Cap the total results in the wrapper (50 is a reasonable default) and use --max-count 3 per file so one pathological file cannot dominate the output. When the cap fires, return a message telling the model to refine the pattern. Training the model to iterate on searches is cheaper than sending thousands of matches back to the LLM.
Should coding agents use regex or literal search?
Both, and let the model choose. Expose a literal flag that passes -F (fixed string mode) to ripgrep for exact substring matches. About half of a coding agent's searches are for function names and API calls, which are literal strings. Making literal mode easy cuts regex-escape bugs in half and produces fewer false positives.
How do I handle different working directories when running ripgrep in an agent?
Always pass an explicit path argument to the wrapper and resolve it at runtime with Path(path).resolve(). Never rely on os.getcwd() because the agent process might run from a container root while the project lives in a subdirectory. Making the path explicit also helps with multi-project agents that search across distinct roots.
Key takeaways
- ripgrep is built for code search: it respects .gitignore, skips binaries, and outputs structured JSON. Every one of these matters inside an agent.
- Fix the flag set in the wrapper and expose only pattern and file_type to the model. Models pick flags badly; pre-pick them once.
- Cap both the per-file max count and the total result count. Runaway patterns should fail closed with a "too many matches, refine" error.
- Add a literal flag for fixed-string searches. Half of agent search traffic is function-name lookups, and regex mode creates bugs there.
- Include a one-line example in the tool description. Models copy examples; a single pattern='def validate_token', file_type='py' teaches the usage pattern.
- To see ripgrep wired into a full coding agent with read, edit, and bash tools, walk through the Build Your Own Coding Agent course, or start with the AI Agents Fundamentals primer.
For the full ripgrep documentation, flag reference, and performance characteristics, see the ripgrep user guide. The flags in this post are the ones the guide itself recommends for tooling integrations.