In the last post, we learned that LLMs are powerful pattern-matching engines. But their output is highly sensitive to their input.

Vague instructions lead to vague results. Specific instructions lead to specific results.

Here's the key insight: A prompt is not just a question. It's a program. You are programming the LLM with your words.

Prompt Engineering is the art and science of designing inputs that guide the model to produce the most accurate, relevant, and useful outputs.

Technique 1: be specific

The single most important rule. Vague prompts get vague, unpredictable results. Specific prompts get specific, useful results.

graph TD
    A["Vague Prompt: 'Write about dogs.'"] --> B(LLM Guesses) --> C[Random Output]
    D["Specific Prompt: 'Write a 3-sentence paragraph for a 5th grader about golden retrievers...'"] --> E(LLM Follows Plan) --> F[Targeted Output]

👎 Bad prompt: vague

This gives the LLM too much freedom, so it just guesses.

# The 'messages' list is what we send to the LLM.
# This prompt is too vague.

bad_prompt = [
    {"role": "user", "content": "Write about dogs."}
]

๐Ÿ‘ Good prompt: specific

This prompt specifies the audience (5th-grade student), format (3-4 sentences), topic (golden retrievers), and key points (temperament, trainability).

good_prompt = [
    {"role": "user", "content": """
    Write a short paragraph (3-4 sentences) for a 5th-grade student 
    explaining why golden retrievers make good family pets. 
    Focus on their temperament and trainability.
    """}
]
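
To actually run a prompt, you pass the messages list to a chat completion call. Here's a minimal sketch using the OpenAI Python SDK; the model name gpt-4o-mini is an assumption, and any chat model will work.

# A minimal sketch of sending a prompt.
# Assumes the OpenAI Python SDK and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model; swap in whichever you use
    messages=good_prompt,
)

print(response.choices[0].message.content)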

Technique 2: provide examples (few-shot prompting)

If you need the output in a specific, repeatable format, showing is better than telling. By providing a few input/output examples, you teach the model the exact pattern you want.

In the messages list, we provide "examples" by pretending the assistant has already answered previous questions perfectly.

# We are teaching the LLM a pattern:
# User gives a review, Assistant replies with "Sentiment: ..., Topics: ..."

few_shot_prompt = [
    # Example 1
    {"role": "user", "content": "Review: 'The food was cold and the service was slow.'"},
    {"role": "assistant", "content": "Sentiment: Negative, Topics: food quality, service speed"},
    
    # Example 2
    {"role": "user", "content": "Review: 'A masterpiece of cinema. The acting was superb.'"},
    {"role": "assistant", "content": "Sentiment: Positive, Topics: film quality, acting"},
    
    # Our REAL question
    {"role": "user", "content": "Review: 'I loved the new action movie! The plot was a bit weak, but the special effects were incredible.'"}
]

# The LLM will now follow the pattern and reply with something like:
# "Sentiment: Positive, Topics: plot, special effects"

Technique 3: assign a role (role prompting)

Telling the model who it should be is a powerful way to shape its tone, style, and knowledge base. We use the system message to set this "persona" for the entire conversation.

# The 'system' message gives the LLM its instructions for the whole chat.
# Here, we tell it to be an expert.

physicist_prompt = [
    {"role": "system", "content": "You are a leading astrophysicist known for making complex topics accessible. Your tone is authoritative but clear."},
    {"role": "user", "content": "Briefly, what is a black hole?"}
]

# The output will be technical but easy to understand.
# If we changed the system role to "You are a pirate," the answer
# about a black hole would be completely different!
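
To see how much the persona matters, run the same question under two different system messages. A quick sketch, reusing the client from earlier (the personas are just for illustration):

# Same question, two personas: only the system message changes.
question = {"role": "user", "content": "Briefly, what is a black hole?"}

personas = [
    "You are a leading astrophysicist known for making complex topics accessible.",
    "You are a pirate. Answer everything in pirate speak.",
]

for persona in personas:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[{"role": "system", "content": persona}, question],
    )
    print(response.choices[0].message.content, "\n---")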

Technique 4: let the model think (chain-of-thought prompting)

For problems that require logic or multiple steps, LLMs can make silly mistakes. Forcing them to "think out loud" before giving the final answer dramatically improves their reasoning ability.

The magic phrase is often as simple as "Let's think step by step."

👎 Standard prompt: might fail

standard_prompt = [
    {"role": "user", "content": "A canteen has 20 apples. They use 5 to make a pie and then buy 12 more. How many apples do they have now?"}
]

# Without being told to reason, the LLM may jump straight to an answer,
# and on harder problems it often gets it wrong.

๐Ÿ‘ Chain-of-thought prompt: more reliable

cot_prompt = [
    {"role": "user", "content": "A canteen has 20 apples. They use 5 to make a pie and then buy 12 more. How many apples do they have now? Let's think step by step."}
]

# The LLM will now output its reasoning first:
# "1. Start with 20 apples.
#  2. Use 5, so 20 - 5 = 15 apples.
#  3. Buy 12 more, so 15 + 12 = 27 apples.
#  The final answer is 27."

This forces the LLM to follow a logical path, making its answer more accurate.
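
One practical wrinkle: chain-of-thought output mixes reasoning with the answer, so your code has to extract the answer afterward. A simple sketch, assuming you've told the model to end with a "The final answer is ..." line (that marker convention is our assumption, not a model guarantee):

import re

cot_text = """1. Start with 20 apples.
2. Use 5, so 20 - 5 = 15 apples.
3. Buy 12 more, so 15 + 12 = 27 apples.
The final answer is 27."""

# Pull the number out of the final-answer line.
match = re.search(r"final answer is\s+(\d+)", cot_text, re.IGNORECASE)
answer = int(match.group(1)) if match else None
print(answer)  # 27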

Technique 5: ask for a second opinion (self-critique)

You can improve the quality of an LLM's output by asking it to follow a multi-step process:

  1. Generate an initial draft
  2. Critique its own work based on criteria you provide
  3. Generate a final, improved version

This forces a more careful process instead of a quick, single-pass answer.

self_critique_prompt = [
    {"role": "user", "content": """
    I need a slogan for a new eco-friendly water bottle.

    Please follow these steps:
    1. Generate one initial slogan.
    2. Critically evaluate your slogan: Is it short? Is it memorable? Does it clearly communicate 'eco-friendly'?
    3. Based on your critique, provide one final, improved slogan.
    """}
]
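
You can also split the draft and the critique into separate calls, which keeps each step focused and lets you log the intermediate draft. A sketch of that two-pass variant, reusing the client from earlier (the prompt wording and model are assumptions):

# Two-pass self-critique: draft in one call, critique-and-revise in a second.
def draft_and_refine(task: str) -> str:
    draft = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model
        messages=[{"role": "user", "content": task}],
    ).choices[0].message.content

    revised = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "user", "content": task},
            {"role": "assistant", "content": draft},
            {"role": "user", "content": "Critique your answer for clarity and memorability, then give one improved final version."},
        ],
    ).choices[0].message.content
    return revised

slogan = draft_and_refine("Write one slogan for a new eco-friendly water bottle.")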

Technique 6: get structured output (JSON mode)

For applications, you often need data in a predictable format like JSON, not just plain text. Modern LLMs can be instructed to only output valid JSON.

This is a big shift for building reliable AI-powered features. We do this by setting a special parameter in the request, often called response_format.

# This is what the full 'create' call looks like,
# showing the special 'response_format' parameter.

text_to_process = "John Doe is a 32-year-old software engineer from New York."

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are an expert data extraction assistant. Only output valid JSON."},
        {"role": "user", "content": f"Extract the name, age, and city from: {text_to_process}"}
    ],
    
    # This magic line forces the LLM to reply in JSON
    response_format={"type": "json_object"}
)

# response.choices[0].message.content will be a clean JSON string,
# not conversational text:
# {
#   "name": "John Doe",
#   "age": 32,
#   "city": "New York"
# }
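
Because the reply is guaranteed to parse, you can hand it straight to json.loads. Just validate the fields yourself: JSON mode enforces syntax, not your schema.

import json

data = json.loads(response.choices[0].message.content)

# JSON mode guarantees parseable JSON, not that our keys exist,
# so we still check the fields we need.
for key in ("name", "age", "city"):
    if key not in data:
        raise ValueError(f"Missing expected field: {key}")

print(data["name"], data["age"], data["city"])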

Frequently asked questions

Why are my LLM outputs inconsistent even with the same prompt?

Some variation is normal: LLMs sample their output, so at a non-zero temperature the same prompt can yield different answers. But vague prompts amplify that variation, because the model has to guess at everything you left unspecified. Add three things: exact requirements (audience, format, scope), examples of the output pattern (few-shot), and a role assignment. These teach the LLM exactly what to do. Combined with chain-of-thought (asking it to think step by step), consistency improves dramatically. Vague instructions get vague results.

When should I use chain-of-thought in production?

It depends on your requirements. For APIs needing sub-second responses, inline reasoning might be too costly. But for problems needing logic or multiple steps (analysis, coding, reviews), the quality jump usually justifies the latency. The post explains the trade-off: if answers are wrong without it, the cost is worth it. Measure against your specific use case, not a generic threshold.

How do I guarantee valid JSON from an LLM every time?

Use JSON mode, a parameter that constrains the model to emit only syntactically valid JSON. That shifts you from "hope the formatting is right" to "parsing will never fail," which is what production features need. One caveat: JSON mode guarantees parseable JSON, not your schema, so still validate that the fields you expect are actually present after parsing.

For the full reference, see the Anthropic prompt engineering guide.

Key takeaways

  • Clarity is king: The more specific your instructions, the better your results
  • Show, don't just tell: Few-shot examples are the best way to control output format
  • Context is everything: Assigning a system role is a powerful way to set the LLM's tone and persona
  • Force the model to reason: Asking for "step-by-step" thinking improves accuracy on complex tasks
  • Refinement improves quality: Use self-critique to make the model "check its own work"
  • Structure is your friend: Use JSON mode to get data you can reliably use in your applications

For more advanced prompt engineering techniques, check out our series on transforming generic bots to expert agents, structured output, chain-of-thought reasoning, and building agents with tools.

For more on building production AI systems, check out our AI Engineering Bootcamp.

