Production env variable parsing in Python AI services
Your env vars are strings and your bugs prove it
You set MAX_RETRIES=3 in your .env file. Your Python code reads os.environ.get('MAX_RETRIES') and passes it to a retry decorator that expects an int. Depending on how the decorator handles the string '3', the retry either never fires or fires the wrong number of times. You find the bug 2 weeks later, when a flaky upstream call fails without a single retry.
This is the default trap of os.environ.get. Every environment variable comes out as a string, every type coercion is manual, every default is inline, and every typo in a variable name silently returns None. The fix is not bigger type hints. The fix is a validation layer that parses, coerces, and fails loudly at startup if anything is missing or wrong.
This post is the env variable parsing pattern I ship in every Python AI service: Pydantic Settings with strict types, required fields that crash on missing, and the single config object that the rest of the code reads from.
Why is os.environ.get dangerous for AI services?
Because AI services have many env vars with tight type requirements (floats for temperature, ints for max tokens, bools for debug flags) and every manual coercion is a chance to be wrong. 4 specific failure modes:
- **Silent string-vs-number bugs.** `os.environ.get('TEMPERATURE', 0.7)` returns `'0.7'` if the env var is set, and `0.7` if it isn't. The type changes based on whether a variable is present, which is a type-checker's nightmare.
- **Silent missing-key defaults.** `os.environ.get('OPENAI_API_KEY')` returns `None` if the key is not set. Your code then calls the LLM with no auth header and gets a 401. The real bug is 3 stack frames away from where the env var was read.
- **Typos that fail at runtime.** `os.environ.get('DATABSE_URL')` returns `None`. The service starts fine. The first database call fails in production.
- **No single source of truth.** Env vars get read across the codebase, each with its own default and coercion. A prod change to one default means grepping every use.
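The first three failure modes take only a few lines to reproduce with the standard library (variable names are from the examples above):

```python
import os

# Failure mode 1: the return type depends on whether the variable is set.
os.environ['TEMPERATURE'] = '0.7'
set_value = os.environ.get('TEMPERATURE', 0.7)    # '0.7' -- a str
del os.environ['TEMPERATURE']
unset_value = os.environ.get('TEMPERATURE', 0.7)  # 0.7 -- a float

# Failure modes 2 and 3: a missing key and a typo are indistinguishable.
os.environ.pop('OPENAI_API_KEY', None)
os.environ.pop('DATABSE_URL', None)
missing = os.environ.get('OPENAI_API_KEY')  # None, no error
typo = os.environ.get('DATABSE_URL')        # None, no error

print(type(set_value), type(unset_value))  # <class 'str'> <class 'float'>
```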
Pydantic Settings fixes all 4 by parsing once at startup, validating types, crashing loudly on missing required fields, and exposing a single typed object every module imports from.
graph TD
Env[.env file and real env vars] --> Parser[Pydantic Settings]
Parser -->|validated types| Settings[Settings object]
Settings --> Agent[Agent loop]
Settings --> DB[Database client]
Settings --> LLM[LLM client]
Settings --> Routes[FastAPI routes]
Parser -->|missing required| Crash[SystemExit at startup]
style Parser fill:#dbeafe,stroke:#1e40af
style Crash fill:#fee2e2,stroke:#b91c1c
style Settings fill:#dcfce7,stroke:#15803d
The key property: invalid config is a startup crash, not a runtime bug. You can never ship a service that silently ran with the wrong type.
What does Pydantic Settings actually do?
Pydantic Settings (the v2 library, pydantic-settings) is a small extension of Pydantic that reads fields from environment variables instead of dict input. You declare a BaseSettings subclass with typed fields, and at instantiation it pulls each field from the matching env var, coerces the type, runs validators, and raises if anything is wrong.
# filename: app/config.py
# description: Typed settings loaded from environment variables at startup.
# Single source of truth for all runtime configuration.
from functools import lru_cache

from pydantic import Field, SecretStr
from pydantic_settings import BaseSettings, SettingsConfigDict


class Settings(BaseSettings):
    model_config = SettingsConfigDict(
        # Later files in the tuple override earlier ones,
        # so .env.development wins over the shared .env locally.
        env_file=('.env', '.env.development'),
        env_file_encoding='utf-8',
        case_sensitive=False,
        extra='forbid',
    )

    # Required: these crash on startup if missing
    openai_api_key: SecretStr = Field(..., alias='OPENAI_API_KEY')
    database_url: str = Field(..., alias='DATABASE_URL')

    # Optional with typed defaults
    app_env: str = Field(default='development', alias='APP_ENV')
    debug: bool = Field(default=False)
    max_retries: int = Field(default=3, ge=0, le=10)
    request_timeout_s: float = Field(default=30.0, gt=0)
    allowed_origins: list[str] = Field(default_factory=lambda: ['http://localhost:3000'])


@lru_cache(maxsize=1)
def get_settings() -> Settings:
    return Settings()
5 things in here are doing real work. Field(...) with no default makes the field required: instantiation raises at startup if it is missing. SecretStr prevents accidentally logging the key; the value is only accessible via .get_secret_value(). ge=0, le=10 validates the range at parse time. extra='forbid' rejects unknown env vars, catching typos in .env files. @lru_cache ensures get_settings() is called once per process.
For the file-layer patterns that this loader pairs with, see the .env.development vs .env config post.
How do you handle complex types like lists and JSON?
Pydantic Settings supports list coercion from comma-separated strings and JSON coercion from JSON strings. For complex nested config, use a secondary Pydantic model:
# filename: app/config_complex.py
# description: Nested config for LLM parameters. Pydantic parses the JSON env var.
from pydantic import BaseModel, Field
from pydantic_settings import BaseSettings


class LLMConfig(BaseModel):
    model: str = 'claude-sonnet-4-6'
    temperature: float = 0.7
    max_tokens: int = 2048
    fallback_models: list[str] = Field(default_factory=list)


class Settings(BaseSettings):
    # In .env: LLM='{"model":"claude-haiku-4-5","temperature":0.3}'
    llm: LLMConfig = Field(default_factory=LLMConfig)
Set LLM as a JSON string in the env and Pydantic parses it into the nested model. This keeps complex config declarative without shell-escape pain.
How do you test code that depends on Settings?
Inject a test Settings instance. Because get_settings() is cached, use a fixture that instantiates Settings with explicit values:
# filename: tests/conftest.py
# description: Test fixture providing a fake Settings instance.
import pytest

from app.config import Settings, get_settings


@pytest.fixture
def settings():
    # Aliased fields are passed by their alias name.
    s = Settings(
        OPENAI_API_KEY='test-key',
        DATABASE_URL='sqlite:///:memory:',
        APP_ENV='test',
        debug=True,
    )
    get_settings.cache_clear()
    yield s
    get_settings.cache_clear()
The cache_clear calls bracket the test so neighbor tests don't see the cached test Settings. With this in place, every test is hermetic and repeatable.
For the broader FastAPI lifespan that reads Settings at startup, see the FastAPI and Uvicorn for production agentic AI systems post.
What are the 3 common footguns with Pydantic Settings?
- **Mutable default issues.** Never write `default=[]` or `default={}`. Pydantic deep-copies mutable defaults, so it happens to be safe there, but the habit leaks into plain Python classes where it is not. Use `Field(default_factory=list)` for mutable defaults and stay consistent.
- **Alias case sensitivity.** With `case_sensitive=False`, `OPENAI_API_KEY` and `openai_api_key` both match the alias. With `True`, only an exact case match does. Pick one and stick with it.
- **Env file precedence.** With `env_file=('.env', '.env.development')`, files later in the tuple override earlier ones, so the development file wins. If you expect the shared `.env` to take priority instead, reverse the order. Real env vars always beat file values.
For the right sequencing of .env files across environments, the env development vs production config post walks through the precedence rules.
What to do Monday morning
- Open your biggest Python AI service. Grep for `os.environ.get`. Every hit is a candidate for Pydantic Settings migration.
- Create `app/config.py` with a `Settings(BaseSettings)` class. Declare every env var your service reads with its expected type and default (or mark it required).
- Replace every `os.environ.get('X')` call with `get_settings().x`. The type checker will now catch mismatches at write time.
- Add `extra='forbid'` to the model config so typos in your `.env` files fail fast instead of silently working.
- Run the service once. Delete the `OPENAI_API_KEY` env var and confirm it crashes at startup instead of at the first API call. That crash is the feature.
The headline: env variable parsing is a validation layer, not a dict.get call. Pydantic Settings is 20 lines of config that eliminates 4 classes of bugs.
Frequently asked questions
Why use Pydantic Settings instead of os.environ.get?
Because os.environ.get returns untyped strings or None, and every manual coercion to int or float is a chance to be wrong. Pydantic Settings parses once at startup, validates types and ranges, rejects unknown fields, and raises immediately on missing required keys. The net result is that invalid config is a startup crash instead of a runtime bug 3 stack frames away.
How do I load different env files for development and production?
Use the env_file tuple in SettingsConfigDict and order it so more specific files come last, because later files override earlier ones. Typical setup: env_file=('.env', '.env.development'), where the development file wins locally. In production, set APP_ENV=production and let real env vars from the secret manager override everything: real env vars always take priority over dotenv file values.
Can Pydantic Settings handle nested config like a dict of LLM params?
Yes. Declare a nested Pydantic BaseModel for the complex field, then set the env var to a JSON string. Pydantic parses the JSON into the nested model automatically. This works for lists, dicts, and nested objects of any depth, though JSON-in-env gets hard to read past 3 or 4 fields.
How do you keep secrets out of logs?
Use SecretStr for any secret field (API keys, DB passwords, signing keys). Pydantic's __repr__ for SecretStr prints ********** instead of the value, so accidental print(settings) or structured logging will not leak the secret. Access the real value only via .get_secret_value() at the point of use.
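A quick sketch of the masking behavior (the key value here is fake):

```python
from pydantic import SecretStr

key = SecretStr('sk-live-abc123')

print(str(key))                # ********** -- safe in logs and f-strings
print(repr(key))               # SecretStr('**********')
print(key.get_secret_value())  # sk-live-abc123, only at the point of use
```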
How do I test code that reads from Settings?
Instantiate Settings directly in a pytest fixture with test values, then get_settings.cache_clear() before and after each test. This gives you hermetic tests with explicit config per test, without mocking os.environ. For integration tests that need a real .env, point env_file at a dedicated test fixture file.
Key takeaways
- `os.environ.get` is a footgun for AI services because every type coercion is manual and every missing key becomes a runtime bug far from the source.
- Pydantic Settings parses once at startup, validates types, raises on missing required fields, and rejects unknown fields. Invalid config becomes a startup crash.
- Declare `openai_api_key: SecretStr = Field(...)` as a required field. The `...` ellipsis makes it mandatory; `SecretStr` prevents logging leaks.
- Use `extra='forbid'` to catch typos in `.env` files. A misspelled `DATABSE_URL` fails fast instead of silently returning `None`.
- Test with a fixture that instantiates `Settings` directly. Clear the `lru_cache` between tests so neighbor tests do not see the cached instance.
- To see this config pattern wired into a full production agent stack with auth, streaming, and observability, walk through the Build your own coding agent course, or start with the AI Agents Fundamentals primer.
For the full Pydantic Settings documentation covering file loading, CLI parsing, secrets, and advanced validators, see the Pydantic Settings docs. Every pattern in this post maps onto something documented there.