Input validation¶
The boundary between untrusted and trusted data is the edge of the application: HTTP requests, file uploads, queue messages, webhook payloads, inter-service API calls. Internal functions can assume their arguments are already validated; entry points cannot.
Validation has two goals: confirm the value is structurally what the application expects,
and convert it to the type that subsequent code will use. The conversion step is itself a
form of validation: int(value) raises on non-numeric input without requiring a regex.
Python: Pydantic¶
Pydantic validates structured inputs at parse time and raises with field-level error messages on failure. It is suited to API request bodies, configuration files, and any context where a dict needs to be converted to typed fields:
from pydantic import BaseModel, Field, field_validator
from datetime import date
class BookingRequest(BaseModel):
guest_name: str = Field(min_length=1, max_length=100)
check_in: date
nights: int = Field(ge=1, le=365)
@field_validator("guest_name")
@classmethod
def no_control_characters(cls, v: str) -> str:
if any(c < " " for c in v):
raise ValueError("control characters not permitted")
return v.strip()
# raises pydantic.ValidationError if the data does not conform
booking = BookingRequest.model_validate(request.json())
The validated booking object carries typed fields: booking.check_in is a date, not a
string. Downstream code does not need to revalidate.
JavaScript/TypeScript: Zod¶
Zod provides the same pattern for JavaScript runtimes, with TypeScript inference:
import { z } from "zod";
const BookingSchema = z.object({
guestName: z.string().min(1).max(100),
checkIn: z.string().date(),
nights: z.number().int().min(1).max(365),
});
// throws ZodError if input does not match
const booking = BookingSchema.parse(req.body);
// booking is typed: booking.nights is number, booking.checkIn is string (ISO date)
For Express-style handlers where throwing is inconvenient:
const result = BookingSchema.safeParse(req.body);
if (!result.success) {
res.status(400).json({ errors: result.error.flatten() });
return;
}
const booking = result.data;
Primitive coercion¶
For single values, stdlib coercion is sufficient and raises on invalid input:
# raises ValueError on non-integer input
user_id = int(request.args["id"])
# raises ValueError on invalid date format
from datetime import date
check_in = date.fromisoformat(request.args["check_in"])
Regex validation¶
Regex is appropriate for formats with no stdlib parser: postcodes, product codes, identifiers. Allowlist patterns (permit known-good characters) are more reliable than denylist patterns (reject known-bad ones):
import re
POSTCODE_RE = re.compile(r"^[A-Z]{1,2}\d[A-Z\d]? \d[A-Z]{2}$")
def validate_postcode(value: str) -> str:
value = value.strip().upper()
if not POSTCODE_RE.match(value):
raise ValueError(f"invalid postcode: {value!r}")
return value
Denylist patterns that try to reject <, >, script, ' etc. are routinely bypassed
through encoding, case variation, and unicode equivalences. They offer a false sense of
coverage; the SQL injection attack techniques page shows how this plays out
against SQL injection specifically.