Path traversal¶
Path traversal occurs when user-controlled input reaches a file operation without being
constrained to an intended directory. The classic form is ../../etc/passwd, but the
mechanism appears in any context where a filename or path segment comes from outside the
application: file download endpoints, archive extraction, template loading, log viewing.
The directory traversal attack page covers common exploitation patterns. The common mistake on the defence side is checking the input string rather than the resolved path. An input like
images/../../etc/shadow may pass a check for ../ sequences in the raw string while still
resolving outside the intended directory after the OS processes it.
Python¶
The safe pattern uses pathlib.Path.resolve() to canonicalise the path, then checks whether
the result falls within the allowed base:
from pathlib import Path
BASE_DIR = Path("/var/www/uploads").resolve()
def safe_open(filename: str) -> bytes:
# resolve() follows symlinks and collapses .. sequences
target = (BASE_DIR / filename).resolve()
if not target.is_relative_to(BASE_DIR):
raise ValueError("access outside upload directory")
return target.read_bytes()
The check happens on the resolved path, not the raw string. is_relative_to() is available
from Python 3.9; on earlier versions, use:
if not str(target).startswith(str(BASE_DIR) + "/"):
raise ValueError("access outside upload directory")
The unsafe pattern, for comparison:
# unsafe: checks the input string, not the resolved path
def unsafe_open(filename: str) -> bytes:
if ".." in filename:
raise ValueError("invalid path")
return open(f"/var/www/uploads/{filename}").read()
A filename like images/%2e%2e%2fetc%2fpasswd (URL-encoded) may bypass the string check.
Node.js¶
const path = require("path");
const fs = require("fs");
const BASE_DIR = path.resolve("/var/www/uploads");
function safeRead(filename) {
const target = path.resolve(BASE_DIR, filename);
if (!target.startsWith(BASE_DIR + path.sep)) {
throw new Error("access outside upload directory");
}
return fs.readFileSync(target);
}
The path.sep suffix prevents a directory named /var/www/uploads-other from passing the
prefix check.
Archive extraction¶
Archive extraction is a high-risk context. ZIP and tar archives can contain entries with
absolute paths or .. sequences. Extraction libraries do not always strip these.
import zipfile
from pathlib import Path
def safe_extract(zip_path: str, dest: str) -> None:
dest_dir = Path(dest).resolve()
with zipfile.ZipFile(zip_path) as zf:
for member in zf.namelist():
target = (dest_dir / member).resolve()
if not target.is_relative_to(dest_dir):
raise ValueError(f"zip slip: {member}")
zf.extract(member, dest_dir)
This is the “zip slip” pattern: a maliciously crafted archive overwrites arbitrary files on the filesystem if extraction is done without path validation.