Passive scanner architecture

This document describes the architecture of the Passive scanner spike on GitHub: its components, data flow, and responsibilities. The aim is to structure the spike for future scaling while keeping the Desk → Forge → Scanner workflow intact.

Principles

  • Fingerprint logic is an immutable design input: fingerprint.yaml defines probes and match logic, and this layer does not change when providers are swapped.

  • Providers are replaceable services: External datasets (Netlas, Censys, etc.) are accessed via a defined interface.

  • Evaluator is deterministic: Match logic is applied to observations without side effects.

  • Controller orchestrates, not decides: scan.py sequences the flow but does not implement probe logic.

Directory structure

passive-scanner/
├── scan.py                  # CLI & orchestration
├── fingerprint/
│   ├── loader.py            # fingerprint.yaml → internal model
│   └── model.py             # Probe, MatchLogic, Fingerprint classes
├── providers/
│   ├── base.py              # Abstract search provider interface
│   ├── netlas.py            # Netlas implementation
│   └── censys.py            # Optional Censys implementation
├── engine/
│   ├── planner.py           # Probe → provider query translation
│   ├── evaluator.py         # Apply match_logic
│   └── evidence.py          # Normalize and store evidence
├── io/
│   ├── targets.py           # Load IPs / netblocks
│   └── output.py            # JSONL results writer
└── FINDINGS.md              # Spike results & limitations

Component responsibilities

Fingerprint layer

  • loader.py: Parse fingerprint.yaml into internal data structures.

  • model.py: Define the Probe, MatchLogic, Fingerprint, and MatchResult objects.

  • Responsibilities: Maintain tool-agnostic representation of probes and match logic.
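
As an illustration of the fingerprint layer, here is a minimal sketch of what model.py could contain, together with the conversion step a loader would perform after parsing the YAML. The fingerprint.yaml schema is not shown in this document, so every field name (id, field, value, operator, probe_ids) and the helper fingerprint_from_dict are assumptions. Frozen dataclasses are used to reflect the "immutable design input" principle.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple


@dataclass(frozen=True)
class Probe:
    """One passive check. Field names are hypothetical."""
    id: str
    field: str   # e.g. "http.headers.server"
    value: str   # e.g. "DeviceOS/2.1.4"


@dataclass(frozen=True)
class MatchLogic:
    """Assumed shape: combine the named probes with 'all' or 'any'."""
    operator: str
    probe_ids: Tuple[str, ...]


@dataclass(frozen=True)
class Fingerprint:
    name: str
    probes: Tuple[Probe, ...]
    match_logic: MatchLogic


def fingerprint_from_dict(data: Mapping[str, Any]) -> Fingerprint:
    """Build the internal model from an already-parsed fingerprint.yaml."""
    probes = tuple(Probe(**raw) for raw in data["probes"])
    logic = data["match_logic"]
    if logic["operator"] not in ("all", "any"):
        raise ValueError(f"unsupported operator: {logic['operator']!r}")
    # Reject match logic that points at probes the fingerprint never defines.
    unknown = set(logic["probe_ids"]) - {p.id for p in probes}
    if unknown:
        raise ValueError(f"match_logic references unknown probes: {sorted(unknown)}")
    return Fingerprint(
        name=data["name"],
        probes=probes,
        match_logic=MatchLogic(logic["operator"], tuple(logic["probe_ids"])),
    )
```

Keeping the YAML parsing separate from this conversion means the model can be tested with plain dictionaries, without touching the file system.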

Providers

  • base.py: Abstract class defining search_probe(probe, targets).

  • netlas.py: Implements Netlas-specific queries.

  • censys.py: Optional fallback provider.

  • Responsibilities: Return observations per probe without evaluating match logic.
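
The provider contract can be sketched as an abstract base class. Only the method name search_probe(probe, targets) comes from this document; the Observation record, the class name SearchProvider, and the in-memory StaticProvider are hypothetical and stand in for a real Netlas or Censys client.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Probe:
    """Minimal hypothetical probe; only the id matters for this sketch."""
    id: str


@dataclass(frozen=True)
class Observation:
    """Hypothetical record: what a dataset reported for one IP and probe."""
    ip: str
    probe_id: str
    data: str


class SearchProvider(ABC):
    """Interface from base.py: return observations, never match decisions."""

    @abstractmethod
    def search_probe(self, probe: Probe, targets: Iterable[str]) -> List[Observation]:
        ...


class StaticProvider(SearchProvider):
    """In-memory stand-in for tests; it does not call any external API."""

    def __init__(self, records: Iterable[Observation]) -> None:
        self._records = list(records)

    def search_probe(self, probe: Probe, targets: Iterable[str]) -> List[Observation]:
        wanted = set(targets)
        return [r for r in self._records if r.probe_id == probe.id and r.ip in wanted]
```

Because the evaluator only sees Observation objects, swapping StaticProvider for a real implementation should not require changes outside the providers package.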

Planner

  • planner.py: Maps probes to provider query syntax.

  • Example: http.headers.server:"DeviceOS/2.1.4"

  • Responsibilities: Keep API-specific logic isolated.
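
A minimal sketch of the planner idea, assuming a probe carries a field and a value and that the target syntax is the field:"value" form shown in the example above. The registry QUERY_BUILDERS, the function plan_query, and the backslash escaping of quotes are assumptions, not confirmed provider behaviour.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Probe:
    """Hypothetical probe shape used only for this sketch."""
    id: str
    field: str
    value: str


def _field_colon_quoted(probe: Probe) -> str:
    # Assumption: embedded backslashes and quotes are backslash-escaped.
    escaped = probe.value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{probe.field}:"{escaped}"'


# One builder per provider keeps API-specific syntax out of the other layers.
QUERY_BUILDERS: Dict[str, Callable[[Probe], str]] = {
    "netlas": _field_colon_quoted,
}


def plan_query(probe: Probe, provider_name: str) -> str:
    """Translate a tool-agnostic probe into one provider's query string."""
    try:
        builder = QUERY_BUILDERS[provider_name]
    except KeyError:
        raise ValueError(f"no query builder for provider {provider_name!r}") from None
    return builder(probe)
```

Adding a provider would then mean registering one more builder, leaving the probe model untouched.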

Evaluation engine

  • evaluator.py: Apply match_logic to grouped observations.

  • evidence.py: Structure evidence per IP for JSONL output.

  • Responsibilities: Ensure deterministic, testable logic evaluation.
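
To make "deterministic, testable" concrete, the following sketch groups observations per IP and applies a simple match rule. The real match_logic format is not shown in this document, so the 'all'/'any' operator, the function evaluate, and the fields of Observation and MatchResult are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Observation:
    ip: str
    probe_id: str


@dataclass(frozen=True)
class MatchResult:
    ip: str
    matched: bool
    matched_probes: Tuple[str, ...]


def evaluate(
    observations: Iterable[Observation],
    operator: str,
    probe_ids: Iterable[str],
) -> List[MatchResult]:
    """Pure function: same observations in, same results out, any input order."""
    if operator not in ("all", "any"):
        raise ValueError(f"unsupported operator: {operator!r}")
    required = set(probe_ids)
    hits: Dict[str, Set[str]] = defaultdict(set)
    for obs in observations:
        seen = hits[obs.ip]           # every observed IP gets a result row
        if obs.probe_id in required:
            seen.add(obs.probe_id)
    results = []
    for ip in sorted(hits):           # sorted output keeps runs reproducible
        seen = hits[ip]
        matched = seen == required if operator == "all" else bool(seen)
        results.append(MatchResult(ip, matched, tuple(sorted(seen))))
    return results
```

The function reads no clock, network, or global state, so a unit test can feed it a fixed list of observations and compare the output exactly.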

CLI/Controller

  • scan.py: Orchestrates the run: loads fingerprints, reads targets, calls the planner, provider, and evaluator, and emits results.

  • Responsibilities: Glue only, no logic decisions.

IO

Not currently used in the spike; reserved for later.

  • targets.py: Handle IP/netblock input.

  • output.py: Emit JSONL lines: <timestamp>, <ip>, <match_result>, <evidence_snippet>.
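
A sketch of how the two IO helpers could look using only the standard library. The function names iter_targets and write_result, the treatment of "#" as a comment marker, and the boolean type of match_result are assumptions; the four JSONL fields follow the list above.

```python
import ipaddress
import json
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional, TextIO


def iter_targets(lines: Iterable[str]) -> Iterator[str]:
    """Yield single IPs from lines holding either an IP or a CIDR netblock."""
    for raw in lines:
        line = raw.split("#", 1)[0].strip()   # assumed comment syntax
        if not line:
            continue
        # A bare IP parses as a one-address network; a netblock is expanded
        # lazily, so large ranges are not materialised in memory.
        for address in ipaddress.ip_network(line, strict=False):
            yield str(address)


def write_result(
    out: TextIO,
    ip: str,
    match_result: bool,
    evidence_snippet: str,
    timestamp: Optional[datetime] = None,
) -> None:
    """Append one JSON object per line (JSONL)."""
    if timestamp is None:
        timestamp = datetime.now(timezone.utc)
    record = {
        "timestamp": timestamp.isoformat(),
        "ip": ip,
        "match_result": match_result,
        "evidence_snippet": evidence_snippet,
    }
    out.write(json.dumps(record) + "\n")
```

Passing the timestamp in explicitly is optional, but it lets tests produce byte-identical output.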

Data flow

  1. scan.py loads fingerprint.yaml and targets.txt.

  2. planner.py converts each probe into provider-specific queries.

  3. The provider executes the queries and returns raw observations.

  4. evaluator.py groups observations per IP, applies match_logic, outputs MatchResult.

  5. output.py writes JSONL with matched IPs and evidence.

Note: No packets are sent to the targets. Scanning is passive: all observations about internet-facing hosts come from the providers' external datasets.
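
The five steps can be sketched as a controller function that only sequences injected callables, in line with the "glue only" principle. The function run_scan and its parameters plan, execute, evaluate, and emit are hypothetical, and how a planned query is handed to search_probe(probe, targets) is not specified here, so the execute(query, targets) signature is a simplification.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def run_scan(
    probes: Iterable[Record],
    targets: Iterable[str],
    plan: Callable[[Record], str],
    execute: Callable[[str, List[str]], List[Record]],
    evaluate: Callable[[List[Record]], List[Record]],
    emit: Callable[[Record], None],
) -> int:
    """Sequence the flow; all probe and match logic lives in the callables."""
    target_list = list(targets)                 # step 1: inputs already loaded
    observations: List[Record] = []
    for probe in probes:
        query = plan(probe)                     # step 2: probe -> query
        observations.extend(execute(query, target_list))  # step 3: lookup
    results = evaluate(observations)            # step 4: apply match logic
    for result in results:
        emit(result)                            # step 5: write output
    return len(results)
```

Since every collaborator is passed in, the whole flow can be exercised in a test with stub functions and no network access.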

Spike success criteria

  • CLI tool runs and consumes one fingerprint.yaml.

  • Queries Netlas for 100 test IPs, outputs structured results.

  • Findings documented with API limits, data quality, and gaps.

  • End-to-end flow verified; no assumptions about full-scale or active scanning.

Scaling considerations

  • Add providers: Netlas → Censys → local datasets (to be decided).

  • Caching & pagination: Wrap providers.

  • Parallelization: Later enhancement; controller remains unchanged.

  • Bulk fingerprints: Evaluator unchanged, planner handles translation.

  • Active probes: Separate provider module, no effect on fingerprint model.
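
As one possible reading of "wrap providers", caching could be added as a wrapper that exposes the same search_probe(probe, targets) method, so the controller and evaluator stay unchanged. The classes below (CachingProvider, CountingProvider, and the minimal Probe) are hypothetical, and the cache is a plain in-memory dictionary with no expiry.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Probe:
    """Hypothetical hashable probe, so it can be part of a cache key."""
    id: str


class CountingProvider:
    """Stand-in for a real provider; counts how often it is queried."""

    def __init__(self) -> None:
        self.calls = 0

    def search_probe(self, probe: Probe, targets: Iterable[str]) -> List[dict]:
        self.calls += 1
        return [{"ip": ip, "probe_id": probe.id} for ip in targets]


class CachingProvider:
    """Wraps any provider and reuses results for repeated lookups."""

    def __init__(self, inner) -> None:
        self._inner = inner
        self._cache: Dict[Tuple[Probe, Tuple[str, ...]], List[dict]] = {}

    def search_probe(self, probe: Probe, targets: Iterable[str]) -> List[dict]:
        # Normalise the targets so ordering and duplicates share one entry.
        key = (probe, tuple(sorted(set(targets))))
        if key not in self._cache:
            self._cache[key] = list(self._inner.search_probe(probe, key[1]))
        return list(self._cache[key])  # copy so callers cannot alter the cache
```

A pagination wrapper could follow the same shape, which is what keeps this enhancement local to the providers package.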

Non-goals (current spike)

  • GUI

  • Concurrent fingerprint evaluation

  • Active probing

  • Optimisation for speed

  • Full error handling

This document defines the backbone for the passive-scanner spike. It ensures the Desk → Forge → Scanner pipeline is testable, modular, and ready for incremental enhancement.