Passive scanner architecture
This document describes the architecture of the Passive scanner spike (on GitHub): its components, data flow, and responsibilities. The structure is meant to let the spike scale later while keeping the Desk → Forge → Scanner workflow intact.
Principles
Fingerprint logic is immutable design input: fingerprint.yaml defines probes and match logic. This layer never changes during provider swaps.
Providers are replaceable services: External datasets (Netlas, Censys, etc.) are accessed via a defined interface.
Evaluator is deterministic: Match logic is applied to observations without side effects.
Controller orchestrates, not decides: scan.py sequences the flow but does not implement probe logic.
Directory structure
passive-scanner/
├── scan.py              # CLI & orchestration
├── fingerprint/
│   ├── loader.py        # fingerprint.yaml → internal model
│   └── model.py         # Probe, MatchLogic, Fingerprint classes
├── providers/
│   ├── base.py          # Abstract search provider interface
│   ├── netlas.py        # Netlas implementation
│   └── censys.py        # Optional Censys implementation
├── engine/
│   ├── planner.py       # Probe → provider query translation
│   ├── evaluator.py     # Apply match_logic
│   └── evidence.py      # Normalize and store evidence
├── io/
│   ├── targets.py       # Load IPs / netblocks
│   └── output.py        # JSONL results writer
└── FINDINGS.md          # Spike results & limitations
Component responsibilities
Fingerprint layer
loader.py: Parse fingerprint.yaml into internal data structures.
model.py: Define Probe, Fingerprint, MatchResult objects.
Responsibilities: Maintain a tool-agnostic representation of probes and match logic.
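To make the fingerprint layer more concrete, here is a minimal sketch of what model.py and the tail end of loader.py might look like. The schema of fingerprint.yaml is not specified here, so the field names (id, attribute, expected), the "all"/"any" vocabulary for match_logic, and the function fingerprint_from_mapping are all assumptions made for illustration. The sketch starts from an already-parsed mapping so that it does not depend on a particular YAML library.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Tuple


@dataclass(frozen=True)
class Probe:
    """One thing to look for in passive data (field names are assumptions)."""
    id: str
    attribute: str  # dataset field, e.g. "http.headers.server"
    expected: str   # value the field should carry


@dataclass(frozen=True)
class Fingerprint:
    """Tool-agnostic description: probes plus how to combine their results."""
    name: str
    probes: Tuple[Probe, ...]
    match_logic: str  # assumed vocabulary: "all" or "any"


def fingerprint_from_mapping(doc: Mapping[str, Any]) -> Fingerprint:
    """Build the internal model from an already-parsed fingerprint document."""
    logic = doc.get("match_logic", "all")
    if logic not in ("all", "any"):
        raise ValueError(f"unsupported match_logic: {logic!r}")
    probes = tuple(
        Probe(id=p["id"], attribute=p["attribute"], expected=p["expected"])
        for p in doc["probes"]
    )
    return Fingerprint(name=doc["name"], probes=probes, match_logic=logic)


# Hypothetical parsed content of a fingerprint.yaml file.
parsed = {
    "name": "deviceos-2.1.4",
    "match_logic": "all",
    "probes": [
        {"id": "server", "attribute": "http.headers.server",
         "expected": "DeviceOS/2.1.4"},
    ],
}
fingerprint = fingerprint_from_mapping(parsed)
```

Frozen dataclasses are used so that the model cannot be mutated after loading, which fits the principle that fingerprint logic is immutable design input.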
Providers
base.py: Abstract class defining search_probe(probe, targets).
netlas.py: Implements Netlas-specific queries.
censys.py: Optional fallback provider.
Responsibilities: Return observations per probe without evaluating match logic.
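The provider contract above could be sketched roughly as follows. Only the method name search_probe(probe, targets) comes from this document; the Observation type, the class name SearchProvider, and the InMemoryProvider test double are hypothetical. The in-memory provider stands in for a real dataset such as Netlas so the interface can be exercised without network access.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class Probe:
    id: str
    attribute: str
    expected: str


@dataclass(frozen=True)
class Observation:
    """One raw hit returned by a dataset (shape is an assumption)."""
    ip: str
    probe_id: str
    raw: str


class SearchProvider(ABC):
    """Interface every provider implements; no match logic lives here."""

    @abstractmethod
    def search_probe(self, probe: Probe, targets: Iterable[str]) -> List[Observation]:
        """Return what the dataset holds for this probe across the targets."""


class InMemoryProvider(SearchProvider):
    """Test double serving canned records instead of calling an external API."""

    def __init__(self, records: Dict[str, Dict[str, str]]) -> None:
        self._records = records  # {ip: {attribute: value}}

    def search_probe(self, probe: Probe, targets: Iterable[str]) -> List[Observation]:
        found = []
        for ip in targets:
            value = self._records.get(ip, {}).get(probe.attribute)
            # Mimics a dataset search: only rows satisfying the probe come back.
            # Whether the whole fingerprint matches is left to the evaluator.
            if value == probe.expected:
                found.append(Observation(ip=ip, probe_id=probe.id, raw=value))
        return found


provider = InMemoryProvider({
    "192.0.2.10": {"http.headers.server": "DeviceOS/2.1.4"},
    "192.0.2.11": {"http.headers.server": "nginx"},
})
probe = Probe("server", "http.headers.server", "DeviceOS/2.1.4")
observations = provider.search_probe(probe, ["192.0.2.10", "192.0.2.11", "192.0.2.12"])
```

Because callers only depend on SearchProvider, swapping Netlas for Censys would mean adding another subclass rather than touching the controller or the evaluator.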
Planner
planner.py: Maps probes to provider query syntax.
Example: http.headers.server:"DeviceOS/2.1.4"
Responsibilities: Keep API-specific logic isolated.
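A small sketch of how such a translation might be written. It reproduces the query shape of the example above; the escaping rules, the PLANNERS registry, and the function names are assumptions, and a real provider's query language may need different quoting or field names.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Probe:
    id: str
    attribute: str
    expected: str


def to_netlas_query(probe: Probe) -> str:
    """Render a probe as field:"value", escaping backslashes and quotes."""
    escaped = probe.expected.replace("\\", "\\\\").replace('"', '\\"')
    return f'{probe.attribute}:"{escaped}"'


# Hypothetical registry: one translation function per provider name.
PLANNERS: Dict[str, Callable[[Probe], str]] = {
    "netlas": to_netlas_query,
}


def plan(provider_name: str, probe: Probe) -> str:
    """Translate a probe for the named provider, failing loudly if unknown."""
    try:
        translate = PLANNERS[provider_name]
    except KeyError:
        raise ValueError(f"no planner registered for provider {provider_name!r}") from None
    return translate(probe)


query = plan("netlas", Probe("server", "http.headers.server", "DeviceOS/2.1.4"))
print(query)
```

Keeping all provider syntax behind plan() means the fingerprint model never needs to know which dataset it will be run against.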
Evaluation engine
evaluator.py: Apply match_logic to grouped observations.
evidence.py: Structure evidence per IP for JSONL output.
Responsibilities: Ensure deterministic, testable logic evaluation.
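As an illustration of deterministic evaluation, the sketch below groups observations per IP and applies a match_logic value. The "all"/"any" semantics and the fields of Observation and MatchResult are assumptions; the actual match logic defined in fingerprint.yaml may be richer.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Observation:
    ip: str
    probe_id: str
    raw: str


@dataclass(frozen=True)
class MatchResult:
    ip: str
    matched: bool
    matched_probes: Tuple[str, ...]


def evaluate(probe_ids: Iterable[str], match_logic: str,
             observations: Iterable[Observation]) -> List[MatchResult]:
    """Pure function: same inputs give the same output, in any input order."""
    if match_logic not in ("all", "any"):
        raise ValueError(f"unsupported match_logic: {match_logic!r}")
    required: Set[str] = set(probe_ids)

    # Group the probe ids that were observed for each IP.
    hits: Dict[str, Set[str]] = defaultdict(set)
    for obs in observations:
        if obs.probe_id in required:
            hits[obs.ip].add(obs.probe_id)

    results = []
    for ip in sorted(hits):  # string sort keeps the output order stable
        seen = hits[ip]
        matched = seen == required if match_logic == "all" else bool(seen)
        results.append(MatchResult(ip, matched, tuple(sorted(seen))))
    return results


observations = [
    Observation("192.0.2.11", "server", "DeviceOS/2.1.4"),
    Observation("192.0.2.10", "title", "Device Login"),
    Observation("192.0.2.10", "server", "DeviceOS/2.1.4"),
]
results = evaluate(["server", "title"], "all", observations)
```

Since evaluate has no side effects and sorts its output, it can be unit-tested with hand-written observations and no provider at all.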
CLI/Controller
scan.py: Orchestrates loading fingerprints, reading targets, calling planner/provider/evaluator, emitting results.
Responsibilities: Glue only, no logic decisions.
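For illustration only, the command-line surface of scan.py might be sketched with argparse as below. The argument names and the --provider and --output flags are hypothetical; this document does not define the actual CLI.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI for scan.py; argument names are assumptions."""
    parser = argparse.ArgumentParser(
        prog="scan.py",
        description="Passive fingerprint scan against an external dataset.",
    )
    parser.add_argument("fingerprint", help="path to fingerprint.yaml")
    parser.add_argument("targets", help="path to targets.txt (IPs or netblocks)")
    parser.add_argument("--provider", choices=["netlas", "censys"],
                        default="netlas", help="dataset to query")
    parser.add_argument("--output", default="-",
                        help="JSONL output path, '-' for stdout")
    return parser


# Parse an explicit argument list so the example does not read sys.argv.
args = build_parser().parse_args(["fingerprint.yaml", "targets.txt"])
```

The parser only collects inputs; choosing what to query and how to match stays in the planner and evaluator, in line with the "glue only" responsibility.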
IO
Not yet used in the spike; reserved for later.
targets.py: Handle IP/netblock input.
output.py: Emit JSONL lines with the fields <timestamp>, <ip>, <match_result>, <evidence_snippet>.
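A possible shape for these two helpers, using only the standard library. The four JSON keys mirror the fields listed above; the function names, the "#" comment handling in the targets file, and the ISO 8601 timestamp format are assumptions.

```python
import io
import ipaddress
import json
from datetime import datetime, timezone
from typing import Iterable, Iterator, Optional, TextIO


def expand_targets(lines: Iterable[str]) -> Iterator[str]:
    """Yield every address named by a list of IPs and CIDR netblocks."""
    for line in lines:
        entry = line.split("#", 1)[0].strip()  # assumed: '#' starts a comment
        if not entry:
            continue
        # A bare IP parses as a single-address network (/32 or /128).
        for address in ipaddress.ip_network(entry, strict=False):
            yield str(address)


def write_result(stream: TextIO, ip: str, match_result: bool,
                 evidence_snippet: str,
                 timestamp: Optional[datetime] = None) -> None:
    """Append one JSON object per line (JSONL) to the stream."""
    moment = timestamp or datetime.now(timezone.utc)
    record = {
        "timestamp": moment.isoformat(),
        "ip": ip,
        "match_result": match_result,
        "evidence_snippet": evidence_snippet,
    }
    stream.write(json.dumps(record) + "\n")


targets = list(expand_targets(["192.0.2.7", "# lab range", "198.51.100.0/30"]))

buffer = io.StringIO()
write_result(buffer, "192.0.2.7", True, "Server: DeviceOS/2.1.4",
             timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc))
```

Note that iterating a netblock this way includes its network and broadcast addresses; for passive dataset lookups that is harmless, but the choice would need revisiting for large ranges.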
Data flow
scan.py loads fingerprint.yaml and targets.txt.
planner.py converts each probe into provider-specific queries.
The provider executes queries and returns raw observations.
evaluator.py groups observations per IP, applies match_logic, outputs MatchResult.
output.py writes JSONL with matched IPs and evidence.
Note: No packets are sent to the targets. The scanner is passive: it only queries existing datasets about internet-facing hosts.
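The flow above can be sketched end to end in a few lines. Everything here is illustrative: the fingerprint structure, the "all"/"any" logic, the canned dataset that replaces a real provider, and the output record keys are assumptions. The timestamp field is left out so the output stays reproducible.

```python
import io
import json
from collections import defaultdict

# Hypothetical fingerprint, as if already loaded from fingerprint.yaml.
FINGERPRINT = {
    "match_logic": "all",
    "probes": [
        {"id": "server", "attribute": "http.headers.server",
         "expected": "DeviceOS/2.1.4"},
        {"id": "title", "attribute": "http.title", "expected": "Device Login"},
    ],
}

# Canned query results standing in for an external dataset: {query: {ip: raw}}.
CANNED = {
    'http.headers.server:"DeviceOS/2.1.4"': {
        "192.0.2.10": "DeviceOS/2.1.4",
        "192.0.2.11": "DeviceOS/2.1.4",
    },
    'http.title:"Device Login"': {"192.0.2.10": "Device Login"},
}


def plan(probe):
    """Planner step: probe to provider query string."""
    return f'{probe["attribute"]}:"{probe["expected"]}"'


def fake_provider(query, targets):
    """Provider step: return (ip, raw) rows for the targets. No network I/O."""
    rows = CANNED.get(query, {})
    return [(ip, rows[ip]) for ip in targets if ip in rows]


def run(fingerprint, targets, provider, out):
    """Controller: sequence planner, provider, evaluator, and output."""
    evidence = defaultdict(dict)  # ip -> {probe_id: raw observation}
    for probe in fingerprint["probes"]:
        for ip, raw in provider(plan(probe), targets):
            evidence[ip][probe["id"]] = raw

    # Evaluator step: apply match_logic to the observations grouped per IP.
    probe_ids = [p["id"] for p in fingerprint["probes"]]
    combine = all if fingerprint["match_logic"] == "all" else any
    for ip in sorted(evidence):
        if combine(pid in evidence[ip] for pid in probe_ids):
            record = {"ip": ip, "match_result": True, "evidence": evidence[ip]}
            out.write(json.dumps(record, sort_keys=True) + "\n")


buffer = io.StringIO()
run(FINGERPRINT, ["192.0.2.10", "192.0.2.11", "192.0.2.12"], fake_provider, buffer)
print(buffer.getvalue(), end="")
```

Only 192.0.2.10 satisfies both probes, so it is the only IP written. Replacing fake_provider with a real provider would leave run unchanged.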
Spike success criteria
CLI tool runs and consumes one fingerprint.yaml.
Queries Netlas for 100 test IPs, outputs structured results.
Findings documented with API limits, data quality, and gaps.
End-to-end flow verified; no assumptions about full-scale or active scanning.
Scaling considerations
Add providers: Netlas → Censys → local datasets (still an open question).
Caching & pagination: Implemented as wrappers around providers.
Parallelization: Later enhancement; controller remains unchanged.
Bulk fingerprints: Evaluator unchanged, planner handles translation.
Active probes: Separate provider module, no effect on fingerprint model.
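One way the "wrap providers" idea for caching could look is a decorator-style class that exposes the same search_probe(probe, targets) method and delegates to an inner provider. The class names and the cache key are assumptions, and CountingProvider exists only to show that the second call does not reach the wrapped provider.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Probe:
    id: str
    attribute: str
    expected: str


class CountingProvider:
    """Stand-in provider that records how often it is queried."""

    def __init__(self) -> None:
        self.calls = 0

    def search_probe(self, probe: Probe, targets: Iterable[str]) -> List[tuple]:
        self.calls += 1
        return [(ip, probe.id) for ip in targets]


class CachingProvider:
    """Wraps any provider; identical (probe, targets) requests hit the cache."""

    def __init__(self, inner) -> None:
        self._inner = inner
        self._cache: Dict[Tuple[Probe, Tuple[str, ...]], List[tuple]] = {}

    def search_probe(self, probe: Probe, targets: Iterable[str]) -> List[tuple]:
        frozen_targets = tuple(targets)  # hashable, and safe if targets is an iterator
        key = (probe, frozen_targets)
        if key not in self._cache:
            self._cache[key] = list(self._inner.search_probe(probe, frozen_targets))
        return list(self._cache[key])  # copy so callers cannot mutate the cache


inner = CountingProvider()
cached = CachingProvider(inner)
probe = Probe("server", "http.headers.server", "DeviceOS/2.1.4")
first = cached.search_probe(probe, ["192.0.2.10", "192.0.2.11"])
second = cached.search_probe(probe, ["192.0.2.10", "192.0.2.11"])
```

Because the wrapper has the same method as the provider it wraps, the controller does not need to know whether caching is enabled. A pagination wrapper could follow the same pattern.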
Non-goals (current spike)
GUI
Concurrent fingerprint evaluation
Active probing
Optimisation for speed
Full error handling
This document defines the backbone for the passive-scanner spike. It ensures the Desk → Forge → Scanner pipeline is testable, modular, and ready for incremental enhancement.