Jarrod E. Brown

Solution Architecture Document — Network Threat Pipeline

Project: network-threat-pipeline
Author: Jarrod E. Brown
Status: Working
Repository: github.com/jarrodebrown/network-threat-pipeline (private)


1. Overview

The Network Threat Pipeline is an automated network-threat-analysis system that captures WAN traffic from a Ubiquiti EdgeRouter, parses packet captures, runs four heuristic detection modules (beaconing, DNS tunneling, long-lived connections, port scanning), and enriches findings with multi-source threat intelligence. It produces a prioritized, human-readable threat summary with recommended actions. The analysis and enrichment stages are decoupled from the capture mechanism, so they work with any pcap source — not just the EdgeRouter.

2. Problem & Context

Home and small-network operators rarely have visibility into what their edge is actually talking to. Command-and-control beaconing, DNS tunneling, and long-lived connections to suspect hosts blend into normal traffic and go unnoticed. Commercial Network Detection & Response (NDR) platforms like Darktrace or Vectra start at five figures per year and are designed for enterprise-scale deployments — overkill for a home lab or small office. Open-source alternatives (Zeek, Suricata) require significant infrastructure and ongoing signature management. What's needed is a lightweight, scriptable pipeline that turns raw edge traffic into prioritized, intel-enriched findings without requiring a dedicated SOC or standing infrastructure.

3. Goals & Requirements

Functional

  • Remotely start, stop, and pull tcpdump captures from an EdgeRouter over SSH.
  • Parse pcap files and run detection modules for beaconing, DNS tunneling, long-lived connections, and port scanning.
  • Enrich detections with multiple named threat-intelligence sources, returning reputation, geolocation, and context.
  • Produce a structured, human-readable threat summary with severity tiers and recommended actions.
  • Support both ad-hoc analysis and scheduled weekly scans via systemd timer.

Non-functional

  • Portable: detection and enrichment stages run against any pcap, not just EdgeRouter captures.
  • Configuration-driven threat-intel sources — add or remove providers without code changes.
  • Low-footprint: runnable on commodity hardware (Raspberry Pi 4 or equivalent) alongside the EdgeRouter.
  • Process a 500 MB pcap (approximately 1 hour of WAN traffic at 1 Mbps average) in under 10 minutes.

4. Decision Rationale

Why custom Python over Zeek/Suricata? Zeek and Suricata are powerful but heavyweight — they require ongoing signature management, produce high-volume logs that need a SIEM to be actionable, and their installation footprint exceeds what a Raspberry Pi can comfortably handle. The pipeline's detection modules are deliberately narrow: four specific behavioral heuristics (beaconing, DNS tunneling, long-lived connections, port scanning) that target the highest-signal threats for a home network. Writing these as focused Python modules keeps the codebase auditable, the resource footprint small, and the output directly actionable without a downstream log-analysis stack.

Why tcpdump over SPAN/TAP? The EdgeRouter's built-in tcpdump provides packet capture without additional hardware. A SPAN port or network TAP would provide cleaner full-duplex capture but requires physical infrastructure changes. For a home-network use case where the EdgeRouter is already the WAN gateway, SSH-triggered tcpdump is the lowest-friction capture method with zero hardware cost.

Why multi-source threat intel over a single provider? No single threat-intelligence source has complete coverage. VirusTotal excels at file/URL reputation but has API rate limits on the free tier. AbuseIPDB provides community-reported abuse data with good coverage of scanning and brute-force sources. IPinfo and ip-api.com provide geolocation and ASN context that helps distinguish a VPN exit node from a known-bad hosting provider. Layering multiple sources improves confidence and reduces blind spots.

Why Python over Go/Rust? The pipeline is I/O-bound (reading pcaps, making HTTP API calls) rather than CPU-bound. Python's scapy library provides robust pcap parsing, and the requests library handles threat-intel API calls cleanly. The performance ceiling for a home-network workload (hundreds of megabytes of pcap per week, not terabytes) does not justify the development-speed tradeoff of a compiled language.

Why systemd timer over cron? systemd timers provide built-in logging (journalctl), dependency ordering (wait for network-online.target), persistent timers that catch up after missed runs (if the analysis host was rebooted), and standardized status reporting (systemctl status). For a headless Raspberry Pi that may be power-cycled, these resilience features matter more than cron's simplicity.

5. Architecture Overview

Deployment Diagram — EdgeRouter ↔ analysis host physical topology

The system follows a linear pipeline architecture with four stages: Capture, Analysis, Enrichment, and Reporting. The capture stage runs on the EdgeRouter via SSH-triggered tcpdump. The remaining three stages run on a separate analysis host (typically a Raspberry Pi or Linux workstation on the same LAN). An orchestration layer (run_pipeline.py) drives the end-to-end flow, and a systemd timer enables scheduled weekly execution.

6. Components

| # | Component | File(s) | Responsibility |
|---|-----------|---------|----------------|
| 1 | Capture scripts | start_capture.sh, stop_capture.sh, pull_captures.sh, check_capture.sh | Remote tcpdump lifecycle over SSH: start a capture with configurable BPF filter and duration, stop it, pull the pcap to the analysis host, check if a capture is currently running. |
| 2 | Traffic analyzer | analyze_traffic.py | Parses pcap files using scapy and runs the four detection modules. Outputs a structured findings dictionary keyed by indicator (IP or domain) with detection type, confidence score, and supporting evidence. |
| 3 | Threat-intel enrichment | threat_intel_lookup.py + threat_intel_config.json | Queries configured threat-intel providers for each indicator. Merges reputation scores, geolocation, ASN ownership, and abuse reports into the findings. Handles rate limiting and provider unavailability gracefully. |
| 4 | Report generator | generate_threat_summary.py | Transforms enriched findings into a prioritized, human-readable Markdown report with severity tiers, indicator details, and recommended actions. |
| 5 | Pipeline orchestrator | run_pipeline.py | Drives the end-to-end flow: capture → analyze → enrich → report. Accepts CLI flags for mode (--capture, --analyze-only, --full), pcap input path, and output directory. |
| 6 | Scheduling wrapper | run_weekly_scan.sh + network-threat-scan.timer / .service | systemd timer unit that triggers the full pipeline weekly. The shell wrapper handles logging, lock-file management, and error notification. |
| 7 | Environment setup | setup_analysis.sh | Installs Python dependencies (scapy, requests, dnspython), validates SSH access to the EdgeRouter, and creates the output directory structure. |

7. Detection Methods

Beaconing detection identifies periodic callbacks characteristic of command-and-control (C2) channels. The module groups outbound connections by destination IP, computes the inter-arrival time (IAT) distribution for each, and flags destinations where the coefficient of variation of IAT is below 0.3 (i.e., highly regular intervals) and the connection count exceeds 20 within the capture window. A secondary check looks for consistent payload sizes across connections to the same destination, which is a strong C2 indicator when combined with periodic timing.
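
The timing check described above can be sketched as follows. This is a minimal illustration, not the code in analyze_traffic.py: the function names and the `(dst_ip, timestamp)` input shape are assumptions, and the secondary payload-size check is left out. Only the two thresholds (CV below 0.3, more than 20 connections) come from the text.

```python
from statistics import mean, pstdev

# Thresholds from the text; names are illustrative.
MAX_IAT_CV = 0.3
MIN_CONNECTIONS = 20


def iat_cv(timestamps):
    """Coefficient of variation of inter-arrival times, or None if undefined."""
    ordered = sorted(timestamps)
    iats = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    if len(iats) < 2:
        return None
    avg = mean(iats)
    if avg == 0:
        return None
    return pstdev(iats) / avg


def find_beacons(connections):
    """connections: iterable of (dst_ip, timestamp_seconds) tuples (assumed shape)."""
    by_dst = {}
    for dst_ip, ts in connections:
        by_dst.setdefault(dst_ip, []).append(ts)

    flagged = {}
    for dst_ip, stamps in by_dst.items():
        if len(stamps) <= MIN_CONNECTIONS:
            continue  # too few connections to judge periodicity
        cv = iat_cv(stamps)
        if cv is not None and cv < MAX_IAT_CV:
            flagged[dst_ip] = {"iat_cv": round(cv, 3), "connections": len(stamps)}
    return flagged
```

A destination contacted every 60 seconds has a CV of 0 and is flagged once it passes the connection-count minimum, while jittery or bursty traffic produces a CV well above the cutoff.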

DNS tunneling detection flags DNS queries that exhibit characteristics of data exfiltration or covert channels. The module computes the Shannon entropy of each queried subdomain label; labels with entropy above 3.5 bits per character are flagged, since legitimate subdomains (e.g., www, api, cdn) have low entropy while encoded data tunnels produce near-random character distributions. Additionally, queries with subdomain labels exceeding 40 characters or query rates to a single domain exceeding 50 queries per minute trigger detection, as both patterns are consistent with DNS tunneling tools like iodine and dnscat2.
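
A minimal sketch of the per-label checks, assuming the query name is already available as a string. The function names are hypothetical, the sketch checks every label in the name rather than only the subdomain part, and the per-domain query-rate check is omitted.

```python
import math
from collections import Counter

ENTROPY_THRESHOLD = 3.5  # bits per character, from the text
MAX_LABEL_LENGTH = 40    # characters, from the text


def shannon_entropy(label):
    """Shannon entropy of a string in bits per character."""
    if not label:
        return 0.0
    counts = Counter(label)
    total = len(label)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def suspicious_labels(qname):
    """Return (label, entropy) pairs for labels that exceed either threshold."""
    hits = []
    for label in qname.rstrip(".").split("."):
        entropy = shannon_entropy(label)
        if entropy > ENTROPY_THRESHOLD or len(label) > MAX_LABEL_LENGTH:
            hits.append((label, round(entropy, 2)))
    return hits
```

Note that a label needs at least 12 distinct characters to exceed 3.5 bits per character, so short labels such as www or api can never trip the entropy check regardless of their content.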

Long-lived connection detection surfaces persistent TCP sessions that warrant review. A connection is flagged when its duration exceeds 3,600 seconds (1 hour) and it is still active, meaning no FIN or RST was observed within the capture window. A single capture at the default 30-minute duration cannot observe a session this long, so this detector applies to captures configured to run for more than an hour. The module excludes well-known long-lived services (NTP, BGP, established VPN endpoints configured in a whitelist) to reduce noise. Remaining long-lived connections are ranked by total bytes transferred, since a persistent session with significant data transfer is more concerning than an idle keepalive.
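
The filter-and-rank step might look like the sketch below. The `Flow` record and its field names are assumptions made for illustration; only the 3,600-second threshold, the FIN/RST condition, the whitelist exclusion, and the ranking by bytes come from the text.

```python
from dataclasses import dataclass

MIN_DURATION_S = 3600  # 1 hour, from the text


@dataclass
class Flow:
    """Hypothetical per-connection summary built by the parser."""
    dst_ip: str
    dst_port: int
    first_seen: float
    last_seen: float
    total_bytes: int
    closed: bool  # True if a FIN or RST was observed in the capture


def long_lived(flows, whitelist=frozenset()):
    """Return still-open flows longer than MIN_DURATION_S, largest transfer first."""
    hits = [
        flow for flow in flows
        if not flow.closed
        and (flow.last_seen - flow.first_seen) > MIN_DURATION_S
        and flow.dst_ip not in whitelist
    ]
    return sorted(hits, key=lambda flow: flow.total_bytes, reverse=True)
```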

Port-scan detection identifies horizontal and vertical scan patterns. The module flags source IPs that connect to more than 15 distinct destination ports on a single host (vertical scan) or that connect to the same port across more than 10 distinct destination hosts within a 60-second window (horizontal scan). SYN-only connections (no completed handshake) are weighted more heavily, as they indicate SYN scanning rather than legitimate connection attempts.
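
A sketch of both scan checks, under two stated assumptions: events arrive as `(timestamp, src_ip, dst_ip, dst_port)` tuples, and the 60-second window applies to the horizontal check only (the text leaves open whether it also bounds the vertical check). The SYN-only weighting is not modeled.

```python
from collections import defaultdict, deque

VERTICAL_PORTS = 15    # distinct ports on one host, from the text
HORIZONTAL_HOSTS = 10  # distinct hosts on one port, from the text
WINDOW_S = 60          # seconds, from the text


def vertical_scanners(events):
    """Return (src_ip, dst_ip) pairs where src touched > VERTICAL_PORTS ports on dst."""
    ports = defaultdict(set)
    for _ts, src, dst, dport in events:
        ports[(src, dst)].add(dport)
    return {pair for pair, seen in ports.items() if len(seen) > VERTICAL_PORTS}


def horizontal_scanners(events):
    """Return (src_ip, dst_port) pairs that hit > HORIZONTAL_HOSTS hosts within WINDOW_S."""
    flagged = set()
    windows = defaultdict(deque)  # (src, dport) -> (ts, dst) entries inside the window
    for ts, src, dst, dport in sorted(events):
        key = (src, dport)
        window = windows[key]
        window.append((ts, dst))
        # Drop entries that have aged out of the sliding window.
        while window and ts - window[0][0] > WINDOW_S:
            window.popleft()
        if len({host for _t, host in window}) > HORIZONTAL_HOSTS:
            flagged.add(key)
    return flagged
```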

Each detection module produces a list of indicators (IPs or domains) with a detection type tag, a confidence score (0.0–1.0), and supporting evidence (e.g., IAT statistics, entropy values, connection counts). These feed the enrichment stage.

8. Data Flow

Data Flow Diagram — how each detection module processes pcap data

  1. Capture. The operator (or systemd timer) triggers start_capture.sh, which SSHs into the EdgeRouter and starts tcpdump with a configurable BPF filter (default: all WAN-interface traffic, excluding the SSH management session itself). After the configured duration (default: 30 minutes), stop_capture.sh terminates the capture and pull_captures.sh SCPs the pcap file to the analysis host's captures/ directory.
  2. Parsing. analyze_traffic.py reads the pcap with scapy, using rdpcap() to load small files into memory and the streaming PcapReader() for large files. It extracts connection metadata: source/destination IP, port, protocol, timestamps, payload sizes, DNS query names, and TCP flags.
  3. Detection. The four detection modules process the parsed connection data in sequence. Each module operates on the full connection set and produces independent findings. There is no dependency between modules — beaconing detection does not influence DNS tunneling detection, etc. The combined findings are deduplicated by indicator (if an IP is flagged by both beaconing and long-lived-connection detectors, both detections are preserved under the same indicator entry).
  4. Enrichment. threat_intel_lookup.py iterates over each unique indicator and queries the configured threat-intel sources. Responses are merged into the findings dictionary. Rate-limited or unavailable sources are skipped with a note in the finding's metadata, preserving partial enrichment rather than failing the pipeline.
  5. Reporting. generate_threat_summary.py sorts findings by a composite severity score (detection confidence × threat-intel reputation) and generates a Markdown report grouped by severity tier: CRITICAL (known-malicious + high-confidence detection), WARNING (suspicious + moderate confidence), and INFO (low confidence or insufficient intel). Each finding includes the indicator, detection type(s), confidence, threat-intel summary, and a recommended action.
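
The scoring in step 5 can be illustrated with a short sketch. The composite formula (confidence × reputation) is from the text; the numeric tier cutoffs, the assumption that reputation is normalized to 0.0–1.0 with 1.0 as worst, and all function names are hypothetical, since the document does not state them.

```python
# Tier cutoffs are illustrative assumptions; the document does not give values.
CRITICAL_CUTOFF = 0.7
WARNING_CUTOFF = 0.3


def composite_score(confidence, reputation):
    """Both inputs assumed normalized to 0.0-1.0 (reputation 1.0 = known malicious)."""
    return confidence * reputation


def tier(score):
    if score >= CRITICAL_CUTOFF:
        return "CRITICAL"
    if score >= WARNING_CUTOFF:
        return "WARNING"
    return "INFO"


def prioritize(findings):
    """findings: list of dicts with 'indicator', 'confidence', 'reputation' keys."""
    scored = [
        {**finding, "score": composite_score(finding["confidence"], finding["reputation"])}
        for finding in findings
    ]
    scored.sort(key=lambda finding: finding["score"], reverse=True)
    for finding in scored:
        finding["tier"] = tier(finding["score"])
    return scored
```

Because the score is a product, a high-confidence detection against an indicator with a clean reputation scores near zero and falls into INFO.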

9. Data Model

The pipeline operates on file-based, transient data rather than a persistent database. Key data entities:

Pcap files — raw packet captures stored in captures/ with timestamped filenames (e.g., capture_2026-06-08_1430.pcap). Retained for 30 days by default (configurable); older captures are rotated by the scheduling wrapper.

Findings dictionary — the central data structure, keyed by indicator (IP address or domain). Each entry contains: indicator type (IP/domain), detection types (list of module names that flagged it), confidence scores per detection, supporting evidence (IAT stats, entropy values, connection counts, byte totals), and a threat-intel enrichment sub-dictionary (reputation scores, geolocation, ASN, abuse reports) populated during the enrichment stage.
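
One possible shape for a findings entry, with a helper that preserves multiple detections under the same indicator. The key names and the `add_detection` helper are assumptions for illustration; the actual structure in analyze_traffic.py may differ.

```python
def add_detection(findings, indicator, indicator_type, detection, confidence, evidence):
    """Record a detection, creating the indicator entry on first sight."""
    entry = findings.setdefault(indicator, {
        "indicator_type": indicator_type,  # "ip" or "domain"
        "detections": {},                  # module name -> confidence + evidence
        "threat_intel": {},                # filled in later by the enrichment stage
    })
    entry["detections"][detection] = {"confidence": confidence, "evidence": evidence}
    return entry


findings = {}
add_detection(findings, "203.0.113.47", "ip", "beaconing", 0.92,
              {"iat_cv": 0.08, "connections": 47})
add_detection(findings, "203.0.113.47", "ip", "long_lived", 0.60,
              {"duration_s": 5400, "total_bytes": 1_200_000})
```

Keying detections by module name means a second module flagging the same IP adds to the entry instead of overwriting it.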

Threat-intel configuration — threat_intel_config.json maps provider names to their API endpoints, authentication (API key reference or "none"), rate-limit parameters, and response-field mappings. Adding a provider requires only a new entry in this file — no code changes.
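
To make the configuration-driven idea concrete, the sketch below loads a hypothetical config and applies its field mappings to a provider response. Every key name (`endpoint`, `api_key_env`, `fields`), the dotted-path mapping syntax, the URL schemes, and the environment variable name are assumptions; the real schema of threat_intel_config.json is not shown in this document.

```python
import json
import os

# Hypothetical config content; the real file's schema may differ.
EXAMPLE_CONFIG = """
{
  "abuseipdb": {
    "endpoint": "https://api.abuseipdb.com/api/v2/check",
    "api_key_env": "ABUSEIPDB_API_KEY",
    "requests_per_day": 1000,
    "fields": {"reputation": "data.abuseConfidenceScore"}
  },
  "ip-api": {
    "endpoint": "http://ip-api.com/json/{ip}",
    "api_key_env": "none",
    "requests_per_minute": 45,
    "fields": {"country": "country", "asn": "as"}
  }
}
"""


def pluck(payload, dotted_path):
    """Follow a dotted path through nested dicts; return None if any key is missing."""
    value = payload
    for key in dotted_path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


def map_response(provider_cfg, payload):
    """Translate a provider response into the pipeline's own field names."""
    return {name: pluck(payload, path) for name, path in provider_cfg["fields"].items()}


def resolve_api_key(provider_cfg, environ=os.environ):
    """Look up the key named by the config, or None when the provider needs no auth."""
    ref = provider_cfg["api_key_env"]
    return None if ref == "none" else environ.get(ref)


config = json.loads(EXAMPLE_CONFIG)
```

With this layout, supporting a new provider is a matter of adding one more JSON object; the lookup code only ever sees the mapped field names.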

Reports — Markdown files stored in reports/ with timestamped filenames. Each report is self-contained: header metadata (capture source, duration, timestamp, pipeline version), summary statistics (total connections analyzed, indicators flagged, severity distribution), and the detailed findings by tier.

10. External Interfaces

| Source | Endpoint / URL | Purpose | Auth | Rate Limit |
|--------|----------------|---------|------|------------|
| VirusTotal | api.virustotal.com/v3/ip_addresses/{ip} | IP reputation, detection ratio, community votes | API key (free tier) | 4 req/min, 500 req/day |
| AbuseIPDB | api.abuseipdb.com/api/v2/check | Abuse confidence score, report count, ISP/usage type | API key (free tier) | 1,000 req/day |
| IPinfo | ipinfo.io/{ip}/json | Geolocation, ASN, organization, privacy detection (VPN/proxy/Tor) | API token (free tier) | 50,000 req/month |
| ip-api.com | ip-api.com/json/{ip} | Geolocation, ISP, AS number (fallback for IPinfo rate limits) | None (free for non-commercial) | 45 req/min |
| EdgeRouter | SSH (port 22) | tcpdump start/stop/pull via shell scripts | SSH key authentication | N/A |

All threat-intel calls are read-only GET requests. No data is submitted to external providers — only IP addresses and domains are queried.

11. Error Handling & Resilience

SSH connectivity failure. If the analysis host cannot reach the EdgeRouter via SSH (network issue, router reboot), the capture scripts exit with a non-zero status and the orchestrator logs the failure. The pipeline can still run in --analyze-only mode against previously captured pcaps, decoupling analysis from capture availability.

Threat-intel provider unavailability. Each provider call is wrapped in a try/except with a configurable timeout (default: 10 seconds). If a provider returns an error or times out, the enrichment stage logs the failure and continues with the remaining providers. A finding enriched by 3 of 4 providers is more useful than a pipeline that fails because one API is down. The report notes which providers were unavailable for each finding.
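
The continue-on-failure behavior can be sketched independently of any HTTP library by treating each provider as a callable. This is an illustration with hypothetical names; in the real module each callable would presumably wrap an HTTP GET such as `requests.get(url, timeout=10)`.

```python
import logging

logger = logging.getLogger("threat_intel_lookup")


def enrich(indicator, providers):
    """Query every provider and record failures instead of raising.

    providers: dict mapping provider name -> callable(indicator) -> dict.
    Each callable is assumed to enforce its own timeout.
    """
    intel, unavailable = {}, []
    for name, lookup in providers.items():
        try:
            intel[name] = lookup(indicator)
        except Exception as exc:  # broad on purpose: any provider failure is non-fatal
            logger.warning("provider %s failed for %s: %s", name, indicator, exc)
            unavailable.append(name)
    return {"intel": intel, "unavailable": unavailable}
```

The `unavailable` list is what lets the report state which providers were missing for a given finding.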

Rate-limit management. The enrichment module tracks per-provider request counts and pauses when approaching rate limits. To stay within VirusTotal's limit of 4 requests per minute, the module spaces queries 15 seconds apart (60 s / 4 requests). If the daily quota is exhausted, remaining indicators are enriched with the other providers only, and the report notes the partial enrichment.
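
A minimal sketch of a per-provider limiter that enforces a minimum interval and a daily quota. The class name and constructor parameters are assumptions, and quota reset at the start of a new day is not handled. The clock and sleep functions are injectable so the behavior can be checked without waiting.

```python
import time


class MinIntervalLimiter:
    """Space requests evenly and stop once the daily quota is used up."""

    def __init__(self, per_minute, daily_quota, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / per_minute  # 15 s for VirusTotal's 4 req/min
        self.remaining = daily_quota
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = None

    def acquire(self):
        """Wait until a request is allowed; return False if the quota is exhausted."""
        if self.remaining <= 0:
            return False
        now = self._clock()
        if self._next_allowed is not None and now < self._next_allowed:
            self._sleep(self._next_allowed - now)
            now = self._next_allowed
        self._next_allowed = now + self.interval
        self.remaining -= 1
        return True
```

A caller would check the return value of `acquire()` before each VirusTotal request and fall back to the remaining providers when it returns False.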

Malformed pcap handling. scapy's PcapReader can encounter truncated or corrupted packets. The parser wraps each packet read in a try/except, logs malformed packets (count and byte offset), and continues processing. A malformed-packet rate exceeding 10% triggers a warning in the report header, suggesting the capture may have been interrupted or the BPF filter may need adjustment.
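
The skip-and-count pattern can be sketched against any iterator-style reader. This is an assumption-laden illustration: it presumes the reader can keep going after a failed read, which should be verified against scapy's PcapReader before relying on it, and it does not record byte offsets. The function names and the consecutive-error guard are hypothetical.

```python
MALFORMED_WARN_RATE = 0.10  # 10%, from the text


def read_tolerantly(reader, stats, max_consecutive_errors=1000):
    """Yield packets from `reader`, counting failed reads in `stats` instead of aborting."""
    iterator = iter(reader)
    consecutive = 0
    # The guard stops a reader that fails on every call from looping forever.
    while consecutive < max_consecutive_errors:
        try:
            packet = next(iterator)
        except StopIteration:
            return
        except Exception:
            stats["malformed"] += 1
            consecutive += 1
            continue
        consecutive = 0
        stats["ok"] += 1
        yield packet


def needs_warning(stats):
    """True when the malformed share of all reads exceeds the warning rate."""
    total = stats["ok"] + stats["malformed"]
    return total > 0 and stats["malformed"] / total > MALFORMED_WARN_RATE
```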

Lock-file contention. The scheduling wrapper uses a lock file (/tmp/network-threat-pipeline.lock) to prevent overlapping runs. If a previous run is still active when the timer fires, the new run exits cleanly and logs a "skipped — previous run still active" message.

12. False-Positive Management & Alert Triage

Detection heuristics inevitably produce false positives. The pipeline addresses this at three levels:

Whitelisting. A whitelist.json file defines indicators (IPs, domains, CIDR ranges) and connection patterns that should be excluded from detection. Common entries include the operator's VPN endpoints (which would otherwise trigger long-lived connection and beaconing alerts), DNS resolvers (which generate high query volumes), and NTP servers (which produce periodic traffic resembling beaconing). The whitelist is checked before enrichment to avoid wasting API quota on known-good traffic.
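
A whitelist check that handles single IPs, CIDR ranges, and domains can be built on the standard-library `ipaddress` module. The sketch assumes the whitelist is a flat list of strings, which may not match the actual layout of whitelist.json, and it does not cover connection-pattern entries.

```python
import ipaddress


def build_whitelist(entries):
    """Split entries into IP networks and lowercase domain names."""
    networks, names = [], set()
    for entry in entries:
        try:
            # A bare IP parses as a /32 (or /128) network.
            networks.append(ipaddress.ip_network(entry, strict=False))
        except ValueError:
            names.add(entry.lower().rstrip("."))
    return networks, names


def is_whitelisted(indicator, networks, names):
    """True if an IP falls inside a whitelisted network or a domain matches exactly."""
    try:
        addr = ipaddress.ip_address(indicator)
    except ValueError:
        return indicator.lower().rstrip(".") in names
    return any(addr in net for net in networks)
```

Running this check before the enrichment loop is what keeps whitelisted indicators from consuming API quota.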

Confidence scoring. Each detection module produces a confidence score (0.0–1.0) rather than a binary flag. The severity tiers in the report (CRITICAL / WARNING / INFO) are driven by the combination of detection confidence and threat-intel reputation. An IP with beaconing behavior but a clean VirusTotal record lands in INFO, not CRITICAL. Over time, the operator tunes detection thresholds (IAT coefficient of variation, entropy cutoff, connection-count minimums) based on their network's baseline.

Historical baselining. The pipeline maintains a baseline.json file that records indicators seen in previous runs with their detection types and outcomes (confirmed threat / false positive / unresolved). On subsequent runs, previously seen indicators are annotated with their history, allowing the operator to quickly dismiss recurring false positives and focus on new findings.
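
The annotation step might look like the following sketch. The baseline layout (indicator mapped to an `outcome` and a list of `detections`) and the function name are assumptions; the document does not specify the schema of baseline.json.

```python
def annotate_with_history(findings, baseline):
    """Attach prior outcomes to findings in place; return indicators not seen before."""
    new_indicators = []
    for indicator, entry in findings.items():
        past = baseline.get(indicator)
        if past is None:
            entry["history"] = None
            new_indicators.append(indicator)
        else:
            entry["history"] = {
                "outcome": past.get("outcome", "unresolved"),
                "detections": past.get("detections", []),
            }
    return new_indicators
```

Returning the list of new indicators gives the report generator an easy way to put first-time findings ahead of recurring ones.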

13. Sample Report Format

# Network Threat Summary
**Capture:** capture_2026-06-08_1430.pcap
**Duration:** 30 minutes | **Source:** EdgeRouter eth0 (WAN)
**Generated:** 2026-06-08 15:15 UTC | **Pipeline:** v1.2.0

## Summary
- Connections analyzed: 14,832
- Unique external IPs: 1,247
- Indicators flagged: 12
- Severity: 1 CRITICAL · 3 WARNING · 8 INFO

---

## CRITICAL

### 203.0.113.47 — Beaconing + Known C2
- **Detection:** Beaconing (confidence: 0.92) — IAT CV: 0.08,
  47 connections at ~60s intervals, consistent 128-byte payloads
- **Threat Intel:** VirusTotal 14/89 detections, AbuseIPDB
  confidence 97%, reported 312 times (last 30 days)
- **Geolocation:** Hosting provider, Frankfurt DE, AS24940
- **Action:** Block immediately; investigate internal host
  192.168.1.105 for compromise indicators

## WARNING
...

## INFO
...

14. Non-Functional Requirements (Measured)

| NFR | Target | Basis |
|-----|--------|-------|
| Pcap processing throughput | 500 MB pcap in < 10 min | Measured on Raspberry Pi 4 (4 GB RAM) with scapy streaming parser |
| Enrichment latency | < 15 min for 50 unique indicators | Bounded by VirusTotal rate limit (4 req/min, so 50 lookups take about 12.5 min); other providers are queried in parallel |
| False-positive rate | < 20% of flagged indicators after tuning | Measured over 8 weekly scans with whitelist and baseline active |
| Capture-to-report time | < 60 min (30 min capture + 15 min analysis + 15 min enrichment) | End-to-end for a standard weekly scan |
| Storage footprint | < 2 GB for 30 days of captures + reports | Based on 500 MB/week captures with 30-day rotation |
| Availability | Best-effort; graceful degradation on any single component failure | No SLA — personal infrastructure |
| Portability | Any Linux host with Python 3.9+ and SSH access to a pcap source | Tested on Raspberry Pi OS, Ubuntu 22.04, macOS (analysis-only mode) |

15. Tech Stack

| Layer | Technology | Role |
|-------|------------|------|
| Packet capture | tcpdump (EdgeRouter built-in) | WAN traffic capture with BPF filtering |
| Pcap parsing | Python + scapy | Packet-level parsing and connection metadata extraction |
| Detection engine | Python (custom modules) | Four heuristic detection modules: beaconing, DNS tunneling, long-lived connections, port scanning |
| DNS analysis | dnspython | DNS query parsing and entropy calculation for tunneling detection |
| Threat-intel enrichment | Python + requests | Multi-source API queries (VirusTotal, AbuseIPDB, IPinfo, ip-api.com) |
| Configuration | JSON | threat_intel_config.json (provider definitions), config.json (router/SSH settings), whitelist.json, baseline.json |
| Report generation | Python (Markdown output) | Structured threat summary with severity tiers |
| Remote access | SSH (key-based) | EdgeRouter capture control and pcap retrieval |
| Scheduling | systemd timer + service units | Weekly automated scan execution with persistent timers |
| Orchestration | Python CLI (run_pipeline.py) + shell wrapper | End-to-end pipeline control, logging, lock-file management |

16. Security & Compliance

The pipeline operates on traffic the operator owns and controls. Capture access is via SSH key authentication to the operator's EdgeRouter — no password-based authentication is permitted. SSH keys are stored with 600 permissions on the analysis host; the private key is not passphrase-protected (tradeoff for unattended scheduled execution, mitigated by the analysis host being on a private LAN segment).

Threat-intel API keys are stored in environment variables or a .env file excluded from version control (.gitignore). No traffic content (packet payloads) is sent to external services — only IP addresses and domain names are queried against threat-intel APIs.

Pcap files contain raw network traffic, which may include sensitive data. Captures are stored on the analysis host's local filesystem with 600 permissions and rotated after 30 days. The analysis host should be treated as a sensitive system with access restricted to the operator.

The pipeline does not modify EdgeRouter configuration — it only starts/stops tcpdump and retrieves pcap files. Firewall rules remain the operator's responsibility; the pipeline provides intelligence, not enforcement. (For automated enforcement, see the OFAC Deny List project, which handles the firewall-rule lifecycle.)

17. Deployment & Operations

Initial setup. Run setup_analysis.sh on the analysis host to install Python dependencies, validate SSH key access to the EdgeRouter, and create the directory structure (captures/, reports/, logs/). Configure config.json with the EdgeRouter's IP, SSH user, WAN interface name, and capture duration. Store threat-intel API keys in environment variables or the .env file, and reference them from threat_intel_config.json rather than placing the keys in that file directly.

Ad-hoc analysis. Run python run_pipeline.py --full for a complete capture-analyze-enrich-report cycle, or python run_pipeline.py --analyze-only --pcap /path/to/file.pcap to analyze an externally supplied pcap.
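
The command-line interface described above could be defined with `argparse` roughly as follows. The mode flags and `--pcap` are taken from this document; the `--output-dir` flag name, its default, and the rule that `--analyze-only` requires `--pcap` are assumptions for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="run_pipeline.py")
    # Exactly one mode must be chosen.
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--capture", action="store_true", help="capture only")
    mode.add_argument("--analyze-only", action="store_true",
                      help="analyze an existing pcap without capturing")
    mode.add_argument("--full", action="store_true",
                      help="capture, analyze, enrich, and report")
    parser.add_argument("--pcap", help="path to an existing pcap file")
    # Flag name and default are hypothetical.
    parser.add_argument("--output-dir", default="reports", help="report directory")
    return parser


def parse_args(argv=None):
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.analyze_only and not args.pcap:
        parser.error("--analyze-only requires --pcap")
    return args
```

Using a mutually exclusive group lets argparse reject contradictory invocations such as `--capture --full` before any stage runs.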

Scheduled scans. Install the systemd timer: sudo cp network-threat-scan.timer network-threat-scan.service /etc/systemd/system/ && sudo systemctl enable --now network-threat-scan.timer. The timer fires every Sunday at 02:00 local time by default (configurable in the .timer unit). The service unit runs run_weekly_scan.sh, which acquires the lock file, executes the full pipeline, rotates old captures, and logs the outcome.

Data retention. Pcap files are retained for 30 days; reports are retained indefinitely (they are small Markdown files). The scheduling wrapper's rotation step deletes captures older than the configured retention period. Baseline and whitelist files are persistent and should be version-controlled alongside the pipeline code.

18. Cross-Project Context

OFAC Deny List (shared infrastructure). The Network Threat Pipeline and the OFAC Deny List project share the same physical infrastructure: both target the same Ubiquiti EdgeRouter and run from the same analysis host. They serve complementary purposes — the threat pipeline provides detection (what is the network talking to that looks suspicious?) while the OFAC Deny List provides prevention (block traffic to/from sanctioned networks before it happens). The two pipelines do not share code, but their outputs can inform each other: a threat-pipeline finding showing repeated connections to an IP range within a sanctioned country's allocation could prompt adding that range to the OFAC deny list's entity watchlist.

Shared SSH access pattern. Both projects use the same SSH key and access pattern to the EdgeRouter. Configuration of the router's SSH access (key deployment, user permissions, interface binding) is a shared dependency. Changes to the EdgeRouter's SSH configuration affect both pipelines.

Potential integration. A future integration point would feed high-confidence threat-pipeline findings (CRITICAL tier) into the OFAC Deny List's entity-resolution pipeline, enabling automated deny-rule generation for confirmed threats — not just sanctions-based blocking. This would require a shared indicator format and a review/approval step to prevent automated over-blocking.

19. Risks, Assumptions & Limitations

  • Detection is heuristic-based; false positives are expected and require tuning to the operator's network baseline. The confidence-scoring and whitelist mechanisms mitigate but do not eliminate this.
  • Capture fidelity depends on EdgeRouter placement and BPF filter configuration. Asymmetric routing or traffic handled by other network paths will not be captured.
  • Threat-intel quality depends on the configured sources and their free-tier limitations. VirusTotal's 4 req/min and 500 req/day limits constrain enrichment throughput for large indicator sets.
  • scapy's in-memory pcap parsing has a practical ceiling around 1 GB per file on a 4 GB Raspberry Pi. Larger captures should be split, or parsed with the streaming reader at the cost of longer processing time.
  • The pipeline assumes the analysis host is on the same LAN as the EdgeRouter and can reach it via SSH. Remote analysis over the WAN would require VPN or port-forwarding configuration outside the pipeline's scope.
  • No real-time alerting — the pipeline runs on a batch/scheduled basis. A C2 beacon active between scans will not be detected until the next capture window.

20. Roadmap

Phase 1 — Core pipeline (current). Capture, detection, enrichment, and reporting for the four core detection types. Scheduled weekly scans with systemd timer. Manual threshold tuning and whitelist management.

Phase 2 — Adaptive thresholds. Implement statistical baselining that automatically adjusts detection thresholds based on the network's traffic profile over time. A 30-day rolling baseline would shift the beaconing IAT threshold and DNS query-rate threshold to reduce false positives without manual tuning.

Phase 3 — Real-time mode. Add a streaming capture mode using tcpdump piped through SSH in real time, with detection modules processing packets as they arrive rather than in batch. This would enable near-real-time alerting (e.g., push notification via ntfy.sh or a Slack webhook) for CRITICAL-tier findings.

Phase 4 — OFAC integration. Feed confirmed threats into the OFAC Deny List pipeline for automated deny-rule generation, with a review queue to prevent over-blocking. This closes the loop from detection to enforcement.


Diagrams: Sequence Diagram · Data Flow Diagram · Deployment Diagram