Output Parsing, Integration, and Continuous Monitoring Workflows

Parsing Nmap XML Output Programmatically

Nmap's XML output (-oX) is the only format sufficiently structured for reliable automation. The schema is straightforward: a root nmaprun element containing host nodes, each with address, hostnames, ports, and os children. For Python, you have two practical paths: the python-nmap library for direct scan orchestration with object access, or libnmap (which includes parser, report, and MongoDB/Elastic backends) for post-processing existing files. When you only need to parse—especially in a CI worker that didn't execute the scan—skip the wrapper and use the standard library.
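To make that structure concrete, the sketch below walks a small document with the standard library only. The sample XML is hand-written and heavily abbreviated for illustration, not captured scan output, and the `summarize` helper is a hypothetical name; real files carry many more attributes and elements.

```python
import xml.etree.ElementTree as ET

# Hand-written, abbreviated sample for illustration; not real scan output.
SAMPLE_XML = """<?xml version="1.0"?>
<nmaprun scanner="nmap" args="nmap -sV -oX sample.xml 192.0.2.10">
  <host>
    <status state="up"/>
    <address addr="192.0.2.10" addrtype="ipv4"/>
    <hostnames><hostname name="web01.example.test"/></hostnames>
    <ports>
      <port protocol="tcp" portid="22">
        <state state="open"/>
        <service name="ssh" product="OpenSSH" version="9.3p1"/>
      </port>
      <port protocol="tcp" portid="443">
        <state state="filtered"/>
        <service name="https"/>
      </port>
    </ports>
  </host>
</nmaprun>"""


def summarize(xml_text):
    """Return one (ip, hostname, port, proto, state, service) tuple per port."""
    root = ET.fromstring(xml_text)  # root is the <nmaprun> element
    rows = []
    for host in root.findall("host"):
        ip = host.find("address[@addrtype='ipv4']").get("addr")
        name_elem = host.find("hostnames/hostname")
        hostname = name_elem.get("name") if name_elem is not None else ""
        for port in host.findall("ports/port"):
            service = port.find("service")
            rows.append((
                ip,
                hostname,
                int(port.get("portid")),
                port.get("protocol"),
                port.find("state").get("state"),
                service.get("name", "") if service is not None else "",
            ))
    return rows


for row in summarize(SAMPLE_XML):
    print(row)
```

Note that the port state lives on a child state element rather than on the port element itself, which is a common source of parsing mistakes.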

Here is a minimal standard-library parser that reports ports that opened or stopped being open relative to a previous scan baseline:

#!/usr/bin/env python3
"""Extract new open ports from Nmap XML compared to a baseline."""
import sys
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def parse_ports(xml_path):
    """Return dict: {(ip, port, proto): service_banner} for open ports on up hosts."""
    ports = {}
    root = ET.parse(xml_path).getroot()
    for host in root.findall('host'):
        status = host.find('status')
        if status is None or status.get('state') != 'up':
            continue
        # A host can carry several <address> elements (ipv4, ipv6, mac);
        # select the IP explicitly instead of trusting element order.
        addr = host.find("address[@addrtype='ipv4']")
        if addr is None:
            addr = host.find("address[@addrtype='ipv6']")
        if addr is None:
            continue
        ip = addr.get('addr')
        for port in host.findall('ports/port'):
            state = port.find('state')
            if state is None or state.get('state') != 'open':
                continue
            # Store the port as int so sorting is numeric (80 before 443).
            portid = int(port.get('portid'))
            proto = port.get('protocol')
            service = port.find('service')
            banner = ''
            if service is not None:
                parts = (service.get(attr) for attr in ('name', 'product', 'version'))
                banner = ' '.join(p for p in parts if p)
            ports[(ip, portid, proto)] = banner
    return ports


def diff_scans(baseline_path, current_path):
    """Return (new, closed) dicts keyed by (ip, port, proto)."""
    baseline = parse_ports(baseline_path)
    current = parse_ports(current_path)
    new = {k: v for k, v in current.items() if k not in baseline}
    closed = {k: v for k, v in baseline.items() if k not in current}
    return new, closed


if __name__ == '__main__':
    if len(sys.argv) != 3:
        sys.exit(f"usage: {sys.argv[0]} BASELINE.xml CURRENT.xml")
    new, closed = diff_scans(sys.argv[1], sys.argv[2])
    stamp = datetime.now(timezone.utc).strftime('%Y-%m-%dT%H:%M:%SZ')
    print(f";; Delta report generated {stamp}")
    # Iterate over items(); iterating the dict alone yields only the keys.
    for (ip, port, proto), banner in sorted(new.items()):
        print(f"[NEW] {ip}:{port}/{proto} {banner}")
    for (ip, port, proto), banner in sorted(closed.items()):
        print(f"[CLOSED] {ip}:{port}/{proto} {banner}")

What it does: Compares two Nmap XML files, reporting ports that appeared or disappeared. When to use it: Nightly cron jobs, CI gates, or incident-response triage when you need machine-actionable deltas. Risks: Without version detection (-sV) the banner holds only the service name guessed from the port number, so version changes go unnoticed; always pair with -sV for meaningful diffs. A host that is down or unreachable in the current scan has all of its baseline ports reported as [CLOSED]. Expected output: Lines prefixed [NEW] or [CLOSED] with IP, port, protocol, and service fingerprint.

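The python-nmap library wraps the binary and exposes results as dictionaries; libnmap offers more sophisticated reporting objects and built-in serialization. Both rely on the same underlying XML. For Go or Rust, map the elements onto structs with encoding/xml, or with serde plus an XML crate such as quick-xml. Nmap publishes a DTD (nmap.dtd) rather than an XSD, so XSD-driven generators such as xsdgen need a converted schema first; several community packages already provide hand-written struct definitions.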

Differential Scanning with Ndiff

For ad-hoc comparisons without writing code, Nmap ships with ndiff, a semantic differ that understands scan logic rather than performing naive text diffing. It ignores timestamp and runtime noise, concentrating on host state, port state, and OS changes.

# Lab: Weekly comparison of internal lab segment
ndiff /scans/baseline-192.0.2.0-24.xml /scans/weekly-192.0.2.0-24.xml

# Production: Lower-intensity scan, same diff workflow
nmap -sS -p- --max-rate 100 --max-retries 2 -oX /scans/prod-weekly.xml 192.0.2.0/24
ndiff /scans/baseline-prod.xml /scans/prod-weekly.xml

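What it does: Compares two Nmap XML files and outputs only meaningful changes in a human-readable format. When to use it: Weekly operational reviews, change-control validation, or after maintenance windows to catch unintended exposure. Risks: ndiff compares only what the two files contain, so a host that missed host discovery in one run (packet loss, rate limiting) looks the same as one that was really taken down, and a host that is down in both scans appears in neither report. Both scans must also use the same options, or differences in coverage show up as changes. At --max-rate 100, a full -p- sweep of a fully populated /24 is about 16.6 million probes (254 hosts × 65,535 ports), roughly 46 hours before retries, so schedule accordingly. Expected output: Lines prefixed with - (present only in the first scan) or + (present only in the second); unchanged context lines carry a leading space, and unchanged hosts and ports are omitted unless -v is given. The exit status is 0 when the scans match, 1 when they differ, and 2 on error.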

Illustrative ndiff output (simplified):

- Nmap 7.94 scan initiated Mon Jan 15 06:00:00 2024 as: nmap -sS -sV -p- -oX baseline.xml 192.0.2.0/24
+ Nmap 7.94 scan initiated Mon Jan 22 06:00:00 2024 as: nmap -sS -sV -p- -oX weekly.xml 192.0.2.0/24

  192.0.2.10:
+   8080/tcp open  http    Apache Tomcat 9.0.82
-   3306/tcp open  mysql   MySQL 8.0.34

  192.0.2.55:
+   Host is up.
+   22/tcp open  ssh     OpenSSH 9.3p1

The + 8080/tcp line is your signal: either a deployment occurred without change ticket, or an unauthorized service is running. The - 3306/tcp is equally important—service disappearance can indicate compromise response (attacker covering tracks) or a misconfiguration that broke a dependency.
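For scheduled jobs it can help to wrap ndiff so that "scans differ" is not mistaken for a crash. The following is a minimal sketch that relies on the exit statuses documented in the ndiff man page (0 same, 1 different, 2 error); the function names and the `ndiff_bin` parameter are illustrative assumptions, and ndiff must be on PATH for `run_ndiff` to work.

```python
import subprocess

# Exit statuses as documented in the ndiff man page.
NDIFF_STATUS = {0: "unchanged", 1: "changed", 2: "error"}


def classify_ndiff(returncode):
    """Map an ndiff exit status to a label; anything unexpected is an error."""
    return NDIFF_STATUS.get(returncode, "error")


def run_ndiff(baseline_xml, current_xml, ndiff_bin="ndiff"):
    """Run ndiff on two XML files and return (label, text).

    `text` is the diff on success and the error stream on failure.
    """
    proc = subprocess.run(
        [ndiff_bin, baseline_xml, current_xml],
        capture_output=True,
        text=True,
        check=False,  # status 1 means "scans differ", not a failure
    )
    label = classify_ndiff(proc.returncode)
    text = proc.stderr if label == "error" else proc.stdout
    return label, text


# Example call (paths are placeholders):
#   label, text = run_ndiff("/scans/baseline-prod.xml", "/scans/prod-weekly.xml")
#   if label == "changed":
#       print(text)
```

Keeping the classification separate from the subprocess call makes the status handling testable without ndiff installed.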

Database Integration and Trending

Storing scans in PostgreSQL enables longitudinal analysis: "Show me all hosts where port 3389/RDP appeared in the last 90 days" or "Count exposed SMB instances by subnet over time." A minimal schema:

CREATE TABLE scans (
    scan_id SERIAL PRIMARY KEY,
    started_at TIMESTAMPTZ NOT NULL,
    target_cidr CIDR,
    nmap_version TEXT,
    xml_checksum BYTEA UNIQUE
);

CREATE TABLE hosts (
    host_id BIGSERIAL PRIMARY KEY,
    scan_id INT REFERENCES scans(scan_id),
    ip INET NOT NULL,
    mac TEXT,
    hostname TEXT,
    os_guess TEXT,
    state TEXT CHECK (state IN ('up', 'down', 'unknown'))
);

CREATE TABLE ports (
    port_id BIGSERIAL PRIMARY KEY,
    host_id BIGINT REFERENCES hosts(host_id),
    port INT,
    protocol TEXT,
    state TEXT,
    service_name TEXT,
    product TEXT,
    version TEXT,
    extrainfo TEXT
);

CREATE INDEX ON ports(port, protocol, state) WHERE state = 'open';
CREATE INDEX ON hosts(ip, scan_id);

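Import via COPY from a CSV produced by your Python parser, or shred the XML inside the database with PostgreSQL's built-in xpath() and XMLTABLE (the xml2 contrib module offers similar functions but is kept mainly for backward compatibility). Partition by month on started_at; hosts and ports hold the bulk of the rows, so if you partition those, denormalize the timestamp onto them first. Scan archives grow fast.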
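As a rough sketch of the CSV route, the helper below serializes parsed port records into text that COPY can read in CSV mode with a header row. The column list, the `ports_to_csv` name, and the idea of loading into a staging table before joining to hosts are assumptions for illustration, not part of the schema above.

```python
import csv
import io

# Assumed staging-table column order; adjust to match your own table.
PORT_COLUMNS = ["ip", "port", "protocol", "state", "service_name", "product", "version"]

EXAMPLE_ROWS = [
    {"ip": "192.0.2.10", "port": 22, "protocol": "tcp", "state": "open",
     "service_name": "ssh", "product": "OpenSSH", "version": "9.3p1"},
    {"ip": "192.0.2.10", "port": 443, "protocol": "tcp", "state": "filtered",
     "service_name": "https"},
]


def ports_to_csv(rows):
    """Serialize port dicts to CSV text with a header line.

    Missing fields become empty, unquoted values, which PostgreSQL's CSV
    mode reads as NULL by default.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=PORT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({col: row.get(col, "") for col in PORT_COLUMNS})
    return buf.getvalue()


print(ports_to_csv(EXAMPLE_ROWS), end="")
```

The resulting text can be fed to COPY with FORMAT csv and HEADER options, after which a SQL statement resolves each ip to its host_id for the current scan.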

Continuous Scanning Architecture

┌─────────────┐     ┌─────────────┐     ┌─────────────────┐
│  Scheduler  │────▶│  Scan Jobs  │────▶│  Nmap Workers   │
│  (cron/     │     │  (temporal/ │     │  (dedicated     │
│   Airflow)  │     │  ephemeral) │     │   network seg)  │
└─────────────┘     └─────────────┘     └─────────────────┘
                                               │
                                               ▼
                                        ┌─────────────┐
                                        │  XML Output │
                                        │  (S3/nfs)   │
                                        └─────────────┘
                                               │
                    ┌──────────────────────────┼──────────────────────────┐
                    ▼                          ▼                          ▼
              ┌─────────┐              ┌─────────────┐              ┌─────────────┐
              │ Parser  │              │   Ndiff     │              │  Zenmap/    │
              │ (Python)│              │   Engine    │              │  Faraday    │
              └────┬────┘              └──────┬──────┘              └─────────────┘
                   │                          │
                   ▼                          ▼
             ┌────────────┐            ┌──────────────┐
             │ PostgreSQL │            │  Alerting    │
             │ (trends)   │            │  (PagerDuty/ │
             └────────────┘            │   Slack/API) │
                                       └──────────────┘

Key operational decisions in this pipeline:

Decision	Practical implication
Scan rate from CI	Baseline scans every deployment; full sweeps weekly. Too frequent and you desensitize responders; too sparse and drift accumulates.
Worker segmentation	Run Nmap from a dedicated NIC/VLAN with explicit firewall rules. A compromised worker is an attacker goldmine.
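XML retention	Raw XML runs roughly 10-50× larger than the equivalent compressed database rows. Keep 90 days hot, then move to archive-tier storage (for example S3 Glacier) for the compliance period.
Checksum deduplication	Hash a normalized view of the results (hosts, ports, states, services) rather than the raw file: raw XML embeds start and finish timestamps, so two separate scans are never byte-identical. Matching hashes skip parsing, which saves I/O and reveals stagnant infrastructure. A raw-file checksum still guards against importing the same file twice.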
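One caveat on checksum deduplication: Nmap XML embeds scan timestamps, so a checksum over the raw file rarely matches between runs even when nothing on the network changed. A possible approach, sketched below under that assumption, is to hash only the sorted per-port facts. The `scan_fingerprint` and `sample_scan` names are hypothetical, the sample documents are hand-written, and ports summarized in extraports elements are deliberately ignored here.

```python
import hashlib
import xml.etree.ElementTree as ET


def scan_fingerprint(xml_text):
    """Return a SHA-256 hex digest over sorted per-port facts, ignoring timestamps."""
    root = ET.fromstring(xml_text)
    facts = []
    for host in root.findall("host"):
        addr = host.find("address")
        if addr is None:
            continue
        ip = addr.get("addr", "")
        for port in host.findall("ports/port"):
            state = port.find("state")
            service = port.find("service")
            svc = service.attrib if service is not None else {}
            facts.append("|".join([
                ip,
                port.get("protocol", ""),
                port.get("portid", ""),
                state.get("state", "") if state is not None else "",
                svc.get("name", ""),
                svc.get("product", ""),
                svc.get("version", ""),
            ]))
    # Sorting makes the digest independent of host and port order in the file.
    canonical = "\n".join(sorted(facts))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def sample_scan(start, port_state):
    """Build a tiny hand-written document; only the timestamp and one state vary."""
    return (
        f'<nmaprun scanner="nmap" start="{start}">'
        f'<host starttime="{start}"><status state="up"/>'
        '<address addr="192.0.2.10" addrtype="ipv4"/>'
        '<ports><port protocol="tcp" portid="22">'
        f'<state state="{port_state}"/><service name="ssh"/>'
        '</port></ports></host></nmaprun>'
    )


monday = scan_fingerprint(sample_scan("1705298400", "open"))
tuesday = scan_fingerprint(sample_scan("1705384800", "open"))
changed = scan_fingerprint(sample_scan("1705384800", "filtered"))
print(monday == tuesday)  # True: same facts, different timestamps
print(monday == changed)  # False: the port state changed
```

Which attributes count as "facts" is a policy choice; adding script output or OS guesses makes the fingerprint more sensitive but also noisier.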

CI/CD Pipeline Integration

Embed baseline scanning in deployment pipelines to catch infrastructure-as-code drift before it reaches production. A GitLab CI example:

network-baseline:
  image: nmap:latest  # placeholder name; use an image with nmap and python3, pinned by digest
  variables:
    TARGET: "198.51.100.0/24"
    RATE: "500"  # Production: reduce to 100 or less
  script:
    - nmap -sS -sV -p- --max-rate $RATE -oX scan-$(date +%s).xml $TARGET
    - python3 /scripts/parse_and_alert.py scan-*.xml --baseline /baselines/prod.xml
  artifacts:
    paths: ["scan-*.xml"]
    expire_in: 30 days
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"

What it does: Scheduled pipeline executes Nmap, parses results, and fails if new ports appear against baseline. When to use it: Every deployment to network-adjacent infrastructure, or nightly for static environments. Risks: Pipeline failures from network jitter cause alert fatigue; implement retry with backoff and threshold-based alerting (e.g., 3 consecutive deltas). Expected output: CI job log with parsed new ports, or green pass if baseline matches.

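The parser should exit 0 on match, 2 on drift, and 1 on scan failure. This borrows from the Nagios plugin convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) without matching it exactly: a strict Nagios check would report a failed scan as 3. CI systems distinguish only zero from nonzero by default, so map the codes explicitly where the difference matters (in GitLab CI, allow_failure:exit_codes can do this).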
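The exit codes and the "3 consecutive deltas" threshold mentioned above can be combined in one small decision function. This is a sketch under stated assumptions: the caller persists the per-run sets of new ports somewhere (for example a CI cache), and drift is declared only when the same port is new in each of the last `threshold` runs. The names `decide_exit` and `recent_deltas` are hypothetical.

```python
EXIT_MATCH = 0
EXIT_SCAN_FAILED = 1
EXIT_DRIFT = 2


def decide_exit(scan_ok, recent_deltas, threshold=3):
    """Choose a process exit code for the CI job.

    scan_ok:        False if Nmap itself failed or produced no usable XML.
    recent_deltas:  list of sets of new (ip, port, proto) keys, oldest first,
                    with the current run last.
    threshold:      number of consecutive runs a port must stay new in
                    before it counts as drift.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    if not scan_ok:
        return EXIT_SCAN_FAILED
    if len(recent_deltas) < threshold:
        return EXIT_MATCH  # not enough history yet to call it drift
    window = [set(delta) for delta in recent_deltas[-threshold:]]
    persistent = set.intersection(*window)
    return EXIT_DRIFT if persistent else EXIT_MATCH


new_port = ("192.0.2.10", 8080, "tcp")
print(decide_exit(True, [{new_port}, {new_port}, {new_port}]))  # 2
print(decide_exit(True, [{new_port}, set(), {new_port}]))       # 0
```

The script's entry point would then pass the result to sys.exit. A port that flaps in and out never trips the gate with this rule, which is the intended trade-off against alert fatigue but also a blind spot worth reviewing separately.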

Visualization and Third-Party Integration

Zenmap's topology view is adequate for single-network comprehension but does not scale past a few hundred hosts. For operational dashboards, export to:

Tool	Role	Integration path
Faraday	Collaborative pentest workspace	Upload XML via API; correlates with exploit findings
Dradis	Report generation	Import XML as evidence; templates for executive summaries
Grafana	Time-series dashboards	PostgreSQL backend with port-exposure queries
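nmap-vulners	Vulnerability context	`--script vulners` enriches port data with CVE references at scan time (the script ships with current Nmap releases under the name vulners; standalone installs from the nmap-vulners repository may need a different script path)

# Lab: Enrich scan with vulners script for immediate triage context
nmap -sV --script vulners -p 22,80,443 -oX vuln-enriched.xml 192.0.2.0/24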

What it does: Queries Vulners API for CVEs associated with detected service versions. When to use it: Prioritization of exposed services; never as sole vulnerability assessment. Risks: Version detection is probabilistic; false positives on backported patches common in enterprise Linux. Expected output: CVE list appended to each port's script output in XML.
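To feed that script output into triage tooling, one option is to pull CVE identifiers out of the script elements with a regular expression. The sketch below assumes the script id is vulners and that identifiers appear as plain text in the output; the sample XML is hand-written with placeholder CVE numbers, and `cves_by_port` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}")

# Hand-written sample with placeholder CVE identifiers; not real scan output.
SAMPLE_XML = """<nmaprun>
  <host>
    <address addr="192.0.2.10" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22">
        <state state="open"/>
        <service name="ssh" product="OpenSSH" version="7.4"/>
        <script id="vulners" output="&#10;  cpe:/a:openbsd:openssh:7.4:&#10;    CVE-2099-0002  https://vulners.com/cve/CVE-2099-0002&#10;    CVE-2099-0001  https://vulners.com/cve/CVE-2099-0001"/>
      </port>
    </ports>
  </host>
</nmaprun>"""


def cves_by_port(xml_text, script_id="vulners"):
    """Return {(ip, port, proto): sorted unique CVE ids} from script output."""
    root = ET.fromstring(xml_text)
    found = {}
    for host in root.findall("host"):
        addr = host.find("address")
        if addr is None:
            continue
        ip = addr.get("addr")
        for port in host.findall("ports/port"):
            for script in port.findall("script"):
                if script.get("id") != script_id:
                    continue
                # Search the output attribute and any nested structured text.
                text = script.get("output", "") + " " + " ".join(script.itertext())
                ids = sorted(set(CVE_RE.findall(text)))
                if ids:
                    key = (ip, int(port.get("portid")), port.get("protocol"))
                    found[key] = ids
    return found


print(cves_by_port(SAMPLE_XML))
```

Because the match is purely textual, it inherits every false positive of version detection; treat the result as a prioritization hint only.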

RustScan as Acceleration Layer

For initial port discovery across large estates, RustScan's masscan-inspired speed with Nmap fallback improves pipeline throughput. It is not a replacement, since service detection and scripting still require Nmap proper, but it can collapse the discovery phase from hours to minutes.

# Lab: RustScan for port discovery, Nmap for deep inspection
rustscan -a 192.0.2.0/24 --range 1-65535 --scan-order random \
  -- -sV -sC -oX deep.xml

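The -- passes the remaining arguments to Nmap. RustScan typically launches Nmap once per discovered host, so a fixed -oX filename may be overwritten on multi-host targets; verify the behavior of your version and collect per-host output if needed. Production requires rate limiting (for example a smaller batch size via -b): RustScan defaults are aggressive and will overwhelm stateful firewalls or trigger IDS thresholds.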

Data Retention for Sensitive Topology Archives

Scan archives contain a complete network map—IP addressing, live hosts, service versions, OS guesses. Treat them as confidential at the sensitivity tier of your network documentation, not merely log data.

Practical policy elements:

Retention: 90 days online in PostgreSQL, 1-3 years compressed XML in object storage, then cryptographically shredded. Legal hold suspends deletion.
Access: Service account only for parser; human access requires break-glass with ticket reference logged.
Encryption: XML at rest (AES-256-GCM via S3 or filesystem encryption); TLS 1.3 for parser-to-database transport.
Geographic: Store in same jurisdiction as network; cross-border transfer of infrastructure maps may violate data residency or trigger export control review.

Hard-won insight: A port marked filtered is not a clean result—it is an unanswered question. Many operators archive only open ports and miss the security-relevant signal of firewall rule changes. Store filtered and closed states; the delta from filtered to open without a change ticket is often your earliest intrusion indicator.
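A minimal sketch of that filtered-to-open check follows. It assumes both scans covered the same port range, and it treats a port missing from the baseline's explicit list as having the state of the host's single extraports summary, because Nmap collapses large groups of same-state ports into that element. The function names and both sample documents are hand-written illustrations.

```python
import xml.etree.ElementTree as ET


def load_states(xml_text):
    """Return (explicit, defaults): per-port states and per-host extraports state."""
    root = ET.fromstring(xml_text)
    explicit, defaults = {}, {}
    for host in root.findall("host"):
        addr = host.find("address")
        if addr is None:
            continue
        ip = addr.get("addr")
        extras = host.findall("ports/extraports")
        if len(extras) == 1:  # only unambiguous summaries are used as a default
            defaults[ip] = extras[0].get("state")
        for port in host.findall("ports/port"):
            state = port.find("state")
            if state is not None:
                key = (ip, int(port.get("portid")), port.get("protocol"))
                explicit[key] = state.get("state")
    return explicit, defaults


def newly_open(baseline_xml, current_xml):
    """List ((ip, port, proto), previous_state) for ports open now but not before."""
    base_explicit, base_defaults = load_states(baseline_xml)
    cur_explicit, _ = load_states(current_xml)
    changes = []
    for key, state in cur_explicit.items():
        if state != "open":
            continue
        previous = base_explicit.get(key, base_defaults.get(key[0], "unknown"))
        if previous != "open":
            changes.append((key, previous))
    return sorted(changes)


BASELINE = """<nmaprun><host><address addr="192.0.2.10" addrtype="ipv4"/><ports>
<extraports state="filtered" count="65533"/>
<port protocol="tcp" portid="22"><state state="open"/></port>
<port protocol="tcp" portid="80"><state state="closed"/></port>
</ports></host></nmaprun>"""

CURRENT = """<nmaprun><host><address addr="192.0.2.10" addrtype="ipv4"/><ports>
<extraports state="filtered" count="65532"/>
<port protocol="tcp" portid="22"><state state="open"/></port>
<port protocol="tcp" portid="80"><state state="open"/></port>
<port protocol="tcp" portid="8080"><state state="open"/></port>
</ports></host></nmaprun>"""

for key, previous in newly_open(BASELINE, CURRENT):
    print(f"{key[0]}:{key[1]}/{key[2]} {previous} -> open")
```

Recording the previous state alongside the new one is what lets a reviewer separate a firewall rule change (filtered to open) from a service start on an already reachable port (closed to open).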