Output Parsing, Integration, and Continuous Monitoring Workflows
Parsing Nmap XML Output Programmatically
Nmap's XML output (-oX) is the format best suited to reliable automation: normal output is written for humans, and grepable output (-oG) is deprecated. The schema is straightforward: a root nmaprun element containing host nodes, each with address, hostnames, ports, and os children. For Python, you have two practical paths: the python-nmap library for direct scan orchestration with object access, or libnmap (which includes parser, report, and MongoDB/Elastic backends) for post-processing existing files. When you only need to parse, especially in a CI worker that didn't execute the scan, skip the wrapper and use the standard library.
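To make the nmaprun, host, and port layout concrete, here is a minimal standard-library sketch that walks a small hand-written sample. The sample is trimmed and illustrative, not real scan output, and the function name summarize is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hand-written, trimmed sample in the shape of -oX output (illustrative only).
SAMPLE = """<nmaprun scanner="nmap" version="7.94">
  <host>
    <status state="up"/>
    <address addr="192.0.2.10" addrtype="ipv4"/>
    <hostnames><hostname name="web01.example" type="PTR"/></hostnames>
    <ports>
      <port protocol="tcp" portid="22">
        <state state="open"/>
        <service name="ssh" product="OpenSSH" version="9.3p1"/>
      </port>
      <port protocol="tcp" portid="80">
        <state state="closed"/>
        <service name="http"/>
      </port>
    </ports>
  </host>
</nmaprun>"""


def summarize(xml_text):
    """Yield (ip, hostname, port, proto, state) for every <port> under every <host>."""
    root = ET.fromstring(xml_text)  # root element is <nmaprun>
    for host in root.findall("host"):  # one <host> per scanned target
        # Pick the IP address; local-segment scans may also carry a MAC <address>.
        ip = next((a.get("addr") for a in host.findall("address")
                   if a.get("addrtype") in ("ipv4", "ipv6")), None)
        name = host.find("hostnames/hostname")
        for port in host.findall("ports/port"):
            yield (
                ip,
                name.get("name") if name is not None else "",
                int(port.get("portid")),
                port.get("protocol"),
                port.find("state").get("state"),
            )


for row in summarize(SAMPLE):
    print(row)
```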
The following parser extracts open ports from two scans and diffs them, reporting ports that newly opened or closed relative to the baseline:
```python
#!/usr/bin/env python3
"""Report open ports that appeared or disappeared between two Nmap XML scans."""
import sys
import xml.etree.ElementTree as ET  # for untrusted XML, consider defusedxml
from datetime import datetime, timezone


def host_ip(host):
    """Return the host's IPv4/IPv6 address, skipping any MAC <address> element."""
    for addr in host.findall('address'):
        if addr.get('addrtype') in ('ipv4', 'ipv6'):
            return addr.get('addr')
    return None


def parse_ports(xml_path):
    """Return dict: {(ip, port, proto): service_banner} for open ports on up hosts."""
    ports = {}
    root = ET.parse(xml_path).getroot()
    for host in root.findall('host'):
        status = host.find('status')
        if status is None or status.get('state') != 'up':
            continue
        ip = host_ip(host)
        if ip is None:
            continue
        for port in host.findall('ports/port'):
            state = port.find('state')
            if state is None or state.get('state') != 'open':
                continue
            key = (ip, int(port.get('portid')), port.get('protocol'))
            service = port.find('service')
            parts = []
            if service is not None:
                # name comes from the port table; product/version require -sV
                parts = [service.get(a) for a in ('name', 'product', 'version')]
            ports[key] = ' '.join(p for p in parts if p)
    return ports


def diff_scans(baseline_path, current_path):
    """Return (new, closed) dicts keyed by (ip, port, proto)."""
    baseline = parse_ports(baseline_path)
    current = parse_ports(current_path)
    new = {k: v for k, v in current.items() if k not in baseline}
    closed = {k: v for k, v in baseline.items() if k not in current}
    return new, closed


if __name__ == '__main__':
    if len(sys.argv) != 3:
        sys.exit(f"usage: {sys.argv[0]} BASELINE.xml CURRENT.xml")
    new, closed = diff_scans(sys.argv[1], sys.argv[2])
    print(f";; Delta report generated {datetime.now(timezone.utc).isoformat()}")
    # Sort .items(), not the dict: iterating a dict yields only its keys.
    for (ip, port, proto), banner in sorted(new.items()):
        print(f"[NEW] {ip}:{port}/{proto} {banner}")
    for (ip, port, proto), banner in sorted(closed.items()):
        print(f"[CLOSED] {ip}:{port}/{proto} {banner}")
```
What it does: Compares two Nmap XML files, reporting open ports that appeared or disappeared. When to use it: Nightly cron jobs, CI gates, or incident-response triage when you need machine-actionable deltas. Risks: The diff is keyed on IP, port, and protocol, so it works without version detection, but scans run without -sV carry only the port-table service name (for example, http) rather than product and version; always pair with -sV for meaningful fingerprints. Expected output: Lines prefixed [NEW] or [CLOSED] with IP, port, protocol, and service fingerprint.
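The python-nmap library wraps the binary and exposes results as dictionaries; libnmap offers more sophisticated reporting objects and built-in serialization. Both rely on the same underlying XML. For Go or Rust, note that Nmap ships a DTD (nmap.dtd) rather than an XSD, so XSD-driven generators such as xsdgen need a converted schema; a common alternative is to write structs by hand for Go's encoding/xml or for serde with quick-xml in Rust, and community libraries exist for both languages.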
Differential Scanning with Ndiff
For ad-hoc comparisons without writing code, Nmap ships with ndiff, a semantic differ that understands scan logic rather than performing naive text diffing. It ignores timestamp and runtime noise, concentrating on host state, port state, and OS changes.
```bash
# Lab: Weekly comparison of internal lab segment
ndiff /scans/baseline-192.0.2.0-24.xml /scans/weekly-192.0.2.0-24.xml

# Production: Lower-intensity scan, same diff workflow
# (capture the baseline with the same options, or option differences show up as changes)
nmap -sS -p- --max-rate 100 --max-retries 2 -oX /scans/prod-weekly.xml 192.0.2.0/24
ndiff /scans/baseline-prod.xml /scans/prod-weekly.xml
```
What it does: Compares two Nmap XML files and outputs only meaningful changes in a human-readable format. When to use it: Weekly operational reviews, change-control validation, or after maintenance windows to catch unintended exposure. Risks: ndiff does not alert on hosts that failed to respond; down hosts vanish silently from both reports. Expected output: Lines prefixed with + or - showing gained or lost hosts and services; unchanged context lines are prefixed with a space.
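Because ndiff signals its result through its exit status (per its man page: 0 when the scans are the same, 1 when they differ, 2 on error), wrapping it for automation mostly means not treating "differ" as a crash. A minimal sketch, assuming the ndiff binary that ships with Nmap is on PATH; the function names are hypothetical.

```python
import subprocess

# Exit statuses as documented for ndiff: 0 = scans equal, 1 = scans differ, 2 = error.
NDIFF_STATUS = {0: "unchanged", 1: "changed", 2: "error"}


def interpret_ndiff_exit(code):
    """Translate an ndiff exit status; anything unexpected is treated as an error."""
    return NDIFF_STATUS.get(code, "error")


def run_ndiff(baseline_xml, current_xml):
    """Run ndiff on two scan files and return (status_label, text_report).

    Assumes ndiff is on PATH (subprocess raises FileNotFoundError otherwise).
    """
    proc = subprocess.run(
        ["ndiff", baseline_xml, current_xml],
        capture_output=True,
        text=True,
        check=False,  # exit 1 is the normal "scans differ" result, not a failure
    )
    return interpret_ndiff_exit(proc.returncode), proc.stdout
```

Keeping "error" distinct from "changed" matters: with a plain nonzero check, a crashed ndiff run would look exactly like network drift.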
Illustrative ndiff output (abridged):

```text
- Nmap 7.94 scan initiated Mon Jan 15 06:00:00 2024 as: nmap -sS -sV -p- -oX baseline.xml 192.0.2.0/24
+ Nmap 7.94 scan initiated Mon Jan 22 06:00:00 2024 as: nmap -sS -sV -p- -oX weekly.xml 192.0.2.0/24
192.0.2.10:
+ 8080/tcp open http Apache Tomcat 9.0.82
- 3306/tcp open mysql MySQL 8.0.34
192.0.2.55:
+ Host is up.
+ 22/tcp open ssh OpenSSH 9.3p1
```
The + 8080/tcp line is your signal: either a deployment went out without a change ticket, or an unauthorized service is running. The - 3306/tcp line matters just as much: a service disappearing can indicate an attacker covering tracks after a compromise, or a misconfiguration that broke a dependency.
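To route those +/- port lines into tickets or alerts, a sketch can split ndiff's text output into gained and lost ports per host. It assumes the layout of the sample above (port lines start with + or -, and other lines ending in a colon are host headers); real ndiff output may pad columns or print OS and script sections, so the regex is deliberately loose and may need tightening. The function name is hypothetical.

```python
import re

# Port lines look like "+ 8080/tcp open http ..."; the space after +/- and
# the column padding vary, so whitespace is optional or flexible here.
PORT_LINE = re.compile(r"^([+-])\s*(\d+)/(tcp|udp|sctp)\s+(\S+)\s*(.*)$")


def port_changes(ndiff_text):
    """Split ndiff text output into gained and lost port lines, tagged by host.

    Returns two lists of (host, port, proto, state, service_info) tuples.
    Assumption: any non-port line ending in ':' is a host header.
    """
    gained, lost = [], []
    host = None
    for line in ndiff_text.splitlines():
        stripped = line.strip()
        match = PORT_LINE.match(stripped)
        if match:
            sign, port, proto, state, info = match.groups()
            entry = (host, int(port), proto, state, info.strip())
            (gained if sign == "+" else lost).append(entry)
        elif stripped.endswith(":"):
            host = stripped.lstrip("+- ")[:-1]  # e.g. "192.0.2.10"
    return gained, lost
```

A port whose state changed (for example closed to open) appears once in each list, which is usually what a reviewer wants to see.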
Database Integration and Trending
Storing scans in PostgreSQL enables longitudinal analysis: "Show me all hosts where port 3389/RDP appeared in the last 90 days" or "Count exposed SMB instances by subnet over time." A minimal schema:
```sql
CREATE TABLE scans (
    scan_id      SERIAL PRIMARY KEY,
    started_at   TIMESTAMPTZ NOT NULL,
    target_cidr  CIDR,
    nmap_version TEXT,
    xml_checksum BYTEA UNIQUE  -- hash of normalized content; raw XML embeds timestamps
);

CREATE TABLE hosts (
    host_id  BIGSERIAL PRIMARY KEY,
    scan_id  INT REFERENCES scans(scan_id),
    ip       INET NOT NULL,
    mac      TEXT,
    hostname TEXT,
    os_guess TEXT,
    state    TEXT CHECK (state IN ('up', 'down', 'unknown', 'skipped'))
);

CREATE TABLE ports (
    port_id      BIGSERIAL PRIMARY KEY,
    host_id      BIGINT REFERENCES hosts(host_id),
    port         INT,
    protocol     TEXT,
    state        TEXT,
    service_name TEXT,
    product      TEXT,
    version      TEXT,
    extrainfo    TEXT
);

-- Partial index for "which hosts expose port X" queries
CREATE INDEX ON ports (port, protocol) WHERE state = 'open';
-- Foreign-key column used by every host/port join
CREATE INDEX ON ports (host_id);
CREATE INDEX ON hosts (ip, scan_id);
```
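Import via COPY from a CSV produced by your Python parser, or load the raw XML and shred it with PostgreSQL's built-in XMLTABLE (the older xml2 contrib module is deprecated). Scan archives grow fast, so consider monthly partitioning on scan date; PostgreSQL requires the partition key in every primary key and unique constraint of a partitioned table, so the schema above would need started_at added to those keys.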
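For the COPY path, a sketch that flattens every port element (in every state, not only open) into CSV text. The column order is an assumption: the ports-table columns plus the host IP, so a staging table can be joined to hosts during the load. The function and column names are hypothetical; adjust them to your schema.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Assumed column order: ports-table columns plus the host IP for joining.
COLUMNS = ["ip", "port", "protocol", "state",
           "service_name", "product", "version", "extrainfo"]


def ports_to_csv(xml_text):
    """Return CSV text with a header and one row per <port> element."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(COLUMNS)
    root = ET.fromstring(xml_text)
    for host in root.findall("host"):
        ip = next((a.get("addr") for a in host.findall("address")
                   if a.get("addrtype") in ("ipv4", "ipv6")), None)
        for port in host.findall("ports/port"):
            svc = port.find("service")
            attrs = svc.attrib if svc is not None else {}
            writer.writerow([
                ip,
                port.get("portid"),
                port.get("protocol"),
                port.find("state").get("state"),
                # Empty, unquoted fields load as NULL under COPY's CSV defaults.
                attrs.get("name", ""),
                attrs.get("product", ""),
                attrs.get("version", ""),
                attrs.get("extrainfo", ""),
            ])
    return out.getvalue()
```

Written to a file, the result can be loaded with psql's \copy into a staging table (for example, \copy staging_ports FROM 'ports.csv' WITH (FORMAT csv, HEADER true), where staging_ports is hypothetical) and then inserted into ports with a join on hosts.ip.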
Continuous Scanning Architecture
```text
┌─────────────┐     ┌─────────────┐     ┌─────────────────┐
│  Scheduler  │────▶│  Scan Jobs  │────▶│  Nmap Workers   │
│  (cron/     │     │ (temporal/  │     │  (dedicated     │
│  Airflow)   │     │ ephemeral)  │     │  network seg)   │
└─────────────┘     └─────────────┘     └─────────────────┘
                                                 │
                                                 ▼
                                          ┌─────────────┐
                                          │ XML Output  │
                                          │  (S3/NFS)   │
                                          └─────────────┘
                                                 │
                     ┌───────────────────────────┼───────────────────────────┐
                     ▼                           ▼                           ▼
              ┌─────────────┐             ┌─────────────┐             ┌─────────────┐
              │   Parser    │             │    Ndiff    │             │   Zenmap/   │
              │  (Python)   │             │   Engine    │             │   Faraday   │
              └──────┬──────┘             └──────┬──────┘             └─────────────┘
                     │                           │
                     ▼                           ▼
              ┌─────────────┐             ┌─────────────┐
              │ PostgreSQL  │             │  Alerting   │
              │  (trends)   │             │ (PagerDuty/ │
              └─────────────┘             │ Slack/API)  │
                                          └─────────────┘
```
Key operational decisions in this pipeline:
| Decision | Practical implication |
|---|---|
| Scan frequency from CI | Baseline scans every deployment; full sweeps weekly. Too frequent and you desensitize responders; too sparse and drift accumulates. |
| Worker segmentation | Run Nmap from a dedicated NIC/VLAN with explicit firewall rules. A compromised worker is an attacker goldmine. |
| XML retention | Raw XML runs roughly 10-50× the size of the equivalent compressed database rows. Keep 90 days hot, then move files to cold archival storage (for example, S3 Glacier) for the compliance period. |
| Checksum deduplication | Hash a normalized form (sorted host, port, and state tuples), not the raw file: Nmap XML embeds timestamps and run statistics, so byte-identical files are rare. Matching hashes skip parsing, save I/O, and reveal stagnant infrastructure. |
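A sketch of that normalized checksum: hash sorted (IP, protocol, port, state, service) rows from up hosts so two scans of an unchanged network produce the same digest regardless of timestamps. Which fields count as "content" is an assumption; add product and version if version drift should break deduplication. The function name is hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET


def scan_fingerprint(xml_text):
    """SHA-256 over sorted (ip, proto, port, state, service) rows of up hosts.

    Start times, run statistics, and element order do not affect the result.
    """
    rows = []
    root = ET.fromstring(xml_text)
    for host in root.findall("host"):
        status = host.find("status")
        if status is None or status.get("state") != "up":
            continue
        ip = next((a.get("addr") for a in host.findall("address")
                   if a.get("addrtype") in ("ipv4", "ipv6")), "")
        for port in host.findall("ports/port"):
            svc = port.find("service")
            rows.append("|".join([
                ip,
                port.get("protocol"),
                port.get("portid"),
                port.find("state").get("state"),
                svc.get("name", "") if svc is not None else "",
            ]))
    digest = hashlib.sha256()
    for row in sorted(rows):
        digest.update(row.encode() + b"\n")
    return digest.digest()  # 32 raw bytes, suitable for a BYTEA column
```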
CI/CD Pipeline Integration
Embed baseline scanning in deployment pipelines to catch infrastructure-as-code drift before it reaches production. A GitLab CI example:
```yaml
network-baseline:
  image: nmap:latest  # placeholder: pin by digest, not tag; the image also needs python3
  variables:
    TARGET: "198.51.100.0/24"
    RATE: "500"  # Production: reduce to 100 or less
  script:
    - nmap -sS -sV -p- --max-rate "$RATE" -oX "scan-$(date +%s).xml" "$TARGET"
    - python3 /scripts/parse_and_alert.py scan-*.xml --baseline /baselines/prod.xml
  artifacts:
    when: always  # keep the XML even when the drift check fails the job
    paths: ["scan-*.xml"]
    expire_in: 30 days
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
```
What it does: Scheduled pipeline executes Nmap, parses results, and fails if new ports appear against baseline. When to use it: Every deployment to network-adjacent infrastructure, or nightly for static environments. Risks: Pipeline failures from network jitter cause alert fatigue; implement retry with backoff and threshold-based alerting (e.g., 3 consecutive deltas). Expected output: CI job log with parsed new ports, or green pass if baseline matches.
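One way to implement "3 consecutive deltas" is to alert only on drift that shows up in each of the last N scans, so a one-off miss (a host that dropped a probe once) never pages anyone. A minimal sketch; the class name and the intersection rule are assumptions, not a standard.

```python
from collections import deque


class DeltaAlerter:
    """Alert only on new-port keys seen in every one of the last N scans."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.history = deque(maxlen=threshold)  # oldest scan drops off automatically

    def observe(self, new_ports):
        """Record one scan's new (ip, port, proto) keys; return keys to alert on."""
        self.history.append(frozenset(new_ports))
        if len(self.history) < self.threshold:
            return set()  # not enough scans yet to call anything persistent
        # Only drift present in all of the last N scans survives the intersection.
        return set(frozenset.intersection(*self.history))
```

A persistent rogue service alerts on the third consecutive scan; jitter that comes and goes is absorbed.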
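The parser should follow Nagios plugin conventions: exit 0 (OK) when the scan matches the baseline, 2 (CRITICAL) on drift, and 3 (UNKNOWN) when the scan or parse fails. Any nonzero code fails a CI job, and monitoring hooks built for Nagios-compatible checks interpret the distinction directly.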
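A sketch of that mapping. Returning WARNING (1) when ports only disappeared is an added assumption, not part of the convention; drop that branch if closed ports should also fail the job. The function names are hypothetical.

```python
import sys

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def exit_code_for(new_ports, closed_ports, scan_failed=False):
    """Map one diff result onto a Nagios-style status code."""
    if scan_failed:
        return UNKNOWN   # no trustworthy data: neither a pass nor drift
    if new_ports:
        return CRITICAL  # newly exposed services
    if closed_ports:
        return WARNING   # assumption: disappearance is worth a look, not a page
    return OK


def finish(new_ports, closed_ports, scan_failed=False):
    """Print a one-line summary and exit with the mapped status."""
    code = exit_code_for(new_ports, closed_ports, scan_failed)
    print(f"{LABELS[code]}: {len(new_ports)} new, {len(closed_ports)} closed")
    sys.exit(code)
```

In GitLab, allow_failure with exit_codes: [1] would let the WARNING case pass visibly while CRITICAL and UNKNOWN still fail the job.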
Visualization and Third-Party Integration
Zenmap's topology view is adequate for single-network comprehension but does not scale past a few hundred hosts. For operational dashboards, export to:
| Tool | Role | Integration path |
|---|---|---|
| Faraday | Collaborative pentest workspace | Upload XML via API; correlates with exploit findings |
| Dradis | Report generation | Import XML as evidence; templates for executive summaries |
| Grafana | Time-series dashboards | PostgreSQL backend with port-exposure queries |
| nmap-vulners | Vulnerability context | --script vulners (the NSE script from the nmap-vulners project) enriches port data with CVE references at scan time |
```bash
# Lab: Enrich scan with the vulners NSE script for immediate triage context
nmap -sV --script vulners -p 22,80,443 -oX vuln-enriched.xml 192.0.2.0/24
```
What it does: Sends the CPEs produced by version detection to the Vulners API and lists CVEs associated with each detected service version, which is why -sV is required. When to use it: Prioritization of exposed services; never as sole vulnerability assessment. Risks: Version detection is probabilistic; false positives on backported patches are common in enterprise Linux. Expected output: CVE list appended to each port's script output in XML.
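To pull CVE identifiers back out of the enriched XML for triage, a sketch can read each port's script element whose id is vulners and regex-match CVE IDs in its output attribute. This relies on the text output only; the script also emits structured table and elem children that a stricter parser could walk, and non-CVE identifiers in the output are ignored here. The function name is hypothetical.

```python
import re
import xml.etree.ElementTree as ET

CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}")


def cves_by_port(xml_text):
    """Return {(ip, port, proto): sorted CVE IDs} from vulners output in -oX XML."""
    found = {}
    root = ET.fromstring(xml_text)
    for host in root.findall("host"):
        ip = next((a.get("addr") for a in host.findall("address")
                   if a.get("addrtype") in ("ipv4", "ipv6")), None)
        for port in host.findall("ports/port"):
            for script in port.findall("script"):
                if script.get("id") != "vulners":
                    continue
                # The same CVE can appear several times (ID plus URL); dedupe.
                ids = set(CVE_RE.findall(script.get("output", "")))
                if ids:
                    key = (ip, int(port.get("portid")), port.get("protocol"))
                    found[key] = sorted(ids)
    return found
```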
RustScan as Acceleration Layer
For initial port discovery across large estates, RustScan can improve pipeline throughput: it sweeps ports at masscan-like speed and then hands the open ones to Nmap. It is not a replacement, since service detection and scripting still require Nmap proper, but it can collapse the discovery phase from hours to minutes.
```bash
# Lab: RustScan for port discovery, Nmap for deep inspection
rustscan -a 192.0.2.0/24 --range 1-65535 --scan-order random \
  -- -sV -sC -oX deep.xml
```
The -- separator passes the remaining arguments to Nmap. Production use requires rate limiting: RustScan's defaults are aggressive and can overwhelm stateful firewalls or trigger IDS thresholds, so lower the batch size (-b) and raise the per-port timeout (-t).
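When a pipeline launches RustScan, building the command as an argument list keeps the rate-limiting knobs explicit and avoids shell-quoting surprises. A sketch: -a, --range, -b/--batch-size, and -t/--timeout (milliseconds) are RustScan options, while the helper name and the default values of 500 and 3000 are illustrative assumptions, not tuned recommendations.

```python
import shlex


def rustscan_argv(target, nmap_args, batch_size=500, timeout_ms=3000):
    """Build a rate-limited RustScan command that hands its results to Nmap."""
    return [
        "rustscan",
        "-a", target,
        "--range", "1-65535",
        "-b", str(batch_size),   # fewer concurrent probes per batch
        "-t", str(timeout_ms),   # wait longer before declaring a port closed
        "--",                    # everything after this is passed to Nmap
        *nmap_args,
    ]


print(shlex.join(rustscan_argv("192.0.2.0/24", ["-sV", "-sC", "-oX", "deep.xml"])))
# prints: rustscan -a 192.0.2.0/24 --range 1-65535 -b 500 -t 3000 -- -sV -sC -oX deep.xml
```

The list can be passed straight to subprocess.run without a shell.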
Data Retention for Sensitive Topology Archives
Scan archives contain a complete network map—IP addressing, live hosts, service versions, OS guesses. Treat them as confidential at the sensitivity tier of your network documentation, not merely log data.
Practical policy elements:
- Retention: 90 days online in PostgreSQL, 1-3 years compressed XML in object storage, then cryptographically shredded. Legal hold suspends deletion.
- Access: Service account only for parser; human access requires break-glass with ticket reference logged.
- Encryption: XML at rest (AES-256-GCM via S3 or filesystem encryption); TLS 1.3 for parser-to-database transport.
- Geographic: Store in same jurisdiction as network; cross-border transfer of infrastructure maps may violate data residency or trigger export control review.
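The retention tiers above reduce to a small decision a nightly cleanup job could make per archive. A sketch: the 3-year cutoff takes the upper end of the stated 1-3 year range, and the function and label names are hypothetical.

```python
from datetime import date, timedelta

HOT_DAYS = 90           # online in PostgreSQL
ARCHIVE_DAYS = 3 * 365  # assumption: upper end of the 1-3 year object-storage window


def retention_action(scan_date, today, legal_hold=False):
    """Return 'hold', 'hot', 'archive', or 'shred' for one scan archive."""
    if legal_hold:
        return "hold"  # legal hold suspends deletion regardless of age
    age = today - scan_date
    if age <= timedelta(days=HOT_DAYS):
        return "hot"
    if age <= timedelta(days=ARCHIVE_DAYS):
        return "archive"
    return "shred"  # cryptographic shredding, e.g. destroying the data key
```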
Hard-won insight: A port marked filtered is not a clean result; it is an unanswered question. Many operators archive only open ports and miss the security-relevant signal of firewall rule changes. Store filtered and closed states as well: a change from filtered to open without a change ticket is often your earliest intrusion indicator.
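A sketch of that check: keep every listed port's state, then report ports that were filtered in the baseline and are open now. One caveat: Nmap collapses large runs of same-state ports into an extraports summary instead of individual port elements, so ports summarized there are invisible to this comparison. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def port_states(xml_text):
    """Return {(ip, port, proto): state} for every listed port, whatever its state."""
    states = {}
    root = ET.fromstring(xml_text)
    for host in root.findall("host"):
        ip = next((a.get("addr") for a in host.findall("address")
                   if a.get("addrtype") in ("ipv4", "ipv6")), None)
        for port in host.findall("ports/port"):
            key = (ip, int(port.get("portid")), port.get("protocol"))
            states[key] = port.find("state").get("state")
    return states


def filtered_to_open(baseline_xml, current_xml):
    """Ports explicitly filtered in the baseline that are open in the current scan."""
    before, after = port_states(baseline_xml), port_states(current_xml)
    return sorted(k for k, s in after.items()
                  if s == "open" and before.get(k) == "filtered")
```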