Output Formats, Data Management, and Integration Workflows
The Output Formats: Structural Foundations for Programmatic Consumption
Nmap's output is not merely human-readable telemetry—it is structured data that determines whether your scans integrate cleanly into security operations or become manual parsing burdens. Treat format selection as an architectural decision.
Normal format (-oN) preserves the standard terminal display with runtime commentary, port tables, and OS guesses. It remains useful for ad-hoc investigations and evidence archives but offers no structured parsing path. Use it for human reference only.
XML format (-oX) is the canonical choice for integrations. It validates against Nmap's Document Type Definition and provides complete scan fidelity. The root <nmaprun> element contains <host> children, each with:
- <address addr="..." addrtype="ipv4|ipv6|mac"/> for network-layer identification
- <hostnames> with nested <hostname> elements for DNS associations
- <ports> containing <port protocol="tcp|udp" portid="..."> with <state state="open|closed|filtered|unfiltered" reason="..."/> and optional <service name="..." product="..." version="..." extrainfo="..."/> children
- <os> with <osmatch> fingerprints ranked by accuracy
- <hostscript> for NSE output, nested identically to <port><script> elements
The XML DTD ensures predictable traversal. XPath queries extract specific intelligence without fragile string matching:
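//host[ports/port[@portid='22' and state/@state='open']]/address/@addr
This returns the addr attribute of every host with port 22 open, which normally means SSH is reachable. Note that portid is an attribute, so it needs the @ prefix. For web service enumeration: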
//port[service/@name='http' or service/@name='https']/ancestor::host/address/@addr
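As a rough illustration of running the first query from Python: the standard library's ElementTree implements only a subset of XPath (attribute predicates work, but "and" inside a predicate does not), so this sketch filters on portid in the path and checks the state in code. The function name and the sample document are invented for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical two-host scan result, trimmed to the elements used below.
SAMPLE_XML = """<nmaprun>
  <host>
    <address addr="10.0.0.5" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22"><state state="open" reason="syn-ack"/></port>
    </ports>
  </host>
  <host>
    <address addr="10.0.0.6" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22"><state state="closed" reason="reset"/></port>
      <port protocol="tcp" portid="80"><state state="open" reason="syn-ack"/></port>
    </ports>
  </host>
</nmaprun>"""


def hosts_with_open_port(xml_text, portid):
    """Return the first address of every host where the given port is open."""
    root = ET.fromstring(xml_text)
    matches = []
    for host in root.findall("host"):
        # The attribute predicate narrows to the port; the state is checked in code.
        for port in host.findall(f"ports/port[@portid='{portid}']"):
            state = port.find("state")
            address = host.find("address")
            if state is not None and state.get("state") == "open" and address is not None:
                matches.append(address.get("addr"))
                break
    return matches


print(hosts_with_open_port(SAMPLE_XML, 22))
```

A full XPath engine such as the one in lxml should be able to evaluate the original expressions directly; the stdlib version trades that for zero dependencies.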
Grepable format (-oG) presents slash-delimited records optimized for grep and awk pipelines. However, it is deprecated for new integrations: it truncates service version strings, omits NSE script output entirely, cannot represent IPv6 addresses cleanly, and lacks hierarchical nesting. Legacy tools may require it, but modern pipelines should prefer XML.
Script Kiddie format (-oS) is a novelty output in "l33t sp34k." It has zero operational value.
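For legacy pipelines that still receive grepable output, the sketch below shows roughly what parsing one Host record involves. It assumes the usual layout of tab-separated fields and comma-separated port entries with slash-delimited subfields (port, state, protocol, owner, service, RPC info, version); the function name and the sample line are illustrative only.

```python
def parse_grepable_line(line):
    """Parse one 'Host:' record from -oG output; return None for other lines."""
    if not line.startswith("Host:"):
        return None
    fields = {}
    for chunk in line.rstrip("\n").split("\t"):
        key, _, value = chunk.partition(": ")
        fields[key] = value
    ip = fields["Host"].split(" ", 1)[0]
    ports = []
    for entry in fields.get("Ports", "").split(", "):
        if not entry:
            continue
        parts = entry.strip().split("/")
        parts += [""] * (7 - len(parts))  # pad short entries defensively
        ports.append({
            "port": int(parts[0]),
            "state": parts[1],
            "protocol": parts[2],
            "service": parts[4],
            "version": parts[6],
        })
    return {"ip": ip, "ports": ports}


SAMPLE_LINE = (
    "Host: 192.168.1.10 (web01.example.com)\t"
    "Ports: 22/open/tcp//ssh//OpenSSH 8.9p1/, 80/open/tcp//http//nginx 1.18.0/\t"
    "Ignored State: closed (998)"
)
print(parse_grepable_line(SAMPLE_LINE))
```

The positional indexing is the fragile part: nothing in the record names its fields, which is one reason the XML format is the safer integration target.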
Python Parsing and the XML Object Model
For production parsing, python-libnmap wraps the XML structure in queryable objects; for zero-dependency handling, use ElementTree from the standard library:
import xml.etree.ElementTree as ET


def extract_vulnerable_hosts(xml_path):
    """Yield one dict per open port whose service version is flagged."""
    tree = ET.parse(xml_path)
    for host in tree.findall('.//host'):
        addr = host.find('address').get('addr')
        for port in host.findall('.//port'):
            portid = port.get('portid')
            state = port.find('state').get('state')
            if state != 'open':
                continue  # only open ports are candidates
            service = port.find('service')
            if service is not None:
                name = service.get('name', 'unknown')
                version = service.get('version', '')
                # CVE correlation logic here; the substring test is a placeholder
                if 'outdated' in version.lower():
                    yield {
                        'ip': addr,
                        'port': portid,
                        'service': name,
                        'version': version
                    }
For massive scan repositories, stream-parse with xml.etree.ElementTree.iterparse to bound memory consumption.
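A minimal sketch of that streaming approach, assuming only the host, address, ports, state, and service elements described earlier. The generator name and the inline sample are hypothetical; in practice the source would be a path to a large scan file.

```python
import io
import xml.etree.ElementTree as ET


def iter_open_ports(source):
    """Yield (ip, portid, service_name) tuples, handling one host at a time."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "host":
            continue
        address = elem.find("address")
        ip = address.get("addr") if address is not None else None
        for port in elem.findall("ports/port"):
            state = port.find("state")
            if state is not None and state.get("state") == "open":
                service = port.find("service")
                name = service.get("name", "unknown") if service is not None else "unknown"
                yield ip, port.get("portid"), name
        # Drop the finished host's children so memory does not grow with file size.
        elem.clear()


# iterparse accepts a filename or a binary file object; a BytesIO stands in here.
sample = io.BytesIO(
    b"<nmaprun><host><address addr='10.0.0.5' addrtype='ipv4'/>"
    b"<ports><port protocol='tcp' portid='443'><state state='open'/>"
    b"<service name='https'/></port>"
    b"<port protocol='tcp' portid='25'><state state='closed'/></port>"
    b"</ports></host></nmaprun>"
)
for row in iter_open_ports(sample):
    print(row)
```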
Diffing Scans with ndiff: Change Detection at Scale
Network baselines decay immediately. ndiff compares two XML outputs and produces structured deltas:
ndiff baseline-scan.xml current-scan.xml > network-delta.txt
The output uses + for additions, - for removals, and contextual headers for host modifications. The patterns that matter most for SOC interpretation:

| Pattern | Security Significance |
|---------|-----------------------|
| +<address>: header followed by +Host is up. | New host appeared (rogue device, shadow IT) |
| -22/tcp open ssh | Service removed or filtered (hardening or compromise) |
| +8080/tcp open http Apache Tomcat 9.0.80 | New service deployment (patch validation needed) |
| -80/tcp open http paired with +443/tcp open ssl/http | Service migration (TLS enforcement check) |

A changed "Nmap ... scan initiated" banner line appears in almost every diff because it carries the scan timestamp, so it should not be read as a new host.
Baseline establishment requires representative scanning: identical flags, timing templates, and network conditions. Store baselines in version control. For change-detection alerting, feed ndiff output into monitoring systems: ndiff exits with 0 when the scans match, 1 when they differ, and 2 on a runtime error, so a status of 1 signals drift. For regression analysis, correlate delta timing with change management tickets; unexpected deltas trigger incident response.
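One way the alerting hook might look, as a sketch: a thin wrapper that runs ndiff and maps its exit status (0 for identical scans, 1 for differences, 2 for a runtime error) to a label a monitoring system can act on. The function names and labels are assumptions, and ndiff must be on PATH for run_ndiff to work.

```python
import subprocess


def interpret_ndiff_exit(returncode):
    """Map an ndiff exit status to a coarse monitoring label."""
    if returncode == 0:
        return "unchanged"
    if returncode == 1:
        return "drift"
    return "error"


def run_ndiff(baseline_path, current_path):
    """Run ndiff on two XML scans; return (label, diff_text).

    Raises FileNotFoundError if the ndiff executable is not installed.
    """
    proc = subprocess.run(
        ["ndiff", baseline_path, current_path],
        capture_output=True,
        text=True,
    )
    return interpret_ndiff_exit(proc.returncode), proc.stdout
```

Treating "error" separately from "drift" keeps a broken scan file from being reported as a network change.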
Verbosity and debugging levels (-v, -vv, -d, -dd) require cost engineering. Each increment adds output, and on large scans the higher debugging levels can multiply log volume many times over. Debug output records low-level detail such as timing and packet handling, which is valuable for debugging and ruinous for retention. Standardize: -v for routine operations, -d only for scan failure investigation, with automated retention policies tiering raw logs to cold storage.
Conversion Pipelines and Data Transformation
Raw XML rarely feeds directly into analytics platforms. Three conversion paths dominate:
xsltproc transformations render HTML reports for executive consumption. Nmap ships with nmap.xsl:
xsltproc nmap.xsl scan.xml > report.html
Custom XSL transforms can generate CSV, JSON, or markdown for ticket systems.
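Where maintaining a stylesheet is not worth it, the same flattening can be sketched in Python instead of XSL. This is an alternative to the XSLT route, not something Nmap ships; the column choice and function name are assumptions.

```python
import csv
import io
import xml.etree.ElementTree as ET


def nmap_xml_to_csv(xml_text):
    """Flatten an Nmap XML document into CSV text, one row per port."""
    root = ET.fromstring(xml_text)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["ip", "protocol", "port", "state", "service", "version"])
    for host in root.findall("host"):
        address = host.find("address")
        ip = address.get("addr") if address is not None else ""
        for port in host.findall("ports/port"):
            state = port.find("state")
            service = port.find("service")
            writer.writerow([
                ip,
                port.get("protocol", ""),
                port.get("portid", ""),
                state.get("state", "") if state is not None else "",
                service.get("name", "") if service is not None else "",
                service.get("version", "") if service is not None else "",
            ])
    return buffer.getvalue()
```

The csv module handles quoting, so version strings containing commas do not corrupt the row layout.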
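python-libnmap parses the XML into report objects that serialize to JSON, and its backend plugins can persist reports to external stores such as SQL databases: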
from libnmap.parser import NmapParser
from libnmap.reportjson import ReportEncoder
import json
report = NmapParser.parse_fromfile('scan.xml')
json_output = json.dumps(report, cls=ReportEncoder)
Database ingestion pipelines using nmapdb or custom ETL normalize scan data into relational schemas. A typical PostgreSQL ingestion:
CREATE TABLE scan_results (
    scan_id UUID,
    timestamp TIMESTAMPTZ,
    host_ip INET,
    port INT,
    protocol TEXT,
    state TEXT,
    service_name TEXT,
    service_version TEXT,
    os_match TEXT,
    nse_output JSONB
);
The JSONB column accommodates extensible NSE output without schema migrations.
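A custom ETL step for this schema could start from something like the sketch below, which turns one XML document into tuples in the table's column order. It deliberately stops short of the database call: the %s placeholders assume a psycopg-style driver, and the helper names are hypothetical.

```python
import json
import uuid
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# Parameterized insert matching the scan_results columns (psycopg-style placeholders assumed).
INSERT_SQL = (
    "INSERT INTO scan_results (scan_id, timestamp, host_ip, port, protocol, state, "
    "service_name, service_version, os_match, nse_output) "
    "VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s)"
)


def primary_address(host):
    """Return the first IPv4/IPv6 address of a host, skipping MAC entries."""
    for address in host.findall("address"):
        if address.get("addrtype") in ("ipv4", "ipv6"):
            return address.get("addr")
    return None


def rows_from_scan(xml_text, scan_id=None):
    """Build one tuple per port, ordered like the scan_results columns."""
    root = ET.fromstring(xml_text)
    scan_id = scan_id or str(uuid.uuid4())
    started = root.get("start")  # epoch seconds on <nmaprun>
    ts = (datetime.fromtimestamp(int(started), tz=timezone.utc)
          if started else datetime.now(timezone.utc))
    rows = []
    for host in root.findall("host"):
        ip = primary_address(host)
        osmatch = host.find("os/osmatch")  # first match is the highest ranked
        os_name = osmatch.get("name") if osmatch is not None else None
        for port in host.findall("ports/port"):
            state = port.find("state")
            service = port.find("service")
            scripts = {s.get("id"): s.get("output") for s in port.findall("script")}
            rows.append((
                scan_id, ts, ip, int(port.get("portid")), port.get("protocol"),
                state.get("state") if state is not None else None,
                service.get("name") if service is not None else None,
                service.get("version") if service is not None else None,
                os_name, json.dumps(scripts),
            ))
    return rows
```

With a DB-API cursor, the rows would typically be passed to cursor.executemany(INSERT_SQL, rows).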
Metasploit Integration: From Discovery to Exploitation
The db_nmap command executes Nmap with results automatically imported into Metasploit's PostgreSQL backend:
msf6 > workspace -a client-engagement-2024
msf6 > db_nmap -sV -O --script vuln 10.0.0.0/24
Workspace isolation prevents cross-contamination between engagements. Post-scan, hosts, services, vulns, and loot commands query the database. Automatic exploit correlation occurs through analyze:
msf6 > hosts -c address,os_flavor
msf6 > services -p 445 --rhosts
msf6 > vulns
msf6 > analyze
The analyze command matches vulns entries and services versions against exploit modules, ranking by reliability and target compatibility. This closed loop—from discovery to exploit suggestion—compresses reconnaissance timelines but demands rigorous scope validation.
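Scope validation can be enforced mechanically before any target reaches db_nmap. The sketch below is one possible pre-flight check using the standard ipaddress module; the function name and the idea of keeping authorized ranges as a CIDR list are assumptions, not a Metasploit feature.

```python
import ipaddress


def out_of_scope(targets, authorized_cidrs):
    """Return the targets that are not fully contained in an authorized range."""
    scope = [ipaddress.ip_network(cidr) for cidr in authorized_cidrs]
    rejected = []
    for target in targets:
        # strict=False lets a bare host address or a host-bits-set CIDR parse.
        network = ipaddress.ip_network(target, strict=False)
        in_scope = any(
            network.version == allowed.version and network.subnet_of(allowed)
            for allowed in scope
        )
        if not in_scope:
            rejected.append(target)
    return rejected


print(out_of_scope(["10.0.0.0/24", "10.0.1.7", "192.168.1.0/24"], ["10.0.0.0/23"]))
```

An engagement wrapper would refuse to launch the scan whenever the returned list is non-empty. Hostname targets are not handled here and would need resolving first.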
Continuous Scanning Architectures
Transforming Nmap from interactive tool to production infrastructure requires containerization, scheduling, and API abstraction.
Containerized execution ensures reproducible environments:
FROM alpine:3.19
RUN apk add --no-cache nmap nmap-scripts
ENTRYPOINT ["nmap"]
Scan configurations mount as ConfigMaps; results write to persistent volumes or object storage.
Kubernetes CronJobs enable periodic discovery with declarative scheduling:
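apiVersion: batch/v1
kind: CronJob
metadata:
  name: network-discovery
spec:
  schedule: "0 2 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: nmap
            image: nmap:7.94
            # Run through a shell so $(date ...) is expanded at run time;
            # an exec-form command array would pass it to nmap literally.
            command: ["/bin/sh", "-c"]
            args:
            - nmap -sS -p- -oX /results/scan-$(date +%Y%m%d).xml 10.0.0.0/24
            volumeMounts:
            - name: results
              mountPath: /results
          restartPolicy: OnFailure
          # Assumption: a PersistentVolumeClaim named nmap-results already exists.
          volumes:
          - name: results
            persistentVolumeClaim:
              claimName: nmap-results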
Elastic Stack visualization ingests XML through Logstash parsing:
# logstash.conf filter section
xml {
  source => "message"
  target => "nmap"
  xpath => [
    "/nmaprun/host/address/@addr", "[nmap][host][ip]",
    "/nmaprun/host/ports/port/state/@state", "[nmap][port][state]"
  ]
}
Kibana dashboards track attack surface trends: open port counts by subnet, service version age distributions, SSL certificate expiry heatmaps.
CI/CD security gates execute Nmap against deployed infrastructure before promotion. Fail pipelines on unexpected listening services or prohibited version disclosures. GitHub Actions example:
- name: Verify Staging Surface
  run: |
    # SYN scans need raw-socket privileges, hence sudo on the runner
    sudo nmap -sS -p 1-65535 -oX staging-scan.xml ${{ env.STAGING_IP }}
    python scripts/validate_baseline.py staging-scan.xml baselines/staging.xml
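The workflow above calls scripts/validate_baseline.py without showing it. A hypothetical body for such a script might compare the open ports in the new scan against the baseline and return a non-zero status on anything unexpected; the function names and the choice to compare (address, protocol, port) triples are assumptions.

```python
import xml.etree.ElementTree as ET


def open_ports(root):
    """Collect (address, protocol, portid) for every open port under <nmaprun>."""
    found = set()
    for host in root.findall("host"):
        address = host.find("address")
        if address is None:
            continue
        for port in host.findall("ports/port"):
            state = port.find("state")
            if state is not None and state.get("state") == "open":
                found.add((address.get("addr"), port.get("protocol"), port.get("portid")))
    return found


def unexpected_ports(scan_root, baseline_root):
    """Return open ports present in the scan but absent from the baseline."""
    return sorted(open_ports(scan_root) - open_ports(baseline_root))


def main(scan_path, baseline_path):
    """Print unexpected ports and return 1 if any exist, else 0."""
    extra = unexpected_ports(ET.parse(scan_path).getroot(),
                             ET.parse(baseline_path).getroot())
    for addr, protocol, portid in extra:
        print(f"unexpected open port: {addr} {portid}/{protocol}")
    return 1 if extra else 0
```

Calling sys.exit(main(sys.argv[1], sys.argv[2])) from the usual __main__ guard would turn this into the command the workflow step invokes, so that a non-zero status fails the pipeline. Version-disclosure checks would be an additional comparison on the service elements.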
API-driven scan initiation via REST wrappers (e.g., nmap-api, custom Flask services) decouples scan scheduling from execution. Clients POST target specifications; workers queue scans, stream results through WebSockets, and persist to backends. This pattern enables multi-tenant scanning platforms with role-based access and audit logging.
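The request-handling half of that pattern can be sketched without any web framework: validate the posted target specification, translate it into a fixed argument list, and put it on a queue for workers. Everything named here (the profile table, the /results path, the function names) is hypothetical, and an in-process queue.Queue stands in for whatever broker a real deployment would use.

```python
import ipaddress
import queue
import uuid

# Hypothetical named profiles; clients choose a name and never send raw flags.
SCAN_PROFILES = {
    "quick": ["-sT", "-F"],
    "full": ["-sT", "-p-"],
}

scan_queue = queue.Queue()


def build_scan_job(spec):
    """Validate a client-supplied spec and turn it into a job dict."""
    profile = spec.get("profile", "quick")
    if profile not in SCAN_PROFILES:
        raise ValueError(f"unknown profile: {profile!r}")
    # ip_network raises ValueError for anything that is not an address or CIDR,
    # which keeps shell metacharacters and stray options out of the argv.
    target = str(ipaddress.ip_network(spec["target"], strict=False))
    job_id = uuid.uuid4().hex
    argv = ["nmap", *SCAN_PROFILES[profile], "-oX", f"/results/{job_id}.xml", target]
    return {"id": job_id, "argv": argv}


def submit(spec):
    """Queue a validated job and return its id to the caller."""
    job = build_scan_job(spec)
    scan_queue.put(job)
    return job["id"]
```

A worker would pull jobs with scan_queue.get() and pass job["argv"] to subprocess.run as a list, never through a shell. Restricting clients to named profiles is a design choice that keeps arbitrary Nmap options out of a multi-tenant service.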
The operational maturity progression moves from file-based output (-oX to disk) through database normalization (Metasploit, custom PostgreSQL) to event-stream architectures (Kafka topics per subnet, real-time anomaly detection). At each stage, the XML DTD's structural completeness prevents data loss that grepable truncation or normal format ambiguity would introduce.