Performance Tuning, Timing, and Large-Scale Scanning

Timing Templates: Beyond "Just Use -T4"

Nmap's six timing templates (-T0 through -T5) are often treated as a linear speed slider, but each modifies four independent variables with non-linear effects. Understanding the mechanics behind these presets is essential for performance engineering.

| Template | RTT Timeout | Parallel Probes | Scan Delay | Max Retries |
|----------|-------------|-----------------|------------|-------------|
| -T0 (Paranoid) | 5 min | 1 at a time | 5 min | 10 |
| -T1 (Sneaky) | 15 sec | 1 at a time | 15 sec | 10 |
| -T2 (Polite) | 1 sec | 1 at a time | 400 ms | 10 |
| -T3 (Normal) | Dynamic | Dynamic | 0 | 10 |
| -T4 (Aggressive) | Dynamic (500 ms initial, 1250 ms max) | Dynamic | 0 | 6 |
| -T5 (Insane) | Dynamic (250 ms initial, 300 ms max) | Dynamic | 0 | 2 |

The critical distinction is that -T0 through -T2 enforce serial probe transmission (one outstanding probe at a time), while -T3 through -T5 enable parallelism. At -T4 and -T5, Nmap's adaptive algorithms run within tighter bounds, and -T5 caps the RTT timeout at 300 ms (250 ms initial) regardless of measured network conditions, a recipe for massive retransmission on high-latency paths.

What each variable actually controls:

RTT timeout: Maximum wait for probe response before retransmission. Dynamic calculation uses exponential smoothing with timeouts derived from observed round-trip times.
Parallel probes: Outstanding probes per host, per scan phase. Higher values exploit network pipelining but increase state table pressure on the scanning host.
Scan delay: Enforced pause between successive probes to a given host. It can keep a scan under IDS rate thresholds but caps throughput at one probe per delay interval.
Max retries: Retransmission ceiling. Each retry adds another full timeout interval of waiting for that port.
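
To make the presets less opaque, the sketch below expands the two fastest templates into explicit flags. The values follow the equivalents listed in the Nmap reference guide; `TEMPLATE_FLAGS` and `expand_template` are hypothetical names for this illustration, and scan-delay caps and host timeouts are deliberately left out.

```python
# Approximate flag equivalents for -T4 and -T5 as listed in the Nmap
# reference guide. Scan-delay caps and host timeouts are omitted here.
TEMPLATE_FLAGS = {
    "T4": {
        "--min-rtt-timeout": "100ms",
        "--initial-rtt-timeout": "500ms",
        "--max-rtt-timeout": "1250ms",
        "--max-retries": "6",
    },
    "T5": {
        "--min-rtt-timeout": "50ms",
        "--initial-rtt-timeout": "250ms",
        "--max-rtt-timeout": "300ms",
        "--max-retries": "2",
    },
}


def expand_template(template, overrides=None):
    """Return a flat argument list for a template, with optional overrides."""
    flags = dict(TEMPLATE_FLAGS[template])
    flags.update(overrides or {})
    args = []
    for name, value in flags.items():
        args += [name, value]
    return args


# Start from -T5 but relax the RTT cap for a high-latency path.
print(" ".join(expand_template("T5", {"--max-rtt-timeout": "1s"})))
```

Starting from explicit flags rather than a template makes it obvious which single variable has been changed when a scan misbehaves.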

Adaptive Timing Algorithms: Nmap's Congestion Window

Nmap implements a congestion window-like mechanism for probe management, conceptually borrowed from TCP but adapted for stateless and stateful scan types. The engine maintains:

current_rate = min(slow_start_ceiling, network_capacity_estimate, user_max_rate)

The algorithm responds to three congestion signals:

Response arrival rate: Each probe response grows the probe window, exponentially during slow start and then additively during congestion avoidance
Timeout accumulation: Missing responses trigger exponential backoff of per-host timeout values
ICMP source quench/admin prohibited: Explicit network feedback reduces rate by 50% and logs the event
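
The window behavior described above can be sketched as a small state machine. This is a simplified model in the spirit of TCP congestion control, not Nmap's actual code: the class name `ProbeWindow`, the starting window, and the threshold are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProbeWindow:
    """Toy congestion window for outstanding probes (illustrative values)."""
    cwnd: float = 10.0      # probes allowed in flight
    ssthresh: float = 75.0  # boundary between slow start and avoidance
    min_cwnd: float = 1.0

    def on_response(self):
        if self.cwnd < self.ssthresh:
            # Slow start: +1 per response, which doubles the window per RTT.
            self.cwnd += 1.0
        else:
            # Congestion avoidance: roughly +1 per full window of responses.
            self.cwnd += 1.0 / self.cwnd

    def on_drop(self):
        # A detected drop halves the window and lowers the threshold to match.
        self.ssthresh = max(self.cwnd / 2.0, self.min_cwnd)
        self.cwnd = max(self.cwnd / 2.0, self.min_cwnd)


window = ProbeWindow()
for _ in range(100):
    window.on_response()
window.on_drop()
print(f"window after 100 responses and one drop: {window.cwnd:.1f}")
```

The asymmetry is the point of the design: growth is gradual once past the threshold, while a single drop removes half the window at once.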

For SYN scans, Nmap tracks timing state both per target and for the host group as a whole. The group-wide state enables efficient parallelization but means that slow or lossy responders in a group can throttle the entire batch. The timing.h source shows the core data structure: struct timeout_info stores srtt (smoothed RTT), rttvar (variance), and timeout (calculated retransmission threshold).

The critical insight: Nmap's adaptation is reactive, not predictive. It cannot anticipate network topology changes or time-of-day congestion patterns. For scheduled scans across variable paths, pre-seed with --initial-rtt-timeout based on prior nping measurements rather than relying on discovery-phase learning.

Host Group Sizing and Parallel Phase Interaction

Nmap organizes targets into host groups processed through parallel scan phases. Default group sizing follows this progression: starts at 5 hosts, grows to 1024 for large target sets. Manual control overrides this:

# Optimize for homogeneous, low-latency datacenter segment
nmap --min-hostgroup 256 --max-hostgroup 512 -T4 10.0.0.0/20

# Serialize for heterogeneous WAN with mixed latency
nmap --min-hostgroup 1 --max-hostgroup 8 -T3 --max-rtt-timeout 2s -iL targets.txt

Group sizing interacts critically with scan phases:

| Phase | Group Behavior | Tuning Impact |
|-------|----------------|---------------|
| Host discovery (ping) | Parallel ICMP/TCP/ACK probes; Nmap always uses large groups here | --min-hostgroup and --max-hostgroup have no effect in this phase |
| Port scanning | Per-host state machines, group-wide timing | Oversized groups delay report generation; undersized groups miss parallelism |
| OS detection/version probe | Sequential fallback, group-buffered results | Fixed group sizing here can cause memory pressure with -O and -sV combined |
| Traceroute | Individual execution, minimal group effect | Negligible performance impact |

The scan engine processes one host group at a time and runs each phase across the whole group before moving to the next group, which is why results for a host are not reported until its group finishes. As a consequence, --max-hostgroup constraints chosen for port scanning also govern version detection scheduling, even if version probes themselves would benefit from different grouping.

RTT Timeout Strategies: The Static-Dynamic Tradeoff

Nmap's default dynamic timeout calculation uses:

timeout = srtt + (rttvar × 4)

This mirrors TCP's RTO calculation (RFC 6298), including the variance multiplier of four; the result is then clamped between --min-rtt-timeout and --max-rtt-timeout.
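
A minimal sketch of such an estimator follows, assuming the RFC 6298 smoothing gains (1/8 for srtt, 1/4 for rttvar) and the RFC's first-sample rule. The class name `RttEstimator` and the default bounds are hypothetical; Nmap's own implementation may differ in detail.

```python
class RttEstimator:
    """RFC 6298-style smoothed RTT tracker with a clamped timeout."""

    def __init__(self, initial_timeout_ms=1000.0,
                 min_timeout_ms=100.0, max_timeout_ms=10000.0):
        self.srtt = None
        self.rttvar = None
        self.timeout = initial_timeout_ms
        self.min_timeout = min_timeout_ms
        self.max_timeout = max_timeout_ms

    def update(self, rtt_ms):
        """Fold one measured RTT into the estimate; return the new timeout."""
        if self.srtt is None:
            # First sample: seed srtt and use half of it as the variance.
            self.srtt = rtt_ms
            self.rttvar = rtt_ms / 2.0
        else:
            delta = rtt_ms - self.srtt
            self.srtt += delta / 8.0
            self.rttvar += (abs(delta) - self.rttvar) / 4.0
        raw = self.srtt + 4.0 * self.rttvar
        # Clamp, as --min-rtt-timeout and --max-rtt-timeout would.
        self.timeout = min(max(raw, self.min_timeout), self.max_timeout)
        return self.timeout


estimator = RttEstimator()
for sample in (100, 100, 100, 400, 100):
    print(sample, "->", round(estimator.update(sample), 1))
```

Feeding in a single 400 ms outlier shows why the variance term matters: the timeout jumps well above the smoothed RTT and then decays slowly.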

Static timeouts (--initial-rtt-timeout, --max-rtt-timeout) are necessary when:

Target infrastructure applies rate-limiting with fixed windows
Satellite or cellular paths exhibit bimodal latency distributions
Scanning through Tor or proxy chains with artificial latency floors

The danger of excessive static values appears in this duration estimate for a full TCP SYN scan:

Estimated Duration ≈ (hosts × ports × timeout × retry_factor) / parallel_probes

Where retry_factor = 1 + (retry_probability × max_retries)

For 65,536 hosts, 1000 ports, 10% loss, 3 retries (retry_factor = 1.3), and an assumed 100 probes in flight:

Dynamic (avg 150ms effective): ~36 hours
Static 5s timeout: ~1,183 hours (about 49 days)
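
The estimate can be written out as a small function to make the inputs explicit. This is a sketch of the formula as stated, which pessimistically charges every probe a full timeout. The function name is hypothetical, and the value of 100 probes in flight is an assumption, chosen because it reproduces the ~36 hour dynamic figure.

```python
def estimate_duration_hours(hosts, ports, timeout_s, loss,
                            max_retries, parallel_probes):
    """Worst-case duration from the formula in the text, in hours."""
    retry_factor = 1 + loss * max_retries
    seconds = hosts * ports * timeout_s * retry_factor / parallel_probes
    return seconds / 3600


# Dynamic case: 150 ms effective timeout, 100 probes in flight (assumed).
dynamic = estimate_duration_hours(65_536, 1000, 0.15, 0.10, 3, 100)
print(f"dynamic: {dynamic:.1f} hours")
```

Because the result scales linearly with the timeout and inversely with parallelism, rerunning the function with a candidate static timeout is a quick way to check whether it fits the scan window.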

Recommended approach: Seed with --initial-rtt-timeout 500ms --max-rtt-timeout 2s, allowing downward adaptation but capping upward explosion.

Packet Rate Control and Network Interaction

--max-rate and --min-rate operate at the global packet injection level, across all hosts and phases:

# Maintain 1000 packets/second regardless of responsiveness
nmap --min-rate 1000 --max-rate 1000 -p- target

# Cap to avoid upstream QoS policing at 10 Mbps for 64-byte probes
# 10,000,000 / (64 × 8) ≈ 19,531 packets/sec theoretical max
# Practical: account for Ethernet overhead, use 15,000
nmap --max-rate 15000 10.0.0.0/16
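
The arithmetic in the comments above can be reproduced with a short helper. The function name is hypothetical. The 20-byte wire overhead used in the second call is the standard Ethernet preamble plus inter-frame gap, which is one way to arrive at a practical figure near 15,000.

```python
def max_packets_per_second(link_bps, packet_bytes, wire_overhead_bytes=0):
    """Packets per second that fit in a link at a given per-packet size."""
    bits_per_packet = (packet_bytes + wire_overhead_bytes) * 8
    return link_bps / bits_per_packet


# Theoretical ceiling for 64-byte probes on a 10 Mbps policer.
print(max_packets_per_second(10_000_000, 64))
# Same link, counting 20 bytes of preamble and inter-frame gap per frame.
print(int(max_packets_per_second(10_000_000, 64, wire_overhead_bytes=20)))
```

Whether the policer counts wire overhead depends on the device, so the lower figure is the safer starting point for --max-rate.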

Interaction with network elements:

| Control | QoS Effect | IDS Evasion |
|---------|------------|-------------|
| --max-rate | Prevents queue drop at policed rates | Creates predictable pattern (negative) |
| --min-rate | Overrides congestion signals, causes loss | Forces constant activity baseline |
| Combined fixed rate | Neutral to QoS, detectable | Trivial temporal signature |

For IDS threshold evasion, variable rate limiting through external traffic shaping (Linux tc, BSD dummynet) outperforms Nmap's native controls. The --max-rate parameter should be treated as a damage limitation tool, not an evasion mechanism.

Defeating Rate Limits on Heavily Filtered Networks

Firewalls and OS stacks increasingly implement response rate limiting. Linux's net.ipv4.icmp_ratelimit and similar controls generate ambiguous silence indistinguishable from filtered ports. Nmap provides two override switches:

| Switch | Target Mechanism | Consequence |
|--------|------------------|-------------|
| --defeat-rst-ratelimit | OS TCP stack RST generation | Nmap stops waiting for rate-limited RSTs; in a SYN scan an unanswered probe is reported as filtered rather than closed |
| --defeat-icmp-ratelimit | ICMP unreachable generation (mainly affects UDP scans) | Nmap stops waiting for rate-limited ICMP errors; a non-responsive port is reported as closed/filtered instead of open/filtered |

These flags trade accuracy for speed; they do not make results more reliable. Use case: scanning 10,000 hosts behind a firewall (for example, a Palo Alto device) that limits ICMP unreachables to 100 per second. Without the flag, Nmap slows itself to match the rate limit and the scan can overrun its window; with it, the scan keeps its pace but silent ports are reported in a less specific state (closed|filtered for UDP, filtered for SYN), requiring follow-up with application-layer probes.

Command example for rate-limited datacenter egress:

nmap -sS -Pn -p- --defeat-rst-ratelimit --defeat-icmp-ratelimit \
     --max-rate 5000 --max-retries 2 \
     --min-hostgroup 64 --max-hostgroup 256 \
     -T4 -oA mass_scan_dc1 10.64.0.0/14

The Max-Retries Balancing Act

Retransmission strategy presents a two-sided optimization problem:

Accuracy Cost(low retries) = open_port_loss_rate × target_value
Time Cost(high retries) = filtered_targets × ports × timeout × sum(retry_delays)

For a /8 network (16.7M addresses) with 5% actual host density and a 2% open port rate per responsive host, illustrative figures look like this:

| --max-retries | Time to Complete | Open Ports Missed (1% loss) |
|---------------|------------------|-----------------------------|
| 0 | 4.2 hours | 50,400 |
| 2 | 12.6 hours | 5,040 |
| 6 | 29.4 hours | 504 |
| 10 | 46.2 hours | 50 |

The business-relevant formula for retry selection:

optimal_retries = argmin_retries[ 
    (false_negative_cost × missed_opportunities(retries)) + 
    (time_value × duration(retries))
]
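
As a sketch of how that selection could be evaluated, the snippet below applies the argmin to the four rows of the table above. The cost inputs in the usage lines are made-up numbers for illustration only, and `SCENARIOS` and `optimal_retries` are hypothetical names.

```python
# (hours to complete, open ports missed) per --max-retries value,
# copied from the table above.
SCENARIOS = {
    0: (4.2, 50_400),
    2: (12.6, 5_040),
    6: (29.4, 504),
    10: (46.2, 50),
}


def optimal_retries(scenarios, false_negative_cost, time_value_per_hour):
    """Pick the retry count with the lowest combined cost."""
    def total_cost(retries):
        hours, missed = scenarios[retries]
        return false_negative_cost * missed + time_value_per_hour * hours
    return min(scenarios, key=total_cost)


# Cheap misses, expensive scan time: fewer retries win.
print(optimal_retries(SCENARIOS, false_negative_cost=10, time_value_per_hour=10_000))
# Expensive misses, cheap scan time: more retries win.
print(optimal_retries(SCENARIOS, false_negative_cost=100, time_value_per_hour=1_000))
```

The model is only as good as the two cost inputs, so it is more useful for comparing policies than for producing a single authoritative number.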

For vulnerability management with 4-hour scan windows and $50K/hour incident cost: --max-retries 2. For compliance baseline with 72-hour windows and $2M per missed critical: --max-retries 6 with --max-rtt-timeout 3s.

/8 Network Scanning: Engineering for Scale

Scanning 16,777,216 addresses requires architectural decisions beyond Nmap parameters:

Memory per host (SYN scan, default): ~220 bytes
/8 memory at full group sizing: 3.7 GB
Recommended: shard into /16 chunks with external orchestration
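
A minimal sketch of the sharding step, using Python's standard ipaddress module. The 220-byte figure is the per-host estimate quoted above; the function names are hypothetical, and the external orchestration that would consume the shard list is out of scope here.

```python
import ipaddress

BYTES_PER_HOST = 220  # per-host SYN scan estimate quoted in the text


def shard(cidr, new_prefix=16):
    """Split a network into equally sized subnets, returned as strings."""
    network = ipaddress.ip_network(cidr)
    return [str(subnet) for subnet in network.subnets(new_prefix=new_prefix)]


def state_memory_gb(host_count, bytes_per_host=BYTES_PER_HOST):
    """Approximate scan state for a number of hosts, in decimal gigabytes."""
    return host_count * bytes_per_host / 1e9


shards = shard("10.0.0.0/8")
print(len(shards), shards[0], shards[-1])
print(f"whole /8: {state_memory_gb(2 ** 24):.2f} GB")
print(f"one /16:  {state_memory_gb(2 ** 16) * 1000:.1f} MB")
```

Each shard can then be handed to a separate Nmap invocation, which keeps per-process state small and lets a failed shard be retried on its own.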

Practical /8 execution:

# Phase 1: Host discovery with minimal state
nmap -sn -T4 --min-parallelism 100 --max-parallelism 500 \
     --max-rtt-timeout 2s --max-retries 1 \
     --max-rate 100000 -oG alive.gnmap 10.0.0.0/8

# Phase 2: Extract responsive hosts, port scan with full accuracy
awk '/Status: Up/ {print $2}' alive.gnmap > alive.txt
nmap -sS -sV -O -p- -T4 --max-retries 3 \
     --max-rtt-timeout 5s --min-hostgroup 64 \
     -iL alive.txt -oA full_detail

This two-phase approach reduces state memory by 95%+ and avoids the catastrophic timeout accumulation from probing 15.9M non-responsive addresses.

Duration Estimation Formulas

For planning scan windows:

SYN Scan Duration (seconds) ≈ 
    (N_hosts × N_ports × T_rtt) / (P_parallel × G_groups) 
    + N_hosts × T_overhead_per_host

Where:
    P_parallel = probes_in_flight_per_host (template-dependent)
    G_groups = hosts scanned concurrently in the active group (capped by --max-hostgroup)
    T_overhead = DNS, ARP, report generation ≈ 50-200ms/host

For UDP scanning, multiply by retry_factor × 2 (higher default timeout) and divide P_parallel by 3-5, since many targets rate-limit the ICMP port unreachable replies that identify closed ports.
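
The SYN scan formula translates directly into a planning helper. This is a sketch of the formula as written, with parameter names mirroring its symbols. The function name is hypothetical, and the default overhead of 0.1 s per host is an assumption taken from the middle of the 50-200 ms range.

```python
def syn_scan_seconds(n_hosts, n_ports, t_rtt_s, p_parallel, g_groups,
                     t_overhead_s=0.1):
    """Estimated SYN scan duration in seconds, per the formula above."""
    probing = (n_hosts * n_ports * t_rtt_s) / (p_parallel * g_groups)
    overhead = n_hosts * t_overhead_s
    return probing + overhead


# A /24 with the default 1000 ports on a 50 ms path.
seconds = syn_scan_seconds(n_hosts=256, n_ports=1000, t_rtt_s=0.05,
                           p_parallel=10, g_groups=64)
print(f"{seconds:.1f} s")
```

For small, fast networks the per-host overhead term dominates, which is why tuning probe parallelism alone often yields less than expected.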

The complete performance engineer's checklist:

Characterize path RTT distribution with nping --tcp -p 80 --count 1000 <target>
Set --initial-rtt-timeout to P90 of measured distribution
Size --max-hostgroup to match target network homogeneity
Apply --max-rate at 80% of observed non-drop throughput
Select --max-retries based on false-negative cost model
For /16+, implement two-phase discovery before deep scanning
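
The first two checklist items can be sketched as follows, assuming the RTT samples have already been collected (for example, parsed from nping output) into a list of milliseconds. The function name and the sample data are hypothetical; statistics.quantiles requires Python 3.8 or later.

```python
import statistics


def initial_rtt_flag(samples_ms, percentile=90):
    """Build an --initial-rtt-timeout flag from a percentile of RTT samples."""
    # quantiles(n=100) returns 99 cut points; index p-1 is the p-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100, method="inclusive")
    value_ms = round(cuts[percentile - 1])
    return f"--initial-rtt-timeout {value_ms}ms"


# Hypothetical measurements: mostly ~40 ms with a slow tail.
samples = [38, 41, 40, 39, 42, 44, 40, 37, 120, 250]
print(initial_rtt_flag(samples))
```

Using a high percentile rather than the mean keeps the seed value above most of the slow tail, so fewer early probes are retransmitted before the adaptive estimate settles.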