Performance Tuning, Timing, and Large-Scale Scanning

Timing Templates: Beyond "Just Use -T4"

Nmap's six timing templates (-T0 through -T5) are often treated as a linear speed slider, but each modifies four independent variables with non-linear effects. Understanding the mechanics behind these presets is essential for performance engineering.

| Template | RTT Timeout | Parallel Probes | Scan Delay | Max Retries |
|----------|-------------|-----------------|------------|-------------|
| -T0 (Paranoid) | 5 min | 1 at a time | 5 min | 10 |
| -T1 (Sneaky) | 15 sec | 1 at a time | 15 sec | 10 |
| -T2 (Polite) | 1 sec | 1 at a time | 400 ms | 10 |
| -T3 (Normal) | Dynamic | Dynamic | 0 | 10 |
| -T4 (Aggressive) | Dynamic (max 1250 ms) | Dynamic | 0 | 6 |
| -T5 (Insane) | Dynamic (max 300 ms) | Dynamic | 0 | 2 |

The critical distinction is that -T0 through -T2 enforce serial probe transmission (one outstanding probe at a time), while -T3 through -T5 enable parallelism. At -T4 and -T5, Nmap's adaptive algorithms activate aggressively, but -T5 starts at a 250 ms RTT timeout and caps it at 300 ms regardless of measured network conditions, a recipe for massive retransmission on high-latency paths.

What each variable actually controls:

  • RTT timeout: Maximum wait for probe response before retransmission. Dynamic calculation uses exponential smoothing with timeouts derived from observed round-trip times.
  • Parallel probes: Outstanding probes per host, per scan phase. Higher values exploit network pipelining but increase state table pressure on the scanning host.
  • Scan delay: Minimum wait enforced between probes sent to a given host. Defeats pattern recognition in IDS but caps per-host throughput at 1/delay probes per second.
  • Max retries: Retransmission ceiling. Each retry adds one more timeout interval of waiting for an unresponsive port.
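
The cost of the last two variables can be put into numbers with a small sketch. It assumes a constant timeout per attempt (real scans adapt the timeout as they go), and both function names are hypothetical helpers, not part of Nmap:

```python
def worst_case_port_wait(timeout_s: float, max_retries: int) -> float:
    """Seconds spent on one silent port: the first probe plus each retry waits one timeout."""
    return timeout_s * (1 + max_retries)


def per_host_probe_ceiling(scan_delay_s: float) -> float:
    """Upper bound on probes per second to a single host under a fixed scan delay."""
    if scan_delay_s <= 0:
        return float("inf")  # no delay configured, so this variable imposes no ceiling
    return 1.0 / scan_delay_s


# Example: a 1 s timeout with 10 retries, and the 400 ms delay used by -T2
print(worst_case_port_wait(1.0, 10))
print(per_host_probe_ceiling(0.4))
```

Under these assumptions a filtered port can hold a probe slot for eleven timeout intervals, which is why retry count and timeout have to be tuned together.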

Adaptive Timing Algorithms: Nmap's Congestion Window

Nmap implements a congestion window-like mechanism for probe management, conceptually borrowed from TCP but adapted for stateless and stateful scan types. The engine maintains:

current_rate = min(slow_start_ceiling, network_capacity_estimate, user_max_rate)

The algorithm responds to three congestion signals:

  1. Response arrival rate: Positive ACKs increase the probe window multiplicatively (slow start) then additively (congestion avoidance)
  2. Timeout accumulation: Missing responses trigger exponential backoff of per-host timeout values
  3. ICMP source quench/admin prohibited: Explicit network feedback reduces rate by 50% and logs the event
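
The window behavior described above can be modeled in a few lines. This is a toy model of a TCP-style probe window, not Nmap's actual implementation; the class name, the initial threshold, and the ceiling are illustrative assumptions:

```python
from dataclasses import dataclass


@dataclass
class ProbeWindow:
    """Toy congestion window for outstanding probes (illustrative only)."""
    cwnd: float = 1.0        # probes currently allowed in flight
    ssthresh: float = 64.0   # hypothetical slow-start threshold
    max_cwnd: float = 300.0  # hypothetical hard ceiling

    def on_response(self) -> None:
        if self.cwnd < self.ssthresh:
            # Slow start: +1 per response, which doubles the window every round trip
            self.cwnd += 1.0
        else:
            # Congestion avoidance: roughly +1 per full window of responses
            self.cwnd += 1.0 / self.cwnd
        self.cwnd = min(self.cwnd, self.max_cwnd)

    def on_drop(self) -> None:
        # Halve the window on loss and remember the new threshold
        self.ssthresh = max(self.cwnd / 2.0, 2.0)
        self.cwnd = max(self.cwnd / 2.0, 1.0)


window = ProbeWindow()
for _ in range(10):
    window.on_response()
window.on_drop()
print(window.cwnd, window.ssthresh)
```

The point of the model is the asymmetry: growth after a drop is additive, so a single loss event on a lossy path costs many round trips of recovery.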

For SYN scans, Nmap tracks timing state at two levels: per target host and for the host group as a whole. The group-level state enables efficient parallelization but means a single slow or lossy responder can throttle the entire batch. The timing.h source reveals the core data structure: struct timeout_info stores srtt (smoothed RTT), rttvar (variance), and timeout (calculated retransmission threshold).

The critical insight: Nmap's adaptation is reactive, not predictive. It cannot anticipate network topology changes or time-of-day congestion patterns. For scheduled scans across variable paths, pre-seed with --initial-rtt-timeout based on prior nping measurements rather than relying on discovery-phase learning.
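
A minimal sketch of the pre-seeding step follows. It assumes the RTT samples have already been extracted from earlier nping runs into a list of millisecond values (parsing nping output is not shown), seeds the initial timeout at the 90th percentile, and uses a hypothetical headroom multiplier for the maximum:

```python
import statistics


def rtt_seed_flags(rtt_ms_samples, max_multiplier=4.0):
    """Turn measured RTTs (milliseconds) into Nmap timeout flags.

    max_multiplier is an assumed headroom factor, not an Nmap default.
    """
    if len(rtt_ms_samples) < 2:
        raise ValueError("need at least two RTT samples")
    # quantiles(n=10) returns nine cut points; the last is the 90th percentile
    p90 = statistics.quantiles(rtt_ms_samples, n=10)[-1]
    initial = max(1, round(p90))
    maximum = max(initial, round(p90 * max_multiplier))
    return [
        "--initial-rtt-timeout", f"{initial}ms",
        "--max-rtt-timeout", f"{maximum}ms",
    ]


# Example with synthetic samples spread evenly from 1 ms to 100 ms
print(" ".join(rtt_seed_flags(list(range(1, 101)))))
```

The resulting flag list can be appended to a scheduled scan's command line, so each run starts from the last known path characteristics instead of relearning them.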

Host Group Sizing and Parallel Phase Interaction

Nmap organizes targets into host groups processed through parallel scan phases. Default group sizing follows this progression: starts at 5 hosts, grows to 1024 for large target sets. Manual control overrides this:

# Optimize for homogeneous, low-latency datacenter segment
nmap --min-hostgroup 256 --max-hostgroup 512 -T4 10.0.0.0/20

# Serialize for heterogeneous WAN with mixed latency
nmap --min-hostgroup 1 --max-hostgroup 8 -T3 --max-rtt-timeout 2s targets.txt

Group sizing interacts critically with scan phases:

| Phase | Group Behavior | Tuning Impact |
|-------|----------------|---------------|
| Host discovery (ping) | Parallel ICMP/TCP/ACK across group | Large groups saturate uplink; small groups underutilize bandwidth |
| Port scanning | Per-host state machines, group-wide timing | Oversized groups delay report generation; undersized groups miss parallelism |
| OS detection/version probe | Sequential fallback, group-buffered results | Fixed group sizing here can cause memory pressure with -O and -sV combined |
| Traceroute | Individual execution, minimal group effect | Negligible performance impact |

The scan engine processes one host group at a time, running each phase to completion for that group before moving to the next. As a result, the --max-hostgroup constraint chosen for port scanning also determines how many hosts enter version detection together, even if version probes themselves would benefit from different grouping, and no results for a group are reported until the whole group has finished.

RTT Timeout Strategies: The Static-Dynamic Tradeoff

Nmap's default dynamic timeout calculation uses:

timeout = srtt + (rttvar × 4)

This mirrors TCP's RTO calculation (RFC 6298), which uses the same variance multiplier of 4. The conservative margin suits scan traffic's higher loss characteristics.
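
The update rule behind that formula can be sketched as follows. This is a simplified model in the style of RFC 6298, not Nmap's exact arithmetic; the smoothing gains (1/8 and 1/4) are the classic TCP values, and the initial, minimum, and maximum timeouts are assumed defaults:

```python
class RttEstimator:
    """Smoothed RTT tracker producing timeout = srtt + 4 * rttvar (simplified)."""

    def __init__(self, initial_timeout_ms=1000.0, min_ms=100.0, max_ms=10000.0):
        self.srtt = None      # smoothed RTT, unset until the first sample
        self.rttvar = None    # smoothed mean deviation
        self.timeout = initial_timeout_ms
        self.min_ms = min_ms
        self.max_ms = max_ms

    def update(self, sample_ms):
        if self.srtt is None:
            # First measurement: seed the estimator directly from the sample
            self.srtt = sample_ms
            self.rttvar = sample_ms / 2.0
        else:
            # Update the deviation against the old srtt, then srtt itself
            self.rttvar += (abs(sample_ms - self.srtt) - self.rttvar) / 4.0
            self.srtt += (sample_ms - self.srtt) / 8.0
        raw = self.srtt + 4.0 * self.rttvar
        # Clamp to the configured floor and ceiling
        self.timeout = min(self.max_ms, max(self.min_ms, raw))
        return self.timeout


est = RttEstimator()
for sample in (100.0, 100.0, 400.0):
    print(est.update(sample))
```

Note how one outlier sample inflates the timeout far more than it moves srtt; this is the "upward explosion" that --max-rtt-timeout exists to cap.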

Static timeouts (--initial-rtt-timeout, --max-rtt-timeout) are necessary when:

  • Target infrastructure applies rate-limiting with fixed windows
  • Satellite or cellular paths exhibit bimodal latency distributions
  • Scanning through Tor or proxy chains with artificial latency floors

The danger of excessive static values appears in this duration estimate for a full TCP SYN scan:

Estimated Duration ≈ (hosts × ports × timeout × retry_factor) / parallel_probes

Where retry_factor = 1 + (retry_probability × max_retries)

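For 65,536 hosts, 1000 ports, 10% loss, and 3 retries (retry_factor = 1.3), assuming about 100 probes in flight (the parallelism implied by the dynamic figure):

  • Dynamic (avg 150ms effective): ~36 hours
  • Static 5s timeout: ~1,183 hours (49 days)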
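
The estimate can be reproduced with a short function. The formula and retry_factor come from the text above; the parallel_probes value of 100 is an assumption, since the text does not state one:

```python
def estimated_duration_hours(hosts, ports, timeout_s, loss, max_retries, parallel_probes):
    """Duration ≈ (hosts × ports × timeout × retry_factor) / parallel_probes, in hours."""
    retry_factor = 1 + loss * max_retries
    seconds = hosts * ports * timeout_s * retry_factor / parallel_probes
    return seconds / 3600.0


# Dynamic case: 150 ms effective timeout, 10% loss, 3 retries, 100 probes in flight (assumed)
hours = estimated_duration_hours(65_536, 1_000, 0.150, 0.10, 3, 100)
print(f"{hours:.1f} hours")
```

Because the timeout enters the formula as a plain multiplier, any static value scales the total linearly, which makes this a quick sanity check before committing to a scan window.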

Recommended approach: Seed with --initial-rtt-timeout 500ms --max-rtt-timeout 2s, allowing downward adaptation but capping upward explosion.

Packet Rate Control and Network Interaction

--max-rate and --min-rate operate at the global packet injection level, across all hosts rather than per target. They govern host discovery and port scanning; OS detection uses its own timing:

# Maintain 1000 packets/second regardless of responsiveness
nmap --min-rate 1000 --max-rate 1000 -p- target

# Cap to avoid upstream QoS policing at 10 Mbps for 64-byte probes
# 10,000,000 / (64 × 8) ≈ 19,531 packets/sec theoretical max
# Practical: account for Ethernet overhead, use 15,000
nmap --max-rate 15000 10.0.0.0/16
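
The "account for Ethernet overhead" step can be made explicit. The sketch below assumes 20 bytes of per-frame wire overhead (8-byte preamble and start delimiter plus a 12-byte inter-frame gap); the function name is a hypothetical helper:

```python
ETHERNET_OVERHEAD_BYTES = 20  # preamble/SFD (8) + inter-frame gap (12), per frame


def max_packets_per_second(link_bps, frame_bytes, wire_overhead_bytes=ETHERNET_OVERHEAD_BYTES):
    """Whole packets per second that fit into link_bps at the given frame size."""
    bits_per_packet = (frame_bytes + wire_overhead_bytes) * 8
    return link_bps // bits_per_packet


# Theoretical figure without overhead, then the on-the-wire figure
print(max_packets_per_second(10_000_000, 64, wire_overhead_bytes=0))
print(max_packets_per_second(10_000_000, 64))
```

With overhead included, the ceiling for 64-byte frames on a 10 Mbps policer falls just under 15,000 packets per second, which is where the practical --max-rate value in the command above comes from.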

Interaction with network elements:

| Control | QoS Effect | IDS Evasion |
|---------|------------|-------------|
| --max-rate | Prevents queue drop at policed rates | Creates predictable pattern (negative) |
| --min-rate | Overrides congestion signals, causes loss | Forces constant activity baseline |
| Combined fixed rate | Neutral to QoS, detectable | Trivial temporal signature |

For IDS threshold evasion, variable rate limiting through external traffic shaping (Linux tc, BSD dummynet) outperforms Nmap's native controls. The --max-rate parameter should be treated as a damage limitation tool, not an evasion mechanism.

Defeating Rate Limits on Heavily Filtered Networks

Firewalls and OS stacks increasingly implement response rate limiting. Linux's net.ipv4.icmp_ratelimit and similar controls generate ambiguous silence indistinguishable from filtered ports. Nmap provides two override switches:

| Switch | Target Mechanism | Consequence |
|--------|------------------|-------------|
| --defeat-rst-ratelimit | Rate-limited TCP RST generation on the target | Nmap stops slowing down to wait for RSTs; in a SYN scan, ports whose RST is missed are labeled filtered instead of closed |
| --defeat-icmp-ratelimit | Rate-limited ICMP unreachable generation (UDP scans) | Nmap stops slowing down to wait for ICMP errors; silent ports are labeled closed/filtered instead of open/filtered |

These are accuracy-reduction flags, not free performance optimizations. Use case: scanning 10,000 hosts through a Palo Alto firewall that limits ICMP to 100 messages per second. Without the flags, Nmap slows its probing to match the rate limit it detects, which can stretch the scan enormously; with them, the scan keeps its pace but silent ports are reported as filtered (SYN scan) or closed|filtered (UDP scan), requiring follow-up with application-layer probes.

Command example for rate-limited datacenter egress:

nmap -sS -Pn -p- --defeat-rst-ratelimit --defeat-icmp-ratelimit \
     --max-rate 5000 --max-retries 2 \
     --min-hostgroup 64 --max-hostgroup 256 \
     -T4 -oA mass_scan_dc1 10.64.0.0/14

The Max-Retries Balancing Act

Retransmission strategy presents a two-sided optimization problem:

Accuracy Cost(low retries) = open_port_loss_rate × target_value
Time Cost(high retries) = filtered_targets × ports × timeout × sum(retry_delays)

For a /8 network (16.7M hosts) with 5% actual host density and 2% open port rate per responsive host:

| --max-retries | Time to Complete | Open Ports Missed (1% loss) |
|---------------|------------------|-----------------------------|
| 0 | 4.2 hours | 50,400 |
| 2 | 12.6 hours | 5,040 |
| 6 | 29.4 hours | 504 |
| 10 | 46.2 hours | 50 |

The business-relevant formula for retry selection:

optimal_retries = argmin_retries[ 
    (false_negative_cost × missed_opportunities(retries)) + 
    (time_value × duration(retries))
]
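
The argmin can be evaluated directly over the table above. In this sketch the durations and missed-port counts are copied from that table, while the two cost weights in the usage example are hypothetical values chosen only to show the mechanics:

```python
# --max-retries -> (hours to complete, open ports missed), from the table above
SCAN_PROFILE = {
    0: (4.2, 50_400),
    2: (12.6, 5_040),
    6: (29.4, 504),
    10: (46.2, 50),
}


def optimal_retries(false_negative_cost, time_value_per_hour, profile=SCAN_PROFILE):
    """Return the retry setting with the lowest combined miss and duration cost."""
    def total_cost(retries):
        hours, missed = profile[retries]
        return false_negative_cost * missed + time_value_per_hour * hours

    return min(profile, key=total_cost)


# Hypothetical weights: each missed port costs 1 unit, each scan hour costs 100 units
print(optimal_retries(false_negative_cost=1, time_value_per_hour=100))
```

Raising the hourly weight pushes the answer toward fewer retries, and raising the per-miss weight pushes it toward more, which is the trade-off the formula encodes.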

For vulnerability management with 4-hour scan windows and $50K/hour incident cost: --max-retries 2. For compliance baseline with 72-hour windows and $2M per missed critical: --max-retries 6 with --max-rtt-timeout 3s.

/8 Network Scanning: Engineering for Scale

Scanning 16,777,216 addresses requires architectural decisions beyond Nmap parameters:

Memory per host (SYN scan, default): ~220 bytes
/8 memory at full group sizing: 3.7 GB
Recommended: shard into /16 chunks with external orchestration
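
The sharding step is simple to script with Python's standard ipaddress module. The sketch below is one possible shape for the orchestration input, using the 220-byte per-host figure quoted above; the function names are hypothetical:

```python
import ipaddress

BYTES_PER_HOST = 220  # per-host state estimate quoted above


def shard_network(cidr, new_prefix=16):
    """Split a network into equally sized subnets, returned as CIDR strings."""
    network = ipaddress.ip_network(cidr)
    return [str(subnet) for subnet in network.subnets(new_prefix=new_prefix)]


def state_memory_gb(host_count, bytes_per_host=BYTES_PER_HOST):
    """Approximate scan state in gigabytes (10^9 bytes) for host_count targets."""
    return host_count * bytes_per_host / 1e9


shards = shard_network("10.0.0.0/8")
print(len(shards), shards[0], shards[-1])
print(round(state_memory_gb(2 ** 24), 1), "GB for the whole /8")
print(round(state_memory_gb(2 ** 16) * 1000, 1), "MB per /16 shard")
```

Each shard can then be handed to a separate nmap invocation by whatever scheduler is in use, keeping per-process state in the megabyte range.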

Practical /8 execution:

# Phase 1: Host discovery with minimal state
nmap -sn -T4 --min-parallelism 100 --max-parallelism 500 \
     --max-rtt-timeout 2s --max-retries 1 \
     --max-rate 100000 -oG alive.gnmap 10.0.0.0/8

# Phase 2: Extract responsive hosts, port scan with full accuracy
awk '/Status: Up/ {print $2}' alive.gnmap > alive.txt
nmap -sS -sV -O -p- -T4 --max-retries 3 \
     --max-rtt-timeout 5s --min-hostgroup 64 \
     -iL alive.txt -oA full_detail

This two-phase approach reduces state memory by 95%+ and avoids the catastrophic timeout accumulation from probing 15.9M non-responsive addresses.

Duration Estimation Formulas

For planning scan windows:

SYN Scan Duration (seconds) ≈ 
    (N_hosts × N_ports × T_rtt) / (P_parallel × G_groups) 
    + N_hosts × T_overhead_per_host

Where:
    P_parallel = probes_in_flight_per_host (template-dependent)
    G_groups = hosts scanned concurrently within a group (capped by --max-hostgroup)
    T_overhead = DNS, ARP, report generation ≈ 50-200ms/host

For UDP scanning, multiply by retry_factor × 2 (higher default timeout) and divide P_parallel by 3-5 (many targets rate-limit the ICMP port-unreachable replies that UDP scanning depends on).
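
For planning purposes the SYN formula translates directly into code. This is a rough planning aid under the formula's own simplifications; the parameter names follow the symbols above, and the default per-host overhead of 100 ms is an assumed midpoint of the quoted 50-200 ms range:

```python
def syn_scan_seconds(n_hosts, n_ports, t_rtt_s, p_parallel, g_groups, t_overhead_s=0.1):
    """SYN scan duration estimate: probe time plus fixed per-host overhead."""
    probe_time = (n_hosts * n_ports * t_rtt_s) / (p_parallel * g_groups)
    return probe_time + n_hosts * t_overhead_s


# Example: a /24, top 1000 ports, 50 ms RTT, 10 probes per host, 64 hosts at once
seconds = syn_scan_seconds(256, 1_000, 0.05, 10, 64)
print(f"{seconds:.1f} s")
```

On fast networks the per-host overhead term dominates, so trimming DNS resolution (for example with -n) can matter more than further probe-level tuning.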

The complete performance engineer's checklist:

  1. Characterize path RTT distribution with nping --tcp -p 80 --count 1000 <target>
  2. Set --initial-rtt-timeout to P90 of measured distribution
  3. Size --max-hostgroup to match target network homogeneity
  4. Apply --max-rate at 80% of observed non-drop throughput
  5. Select --max-retries based on false-negative cost model
  6. For /16+, implement two-phase discovery before deep scanning