Performance Tuning, Timing, and Large-Scale Scanning
Timing Templates: Beyond "Just Use -T4"
Nmap's six timing templates (-T0 through -T5) are often treated as a linear speed slider, but each modifies four independent variables with non-linear effects. Understanding the mechanics behind these presets is essential for performance engineering.
| Template | Initial RTT Timeout | Parallel Probes | Scan Delay | Max Retries |
|----------|---------------------|-----------------|------------|-------------|
| -T0 (Paranoid) | 5 min | 1 at a time | 5 min | 10 |
| -T1 (Sneaky) | 15 sec | 1 at a time | 15 sec | 10 |
| -T2 (Polite) | 1 sec | 1 at a time | 400 ms | 10 |
| -T3 (Normal) | 1 sec, then dynamic | Dynamic | 0 | 10 |
| -T4 (Aggressive) | 500 ms, then dynamic | Dynamic | 0 | 6 |
| -T5 (Insane) | 250 ms (capped at 300 ms) | Dynamic | 0 | 2 |
The critical distinction is that -T0 through -T2 enforce serial probe transmission (one outstanding probe at a time), while -T3 through -T5 enable parallelism. At -T4 and -T5, Nmap's adaptive algorithms ramp up aggressively, but -T5 starts with a 250 ms RTT timeout and caps it at 300 ms regardless of measured network conditions, a recipe for massive retransmission on high-latency paths.
What each variable actually controls:
- RTT timeout: Maximum wait for probe response before retransmission. Dynamic calculation uses exponential smoothing with timeouts derived from observed round-trip times.
- Parallel probes: Outstanding probes per host, per scan phase. Higher values exploit network pipelining but increase state table pressure on the scanning host.
- Scan delay: Explicit usleep() between probe batches. Defeats pattern recognition in IDS but linearly caps throughput.
- Max retries: Retransmission ceiling. Each retry adds up to one more timeout period of waiting for that port.
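To make the presets concrete, the sketch below expands a template into roughly equivalent explicit options. The values come from the timing-template table in the Nmap reference guide and should be checked against your installed version. `TEMPLATES` and `expand_template` are illustrative names, not part of Nmap, and the expansion omits secondary effects such as maximum scan delay and the -T5 host timeout.

```python
# Approximate explicit-option equivalents for each timing template.
# Values follow the Nmap reference guide; verify against your version.
TEMPLATES = {
    "T0": {"initial_rtt_ms": 300_000, "max_rtt_ms": 300_000, "scan_delay_ms": 300_000, "max_retries": 10, "serial": True},
    "T1": {"initial_rtt_ms": 15_000, "max_rtt_ms": 15_000, "scan_delay_ms": 15_000, "max_retries": 10, "serial": True},
    "T2": {"initial_rtt_ms": 1_000, "max_rtt_ms": 10_000, "scan_delay_ms": 400, "max_retries": 10, "serial": True},
    "T3": {"initial_rtt_ms": 1_000, "max_rtt_ms": 10_000, "scan_delay_ms": 0, "max_retries": 10, "serial": False},
    "T4": {"initial_rtt_ms": 500, "max_rtt_ms": 1_250, "scan_delay_ms": 0, "max_retries": 6, "serial": False},
    "T5": {"initial_rtt_ms": 250, "max_rtt_ms": 300, "scan_delay_ms": 0, "max_retries": 2, "serial": False},
}


def expand_template(name):
    """Return an option string approximating the given template (hypothetical helper)."""
    t = TEMPLATES[name]
    flags = [
        f"--initial-rtt-timeout {t['initial_rtt_ms']}ms",
        f"--max-rtt-timeout {t['max_rtt_ms']}ms",
        f"--max-retries {t['max_retries']}",
    ]
    if t["scan_delay_ms"]:
        flags.append(f"--scan-delay {t['scan_delay_ms']}ms")
    if t["serial"]:
        # Serial templates keep a single probe outstanding.
        flags.append("--max-parallelism 1")
    return " ".join(flags)


print(expand_template("T4"))
```

Spelling the options out this way makes it easier to start from a template and override one variable at a time.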
Adaptive Timing Algorithms: Nmap's Congestion Window
Nmap implements a congestion window-like mechanism for probe management, conceptually borrowed from TCP but adapted for stateless and stateful scan types. The engine maintains:
current_rate = min(slow_start_ceiling, network_capacity_estimate, user_max_rate)
The algorithm responds to three congestion signals:
- Response arrival rate: Positive ACKs increase the probe window multiplicatively (slow start) then additively (congestion avoidance)
- Timeout accumulation: Missing responses trigger exponential backoff of per-host timeout values
- ICMP source quench/admin prohibited: Explicit network feedback reduces rate by 50% and logs the event
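The window behaviour described in this list can be sketched as a toy model. This is a simplified TCP-style illustration, not Nmap's actual code: the class name `ProbeWindow`, the default thresholds, and the choice to resume from the halved window after a drop are all assumptions made for the example.

```python
class ProbeWindow:
    """Toy congestion window: slow start, then congestion avoidance."""

    def __init__(self, cwnd=1.0, ssthresh=64.0, max_cwnd=300.0):
        self.cwnd = cwnd          # probes allowed in flight
        self.ssthresh = ssthresh  # boundary between slow start and avoidance
        self.max_cwnd = max_cwnd  # hard ceiling (hypothetical value)

    def on_response(self):
        if self.cwnd < self.ssthresh:
            # Slow start: +1 per response, which doubles the window per round trip.
            self.cwnd += 1.0
        else:
            # Congestion avoidance: roughly +1 per full window of responses.
            self.cwnd += 1.0 / self.cwnd
        self.cwnd = min(self.cwnd, self.max_cwnd)

    def on_drop(self):
        # Congestion signal: halve the window and continue from there.
        self.ssthresh = max(self.cwnd / 2.0, 2.0)
        self.cwnd = self.ssthresh


window = ProbeWindow(ssthresh=8.0)
for _ in range(8):
    window.on_response()
print(window.cwnd)   # window after leaving slow start
window.on_drop()
print(window.cwnd)   # window after one congestion signal
```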
For SYN scans, Nmap tracks a separate timing state per host group, not per individual target. This aggregation enables efficient parallelization but means a single slow responder in a group throttles the entire batch. The timing.h source reveals the core data structure: struct timeout_info stores srtt (smoothed RTT), rttvar (variance), and timeout (calculated retransmission threshold).
The critical insight: Nmap's adaptation is reactive, not predictive. It cannot anticipate network topology changes or time-of-day congestion patterns. For scheduled scans across variable paths, pre-seed with --initial-rtt-timeout based on prior nping measurements rather than relying on discovery-phase learning.
Host Group Sizing and Parallel Phase Interaction
Nmap organizes targets into host groups processed through parallel scan phases. Default group sizing follows this progression: starts at 5 hosts, grows to 1024 for large target sets. Manual control overrides this:
# Optimize for homogeneous, low-latency datacenter segment
nmap --min-hostgroup 256 --max-hostgroup 512 -T4 10.0.0.0/20
# Serialize for heterogeneous WAN with mixed latency
nmap --min-hostgroup 1 --max-hostgroup 8 -T3 --max-rtt-timeout 2s targets.txt
Group sizing interacts critically with scan phases:
| Phase | Group Behavior | Tuning Impact |
|-------|----------------|---------------|
| Host discovery (ping) | Parallel ICMP/TCP/ACK across group | Large groups saturate uplink; small groups underutilize bandwidth |
| Port scanning | Per-host state machines, group-wide timing | Oversized groups delay report generation; undersized groups miss parallelism |
| OS detection/version probe | Sequential fallback, group-buffered results | Fixed group sizing here can cause memory pressure with -O and -sV combined |
| Traceroute | Individual execution, minimal group effect | Negligible performance impact |
The scan engine interleaves phases across groups rather than completing all phases per group sequentially. This pipeline parallelism means --max-hostgroup constraints during port scanning propagate to version detection scheduling even if version probes themselves would benefit from different grouping.
RTT Timeout Strategies: The Static-Dynamic Tradeoff
Nmap's default dynamic timeout calculation uses:
timeout = srtt + (rttvar × 4)
This mirrors TCP's RTO calculation (RFC 6298), which uses the same variance multiplier of 4. The wide margin suits scan traffic, where loss and jitter are typically higher than on an established connection.
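A minimal sketch of such an estimator follows. The field names srtt, rttvar, and timeout come from the text above; the smoothing gains of 1/8 and 1/4, the first-sample initialization, and the clamp defaults are borrowed from the RFC 6298 style of estimator and are assumptions here, so Nmap's own implementation may differ in detail.

```python
class TimeoutInfo:
    """Smoothed RTT estimator: timeout = srtt + 4 * rttvar, clamped to [min, max]."""

    def __init__(self, initial_ms=1000.0, min_ms=100.0, max_ms=10000.0):
        self.srtt = None          # smoothed round-trip time
        self.rttvar = None        # smoothed RTT deviation
        self.timeout = initial_ms # used until the first sample arrives
        self.min_ms = min_ms
        self.max_ms = max_ms

    def update(self, rtt_ms):
        if self.srtt is None:
            # First sample seeds both estimates.
            self.srtt = rtt_ms
            self.rttvar = rtt_ms / 2.0
        else:
            delta = rtt_ms - self.srtt
            self.srtt += delta / 8.0                         # gain 1/8 (assumed)
            self.rttvar += (abs(delta) - self.rttvar) / 4.0  # gain 1/4 (assumed)
        raw = self.srtt + 4.0 * self.rttvar
        self.timeout = min(max(raw, self.min_ms), self.max_ms)
        return self.timeout


est = TimeoutInfo()
for sample_ms in (100.0, 100.0, 500.0):
    print(sample_ms, est.update(sample_ms))
```

A single slow sample widens the timeout sharply because the deviation term is multiplied by four, which is the behaviour the --max-rtt-timeout cap is meant to bound.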
Static timeouts (--initial-rtt-timeout, --max-rtt-timeout) are necessary when:
- Target infrastructure applies rate-limiting with fixed windows
- Satellite or cellular paths exhibit bimodal latency distributions
- Scanning through Tor or proxy chains with artificial latency floors
The danger of excessive static values appears in this duration estimate for a full TCP SYN scan:
Estimated Duration ≈ (hosts × ports × timeout × retry_factor) / parallel_probes
Where retry_factor = 1 + (retry_probability × max_retries)
For 65,536 hosts, 1000 ports, 10% loss, and 3 retries (retry_factor = 1.3), assuming roughly 100 probes in flight:
- Dynamic (avg 150ms effective): ~36 hours
- Static 5s timeout: ~1,183 hours (about 49 days)
Recommended approach: Seed with --initial-rtt-timeout 500ms --max-rtt-timeout 2s, allowing downward adaptation but capping upward explosion.
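The duration estimate above is easy to evaluate for different inputs. The sketch below implements the formula as written; the figure of 100 parallel probes is an assumption (the text does not state one), and `estimate_hours` is an illustrative name.

```python
def estimate_hours(hosts, ports, timeout_s, loss, max_retries, parallel_probes):
    """Worst-case duration in hours, treating every probe as waiting one timeout."""
    retry_factor = 1 + loss * max_retries
    seconds = hosts * ports * timeout_s * retry_factor / parallel_probes
    return seconds / 3600


# Assumed: 100 probes in flight in both scenarios.
dynamic = estimate_hours(65_536, 1000, 0.150, 0.10, 3, 100)
static = estimate_hours(65_536, 1000, 5.0, 0.10, 3, 100)
print(f"dynamic 150 ms: {dynamic:.1f} h")
print(f"static 5 s:     {static:.1f} h ({static / 24:.1f} days)")
```

Because the timeout enters the formula linearly, the ratio between the two scenarios is simply the ratio of the timeouts, whatever parallelism is assumed.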
Packet Rate Control and Network Interaction
--max-rate and --min-rate operate at the global packet injection level, across all hosts and phases:
# Maintain 1000 packets/second regardless of responsiveness
nmap --min-rate 1000 --max-rate 1000 -p- target
# Cap to avoid upstream QoS policing at 10 Mbps for 64-byte probes
# 10,000,000 / (64 × 8) ≈ 19,531 packets/sec theoretical max
# Practical: account for Ethernet overhead, use 15,000
nmap --max-rate 15000 10.0.0.0/16
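The overhead adjustment in the comments above can be worked out explicitly. On Ethernet, each frame also occupies an 8-byte preamble and a 12-byte inter-frame gap on the wire, so a 64-byte frame costs 84 bytes of link time. The helper name `max_probe_rate` and its parameters are illustrative.

```python
def max_probe_rate(link_bps, frame_bytes=64, overhead_bytes=20):
    """Packets per second a link can carry for a given frame size.

    overhead_bytes covers the 8-byte preamble plus 12-byte inter-frame gap.
    """
    wire_bits = (frame_bytes + overhead_bytes) * 8
    return link_bps // wire_bits


print(max_probe_rate(10_000_000, overhead_bytes=0))  # payload-only estimate
print(max_probe_rate(10_000_000))                    # with Ethernet framing overhead
```

With framing included, the 10 Mbps ceiling lands just under 15,000 packets per second, which is where the rounded --max-rate value comes from.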
Interaction with network elements:
| Control | QoS Effect | IDS Evasion |
|---------|------------|-------------|
| --max-rate | Prevents queue drop at policed rates | Creates predictable pattern (negative) |
| --min-rate | Overrides congestion signals, causes loss | Forces constant activity baseline |
| Combined fixed rate | Neutral to QoS, detectable | Trivial temporal signature |
For IDS threshold evasion, variable rate limiting through external traffic shaping (Linux tc, BSD dummynet) outperforms Nmap's native controls. The --max-rate parameter should be treated as a damage limitation tool, not an evasion mechanism.
Defeating Rate Limits on Heavily Filtered Networks
Firewalls and OS stacks increasingly implement response rate limiting. Linux's net.ipv4.icmp_ratelimit and similar controls generate ambiguous silence indistinguishable from filtered ports. Nmap provides two override switches:
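| Switch | Target Mechanism | Consequence |
|--------|------------------|-------------|
| --defeat-rst-ratelimit | Rate-limited RST generation in the target's TCP stack | Nmap does not slow down to wait for RSTs; closed ports that answer too late are reported as filtered |
| --defeat-icmp-ratelimit | Rate-limited ICMP unreachable generation (mainly affects UDP scans) | Nmap does not slow down to wait for ICMP errors; non-responsive ports are reported as closed\|filtered instead of open\|filtered |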
These flags trade accuracy for speed; they are not free performance optimizations. Use case: scanning 10,000 hosts through a Palo Alto firewall with default 100 ICMP/second rate limit. Without the flags, Nmap throttles itself to match the trickle of rate-limited responses and the scan slows dramatically; with them, the scan keeps its pace but non-responsive ports land in ambiguous states (filtered for SYN scans, closed|filtered for UDP scans), requiring follow-up with application-layer probes.
Command example for rate-limited datacenter egress:
nmap -sS -Pn -p- --defeat-rst-ratelimit --defeat-icmp-ratelimit \
--max-rate 5000 --max-retries 2 \
--min-hostgroup 64 --max-hostgroup 256 \
-T4 -oA mass_scan_dc1 10.64.0.0/14
The Max-Retries Balancing Act
Retransmission strategy presents a bimodal optimization problem:
Accuracy Cost(low retries) = open_port_loss_rate × target_value
Time Cost(high retries) = filtered_targets × ports × timeout × sum(retry_delays)
For a /8 network (16.7M hosts) with 5% actual host density and 2% open port rate per responsive host:
| --max-retries | Time to Complete | Open Ports Missed (1% loss) |
|---------------|------------------|-----------------------------|
| 0 | 4.2 hours | 50,400 |
| 2 | 12.6 hours | 5,040 |
| 6 | 29.4 hours | 504 |
| 10 | 46.2 hours | 50 |
The business-relevant formula for retry selection:
optimal_retries = argmin_retries[
(false_negative_cost × missed_opportunities(retries)) +
(time_value × duration(retries))
]
For vulnerability management with 4-hour scan windows and $50K/hour incident cost: --max-retries 2. For compliance baseline with 72-hour windows and $2M per missed critical: --max-retries 6 with --max-rtt-timeout 3s.
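The argmin above can be evaluated directly against the retry table. In this sketch the scenario rows are copied from that table, while the two cost inputs are hypothetical dollar figures chosen only to show how the optimum moves; `optimal_retries` is an illustrative name.

```python
# (max_retries, hours to complete, open ports missed) from the table above.
SCENARIOS = [(0, 4.2, 50_400), (2, 12.6, 5_040), (6, 29.4, 504), (10, 46.2, 50)]


def optimal_retries(false_negative_cost, time_value, scenarios=SCENARIOS):
    """Return the --max-retries value with the lowest combined cost."""
    def total_cost(row):
        _, hours, missed = row
        return false_negative_cost * missed + time_value * hours

    return min(scenarios, key=total_cost)[0]


# Hypothetical costs: dollars per missed open port, dollars per scan hour.
print(optimal_retries(false_negative_cost=100, time_value=50_000))
print(optimal_retries(false_negative_cost=1_000, time_value=50_000))
```

Raising the cost of a missed port by a factor of ten shifts the optimum toward more retries, which is the same tradeoff the two recommendations above express.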
/8 Network Scanning: Engineering for Scale
Scanning 16,777,216 addresses requires architectural decisions beyond Nmap parameters:
Memory per host (SYN scan, default): ~220 bytes
/8 memory at full group sizing: 3.7 GB
Recommended: shard into /16 chunks with external orchestration
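The memory figure and the sharding step can both be reproduced with the standard library. The 220 bytes per host is the estimate quoted above, not a measured value, and the function names are illustrative. How the resulting /16 chunks are handed to an orchestrator is left open.

```python
import ipaddress

BYTES_PER_HOST = 220  # per-host estimate quoted above


def state_bytes(network):
    """Estimated scan state for every address in the network."""
    return ipaddress.ip_network(network).num_addresses * BYTES_PER_HOST


def shards(network, new_prefix=16):
    """Split a network into equal chunks, one per external scan job."""
    net = ipaddress.ip_network(network)
    return [str(subnet) for subnet in net.subnets(new_prefix=new_prefix)]


chunks = shards("10.0.0.0/8")
print(len(chunks), chunks[0], chunks[-1])
print(f"/8 state:  {state_bytes('10.0.0.0/8') / 1e9:.2f} GB")
print(f"/16 state: {state_bytes(chunks[0]) / 1e6:.1f} MB")
```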
Practical /8 execution:
# Phase 1: Host discovery with minimal state
nmap -sn -T4 --min-parallelism 100 --max-parallelism 500 \
--max-rtt-timeout 2s --max-retries 1 \
--max-rate 100000 -oG alive.gnmap 10.0.0.0/8
# Phase 2: Extract responsive hosts, port scan with full accuracy
# Match the status field explicitly so hostnames containing "Up" are not picked up
awk '/Status: Up/ {print $2}' alive.gnmap > alive.txt
nmap -sS -sV -O -p- -T4 --max-retries 3 \
--max-rtt-timeout 5s --min-hostgroup 64 \
-iL alive.txt -oA full_detail
This two-phase approach reduces state memory by 95%+ and avoids the catastrophic timeout accumulation from probing 15.9M non-responsive addresses.
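When the hand-off between phases is scripted rather than done with shell tools, the same extraction can be sketched in Python. This assumes the grepable format places the address as the second whitespace-separated field on lines beginning with `Host:`. The sample lines are made up for illustration, and `alive_hosts` and `chunked` are hypothetical helpers.

```python
def alive_hosts(gnmap_lines):
    """Return addresses from grepable-output lines whose status is Up."""
    hosts = []
    for line in gnmap_lines:
        if line.startswith("Host:") and "Status: Up" in line:
            hosts.append(line.split()[1])
    return hosts


def chunked(items, size):
    """Split a list into fixed-size batches, e.g. one target file per scan job."""
    return [items[i:i + size] for i in range(0, len(items), size)]


# Illustrative sample, not real scan output.
sample = [
    "# Nmap scan initiated as: nmap -sn -oG alive.gnmap 10.0.0.0/8",
    "Host: 10.0.0.1 ()\tStatus: Up",
    "Host: 10.0.0.2 (UpstreamRouter)\tStatus: Down",
    "Host: 10.0.0.9 ()\tStatus: Up",
]
print(alive_hosts(sample))
print(chunked(alive_hosts(sample), 1))
```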
Duration Estimation Formulas
For planning scan windows:
SYN Scan Duration (seconds) ≈
(N_hosts × N_ports × T_rtt) / (P_parallel × G_groups)
+ N_hosts × T_overhead_per_host
Where:
P_parallel = probes_in_flight_per_host (template-dependent)
G_groups = active_host_groups (capped by --max-hostgroup)
T_overhead = DNS, ARP, report generation ≈ 50-200ms/host
For UDP scanning, multiply by retry_factor × 2 (higher default timeout) and divide P_parallel by 3-5 (slower kernel rate limiting on unprivileged UDP sockets).
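The planning formula translates directly into code. The sketch below follows the SYN formula as written. The UDP variant is one interpretation of the adjustment described above, applying the multipliers to the probing term only, and the default divisor of 4 is an assumed midpoint of the 3-5 range. All function and parameter names are illustrative.

```python
def syn_scan_seconds(n_hosts, n_ports, t_rtt, p_parallel, g_groups, t_overhead=0.1):
    """SYN scan duration estimate: probing time plus fixed per-host overhead."""
    probing = (n_hosts * n_ports * t_rtt) / (p_parallel * g_groups)
    return probing + n_hosts * t_overhead


def udp_scan_seconds(n_hosts, n_ports, t_rtt, p_parallel, g_groups,
                     retry_factor, parallel_divisor=4, t_overhead=0.1):
    """UDP variant: probing term scaled by retry_factor * 2, parallelism reduced."""
    effective_parallel = p_parallel / parallel_divisor
    probing = (n_hosts * n_ports * t_rtt * retry_factor * 2) / (effective_parallel * g_groups)
    return probing + n_hosts * t_overhead


# Example inputs are arbitrary planning values, not measurements.
syn = syn_scan_seconds(256, 1000, 0.05, p_parallel=10, g_groups=4)
udp = udp_scan_seconds(256, 1000, 0.05, p_parallel=10, g_groups=4, retry_factor=1.3)
print(f"SYN: {syn / 60:.1f} min, UDP: {udp / 60:.1f} min")
```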
The complete performance engineer's checklist:
- Characterize path RTT distribution with nping --tcp -p 80 --count 1000
- Set --initial-rtt-timeout to P90 of measured distribution
- Size --max-hostgroup to match target network homogeneity
- Apply --max-rate at 80% of observed non-drop throughput
- Select --max-retries based on false-negative cost model
- For /16+, implement two-phase discovery before deep scanning
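Two of the checklist items reduce to simple arithmetic and can be sketched as follows. The RTT samples are assumed to have been collected already, in milliseconds. The nearest-rank percentile method and the helper names `percentile` and `suggest_flags` are choices made for this example.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    ordered = sorted(samples)
    rank = max(1, (pct * len(ordered) + 99) // 100)  # ceiling division
    return ordered[rank - 1]


def suggest_flags(rtt_samples_ms, observed_pps):
    """Derive --initial-rtt-timeout (P90) and --max-rate (80% of throughput)."""
    initial_ms = percentile(rtt_samples_ms, 90)
    max_rate = int(observed_pps) * 8 // 10
    return f"--initial-rtt-timeout {initial_ms:.0f}ms --max-rate {max_rate}"


# Illustrative inputs: ten RTT samples and a measured non-drop rate of 12,500 pps.
samples_ms = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
print(suggest_flags(samples_ms, 12_500))
```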