Networking Fundamentals · Part 12 of 12 · Network Hardening · 16 min read · intermediate

NAT, NAT traversal, and the end-to-end principle

Why NAT exists, how mapping/filtering/timeouts actually behave, what STUN/TURN/ICE are for, and why CGNAT compounds the problem IPv6 was supposed to fix.

Network Address Translation was supposed to be a temporary workaround for IPv4 address scarcity. It is now a load-bearing feature of the public internet, embedded in every home router, mobile carrier network, and consumer ISP. Whole categories of protocol work — STUN, TURN, ICE, hole-punching, relay services — exist solely to recover the end-to-end connectivity that NAT broke. This module is the foundational pass through that landscape: how NAT actually rewrites packets, what its real-world behavior categories are, why STUN/TURN/ICE are necessary, what CGNAT is doing to make things worse, and where IPv6 sits in this story as the architectural way out we still haven't fully taken.

Learning objectives

By the end of this module you should be able to:

  1. Explain how NAT44/NAPT rewrites endpoint identity and why that violates the original end-to-end model.
  2. Distinguish mapping behavior, filtering behavior, and timeout behavior in real NAT implementations, using RFC 4787 vocabulary instead of the older "cone vs symmetric" folklore.
  3. Explain how STUN, TURN, and ICE work together to recover connectivity across NATs.
  4. Compare NAT44, CGNAT, NAT64, and pure IPv6 deployment in terms of operational tradeoffs.
  5. Inspect a NAT mapping by comparing local and public addresses, and recognize ICE candidate types in a real WebRTC connection.

The end-to-end principle and what NAT changed

The original internet design — articulated in Saltzer, Reed, and Clark's 1984 paper "End-to-End Arguments in System Design" — held that intelligence belonged at the endpoints and the network should be a dumb forwarder. Two consequences:

  • Every host has a globally-unique address. Anyone can address any other host directly.
  • Routers don't keep per-flow state. They look up packets by destination and forward; they don't track which connection a packet belongs to.

This model gave the internet two properties that proved enormously valuable: any-to-any reachability, and the ability for new protocols to be deployed without coordinating with the network. Innovation could happen at the edges.

NAT broke both. A NAT box must keep per-flow state so it knows where to send return traffic. Hosts behind the NAT lose their globally-unique address — they share one or a few public addresses among many private endpoints. New inbound connections to those hosts are impossible without explicit configuration.

Why did NAT win anyway? Two simultaneous pressures:

  • IPv4 address scarcity. The 32-bit IPv4 address space couldn't accommodate the explosion of internet-connected devices in the 2000s. NAT let one public address represent thousands of internal hosts.
  • Implicit firewall behavior. A side effect of NAT — that unsolicited inbound traffic has nowhere to go — was treated as a security feature. "It's behind NAT" became a casual synonym for "it's not directly exposed."

Both pressures persist. The architectural fix (IPv6) addresses scarcity but not the firewall expectation, and operators are slow to redeploy without a forcing function. So NAT remains, and so does the cottage industry of protocols designed to work around it.

Basic NAT vs NAPT

Two flavors of NAT, often confused:

Basic NAT — also called 1:1 NAT — translates one public IP to one private IP. It's used in cases where you want to expose a server with a public address that's different from its internal address (network re-addressing, multi-homed setups). It doesn't multiplex multiple internal hosts onto one public address.

Network Address Port Translation (NAPT) — also called many-to-one NAT — translates many private endpoints onto one public address by also rewriting the source port. This is what almost everyone means when they say "NAT" in a home or enterprise context.

Concrete example. A laptop with private IP 10.0.0.5 and ephemeral port 54321 makes a connection to 93.184.216.34:443:

src: 10.0.0.5:54321  →  dst: 93.184.216.34:443

The NAT (with public IP 203.0.113.1) rewrites the source to its public IP and an arbitrary outbound port:

src: 203.0.113.1:62000  →  dst: 93.184.216.34:443

The NAT records this mapping: 10.0.0.5:54321 ↔ 203.0.113.1:62000. When return traffic arrives at 203.0.113.1:62000, the NAT looks up the mapping and rewrites the destination back to 10.0.0.5:54321 before forwarding internally.

Two internal hosts that happen to use the same source port, say 10.0.0.5:54321 and 10.0.0.6:54321, translate to different external ports because the NAT allocates an external port per mapping. Tens of thousands of simultaneous flows can share one public IP this way.

This works as long as:

  • The NAT correctly maps return traffic back to the originator.
  • The NAT keeps the mapping alive long enough for the application's request/response cycle.
  • The application doesn't put the unrewritten internal address into its own protocol payload (some protocols do — FTP being the classic offender — and those need application-aware proxies to fix up).
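The rewrite-and-remember cycle described above can be sketched as a toy translation table. This is a minimal model, not how any particular router implements it: the class name Napt, the starting port 62000, and the sequential port allocation are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


@dataclass
class Napt:
    """Toy NAPT: one public IP, one external port per internal endpoint."""
    public_ip: str
    next_port: int = 62000  # hypothetical start of the external port pool
    out_map: Dict[Endpoint, int] = field(default_factory=dict)  # internal -> external port
    in_map: Dict[int, Endpoint] = field(default_factory=dict)   # external port -> internal

    def outbound(self, src: Endpoint, dst: Endpoint) -> Tuple[Endpoint, Endpoint]:
        """Rewrite the source of an outgoing packet, creating a mapping if needed."""
        if src not in self.out_map:
            self.out_map[src] = self.next_port
            self.in_map[self.next_port] = src
            self.next_port += 1
        return (self.public_ip, self.out_map[src]), dst

    def inbound(self, src: Endpoint, dst: Endpoint) -> Optional[Tuple[Endpoint, Endpoint]]:
        """Rewrite the destination of a returning packet, or return None to drop it."""
        internal = self.in_map.get(dst[1]) if dst[0] == self.public_ip else None
        if internal is None:
            return None  # no mapping: unsolicited inbound traffic has nowhere to go
        return src, internal


nat = Napt("203.0.113.1")
server = ("93.184.216.34", 443)
print(nat.outbound(("10.0.0.5", 54321), server))    # source becomes 203.0.113.1:62000
print(nat.outbound(("10.0.0.6", 54321), server))    # same internal port, different external port
print(nat.inbound(server, ("203.0.113.1", 62000)))  # delivered to 10.0.0.5:54321
print(nat.inbound(server, ("203.0.113.1", 40000)))  # None: no mapping, dropped
```

Note that this sketch ignores filtering and expiry entirely; those are the properties the next section covers.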

Mapping, filtering, and timeouts

Three NAT properties that determine what protocols can survive:

Mapping behavior. When a single internal endpoint sends to two different remote endpoints, does the NAT use the same external port for both, or different ports for each?

  • Endpoint-Independent Mapping. Same external port regardless of remote. This is what STUN-based traversal needs.
  • Address-Dependent Mapping. Different external port per remote IP.
  • Address and Port-Dependent Mapping. Different external port per remote IP+port. This is the most restrictive case; STUN-based traversal often fails.

Filtering behavior. When inbound traffic arrives, does the NAT permit it based purely on its outgoing mapping, or also based on the source?

  • Endpoint-Independent Filtering. Any external host can send to a mapping the internal host opened. Most permissive.
  • Address-Dependent Filtering. Only external hosts the internal host has previously sent to can send back.
  • Address and Port-Dependent Filtering. Only the exact external endpoint that received outbound traffic can reply. Most restrictive.
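One way to keep the six terms straight is to notice that each behavior only decides how much of the remote endpoint the NAT looks at. The sketch below models that idea; the function names and the sample addresses are hypothetical, and a real NAT tracks far more state than this.

```python
from enum import Enum
from typing import Iterable, Tuple

Endpoint = Tuple[str, int]  # (ip, port)


class Behavior(Enum):
    ENDPOINT_INDEPENDENT = "endpoint-independent"
    ADDRESS_DEPENDENT = "address-dependent"
    ADDRESS_AND_PORT_DEPENDENT = "address-and-port-dependent"


def scope(behavior: Behavior, remote: Endpoint):
    """Return the part of the remote endpoint that this behavior takes into account."""
    if behavior is Behavior.ENDPOINT_INDEPENDENT:
        return None          # remote is ignored
    if behavior is Behavior.ADDRESS_DEPENDENT:
        return remote[0]     # remote IP only
    return remote            # remote IP and port


def mapping_key(mapping: Behavior, internal: Endpoint, remote: Endpoint):
    """Flows whose keys are equal share one external port."""
    return (internal, scope(mapping, remote))


def inbound_allowed(filtering: Behavior, contacted: Iterable[Endpoint], sender: Endpoint) -> bool:
    """contacted: remotes the internal host has already sent to through this mapping."""
    if filtering is Behavior.ENDPOINT_INDEPENDENT:
        return True  # any external host may use an open mapping
    return scope(filtering, sender) in {scope(filtering, r) for r in contacted}


internal = ("10.0.0.5", 54321)
stun_server = ("198.51.100.10", 3478)  # hypothetical addresses
peer = ("192.0.2.77", 40000)

for b in Behavior:
    same_port = mapping_key(b, internal, stun_server) == mapping_key(b, internal, peer)
    print(b.value, "-> peer sees the port STUN reported:", same_port)
```

Only the endpoint-independent case reports True in the loop, which is the reason STUN-learned addresses are reusable with a peer only under Endpoint-Independent Mapping.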

Timeout behavior. A mapping that hasn't seen traffic in a while gets evicted. Common values: 30 seconds for new UDP flows, 5 minutes for "established" UDP flows, and 5 days for established TCP flows. Vendors vary widely. The 30-second UDP timeout is a frequent source of "my long-running UDP connection died after 30 idle seconds" bugs.

Three mapping behaviors times three filtering behaviors already gives nine combinations, and differing timeout values multiply the variety further. Old folklore reduced this to "cone NAT" vs "symmetric NAT", but that categorization is too coarse for serious diagnosis. RFC 4787 introduced the behavior-oriented vocabulary above; that's what production-quality traversal libraries use.

The key operational lesson: NATs are stateful machines with expiry, not stateless rewrite boxes. Anything you build assuming "the mapping persists forever" will eventually break.
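To make the expiry point concrete, here is a small sketch of a mapping table with an idle timer, plus a helper that picks a keepalive interval from the shortest timeout on a path. The class and function names are hypothetical, the clock is passed in explicitly so the behavior is easy to reason about, and the 0.5 safety margin is an arbitrary assumption rather than a recommendation from any RFC.

```python
from typing import Dict, Iterable, Tuple

Endpoint = Tuple[str, int]  # (ip, port)

UDP_IDLE_TIMEOUT = 30.0  # seconds; the short UDP value mentioned above, vendors vary


class ExpiringMappings:
    """Tracks the last time each mapping saw traffic and evicts idle ones."""

    def __init__(self, timeout: float = UDP_IDLE_TIMEOUT) -> None:
        self.timeout = timeout
        self.last_seen: Dict[Endpoint, float] = {}

    def touch(self, internal: Endpoint, now: float) -> None:
        """Record traffic on a mapping, which restarts its idle timer."""
        self.last_seen[internal] = now

    def is_alive(self, internal: Endpoint, now: float) -> bool:
        """Evict the mapping if it has been idle too long, then report whether it exists."""
        seen = self.last_seen.get(internal)
        if seen is None:
            return False
        if now - seen > self.timeout:
            del self.last_seen[internal]
            return False
        return True


def safe_keepalive_interval(timeouts_on_path: Iterable[float], margin: float = 0.5) -> float:
    """Keepalives must beat the shortest timeout on the path; margin is an assumed safety factor."""
    return min(timeouts_on_path) * margin


table = ExpiringMappings()
flow = ("10.0.0.5", 54321)
table.touch(flow, now=0.0)
print(table.is_alive(flow, now=20.0))  # True: still inside the 30 s window
print(table.is_alive(flow, now=60.0))  # False: a 60 s keepalive arrives too late
print(safe_keepalive_interval([30.0, 300.0]))  # 15.0
```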

Why applications break

Several distinct failure modes show up in real applications:

Inbound reachability disappears. A server behind NAT cannot accept connections initiated from outside. The NAT has no mapping for unsolicited inbound traffic, so it drops the SYN. Workarounds: explicit port forwarding (configure the NAT to map a public port to a specific internal host), UPnP/NAT-PMP (let the application negotiate the mapping at runtime), or relays that the internal host connects out to.

Peer-to-peer becomes hard. Two hosts both behind NAT can't connect to each other directly because each is unreachable from outside. P2P protocols use STUN/TURN/ICE (see below) to discover and establish connectivity through a coordination server.

Protocols that embed addresses in payloads break. SIP, FTP active mode, and a few others put the source's IP and port into the application protocol body. NAT rewrites the IP layer but doesn't know to look inside the payload. Application-Level Gateways (ALGs) inside NATs were built to fix this for common protocols; they introduce their own bugs and don't handle protocols the ALG doesn't know about.

Long-idle flows die quietly. A UDP-based protocol with sparse traffic (keepalives every 60 seconds, say) is at risk if any NAT on the path uses a 30-second UDP timeout. Once the mapping expires, packets from the remote side hit a closed NAT entry and are dropped, and the next outbound packet may be given a new external port that the remote side doesn't recognize. Neither end gets an error; each simply waits for traffic that never arrives.

TCP keepalives may not fire fast enough. On Linux, TCP keepalive is off unless the application sets SO_KEEPALIVE, and the default idle time before the first probe (tcp_keepalive_time) is 7,200 seconds (2 hours). Against a NAT timeout of 5 days that is plenty. Against a NAT timeout of 1 hour it is not, and the connection silently dies. Application-layer pings are the answer for any protocol traversing untrusted NATs.
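If you control the socket, you can also shorten the kernel's keepalive timers per connection instead of relying on system defaults. A minimal sketch follows; the function name and the 60/10/3 values are illustrative assumptions, and the TCP_KEEPIDLE, TCP_KEEPINTVL, and TCP_KEEPCNT constants are Linux names that Python's socket module only exposes where the platform provides them, hence the getattr guard.

```python
import socket


def enable_keepalive(sock: socket.socket, idle: int = 60, interval: int = 10, probes: int = 3) -> None:
    """Turn on TCP keepalive and, where the platform allows, shorten its timers.

    idle: seconds of silence before the first probe
    interval: seconds between unanswered probes
    probes: unanswered probes before the connection is declared dead
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    for name, value in (("TCP_KEEPIDLE", idle), ("TCP_KEEPINTVL", interval), ("TCP_KEEPCNT", probes)):
        option = getattr(socket, name, None)  # missing on some platforms
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

Kernel keepalives only prove the TCP path is alive. An application-layer ping additionally proves the peer process is still responding, which is why it remains the more robust choice.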

STUN: learn the public reflexive address

STUN (Session Traversal Utilities for NAT) is the simplest tool. A client behind NAT sends a STUN request to a public STUN server. The server replies with the source IP and port the request appeared to come from — which is the NAT's external mapping for that client.

client (10.0.0.5:54321)
  → NAT
  → STUN server  (sees: 203.0.113.1:62000)
  ← reply: "your reflexive address is 203.0.113.1:62000"

The client now knows its external mapping for this particular STUN flow. It can include that address in subsequent signaling so a peer can attempt a direct connection to it.

STUN works alone for traversal only when the NAT has Endpoint-Independent Mapping (the same external port is used for any remote, including a P2P peer who isn't the STUN server). When the NAT is more restrictive — Address-Dependent or Address-and-Port-Dependent — the mapping the client learned with the STUN server doesn't apply to a different remote, so direct connectivity fails.

STUN is a tool, not a complete solution. Most real traversal pipelines use STUN to gather reflexive candidates and combine them with other techniques.
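On the wire, the exchange is small. The sketch below builds a Binding request and decodes the XOR-MAPPED-ADDRESS attribute from a Binding success response, following the header layout and magic cookie defined in RFC 8489. It handles IPv4 only, skips transaction-ID matching and message integrity, and the helper encode_binding_success exists purely to simulate a server for the demo, so treat it as an illustration rather than a usable client.

```python
import ipaddress
import os
import struct
from typing import Optional, Tuple

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020
FAMILY_IPV4 = 0x01


def build_binding_request() -> bytes:
    """20-byte header: type, attribute length (0), magic cookie, 96-bit transaction ID."""
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + os.urandom(12)


def encode_binding_success(transaction_id: bytes, ip: str, port: int) -> bytes:
    """What a server would send back: the observed source, XORed with the cookie."""
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = int(ipaddress.IPv4Address(ip)) ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, FAMILY_IPV4, xport, xaddr)
    return struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE) + transaction_id + attr


def parse_xor_mapped_address(message: bytes) -> Optional[Tuple[str, int]]:
    """Return the IPv4 reflexive (ip, port) from a Binding success response, if present."""
    if len(message) < 20:
        return None
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    pos, end = 20, min(20 + length, len(message))
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[pos:pos + 4])
        value = message[pos + 4:pos + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == FAMILY_IPV4:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            return str(ipaddress.IPv4Address(xaddr ^ MAGIC_COOKIE)), port
        pos += 4 + attr_len + (-attr_len % 4)  # attribute values are padded to 4 bytes
    return None


request = build_binding_request()
reply = encode_binding_success(request[8:20], "203.0.113.1", 62000)  # simulated server
print(parse_xor_mapped_address(reply))  # ('203.0.113.1', 62000)
```

To try it against a real server, send the request bytes over a UDP socket to a STUN server you are allowed to use and pass the datagram you receive to parse_xor_mapped_address.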

TURN: relay when direct fails

TURN (Traversal Using Relays around NAT) is the fallback for cases where direct peer-to-peer connectivity isn't possible. A TURN server sits in the public internet and relays bidirectional traffic between two peers, both of whom can reach the TURN server (because they initiate outbound connections to it).

Peer A → NAT-A → TURN server ← NAT-B ← Peer B

Each peer establishes an outbound connection to the TURN server. The server relays bytes between the two. From the network's perspective, the connection is just two clients talking to a public service; from the application's perspective, A and B are connected.

The cost: every byte traverses the relay, doubling bandwidth use and adding RTT. Relay-only deployments are bandwidth-expensive at scale. Real systems try direct connectivity first and fall back to TURN only when needed.

A privacy nuance: TURN sees all traffic between the peers (encrypted at higher layers, but the TURN server knows that A and B are talking and how much). Some operators run their own TURN servers specifically to control this.

ICE: try every candidate

ICE (Interactive Connectivity Establishment) is the framework that puts STUN and TURN together. Each peer:

  1. Gathers candidates: addresses where it might be reachable.
    • Host candidate — a local IP/port (the actual interface address).
    • Server-reflexive candidate — what STUN told it the NAT mapping looks like.
    • Relayed candidate — a TURN-server-relay address.
  2. Sends its full candidate list to the peer via signaling (e.g., a WebSocket through a coordination server).
  3. Connectivity-checks every pair of candidates: tries STUN binding requests from each of its own candidates to each of the peer's candidates.
  4. Picks the best working pair, where "best" prefers host candidates over reflexive over relayed.

A successful direct host-to-host connection is the ideal. A reflexive-to-reflexive connection works for compatible NAT types. A relay-to-relay connection works whenever both peers can reach their TURN servers, but costs the most.
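The "prefer host over reflexive over relayed" ordering falls out of two formulas in RFC 8445: one for a candidate's priority and one for a pair's priority. A sketch using the RFC's recommended type preferences is below; real implementations may choose different preference values, and the function names here are hypothetical.

```python
from itertools import product
from typing import Iterable, List, Tuple

# Type preferences recommended by RFC 8445; implementations are free to differ.
TYPE_PREFERENCE = {"host": 126, "prflx": 110, "srflx": 100, "relay": 0}


def candidate_priority(cand_type: str, local_pref: int = 65535, component_id: int = 1) -> int:
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)."""
    return (TYPE_PREFERENCE[cand_type] << 24) + (local_pref << 8) + (256 - component_id)


def pair_priority(controlling: int, controlled: int) -> int:
    """pair priority = 2^32 * min(G, D) + 2 * max(G, D) + (1 if G > D else 0)."""
    g, d = controlling, controlled
    return (min(g, d) << 32) + 2 * max(g, d) + (1 if g > d else 0)


def ordered_pairs(local_types: Iterable[str], remote_types: Iterable[str]) -> List[Tuple[str, str]]:
    """Order candidate-type pairs from highest to lowest pair priority."""
    def key(pair: Tuple[str, str]) -> int:
        return pair_priority(candidate_priority(pair[0]), candidate_priority(pair[1]))
    return sorted(product(local_types, remote_types), key=key, reverse=True)


print(candidate_priority("host", local_pref=30))  # 2113937151
for pair in ordered_pairs(["host", "srflx", "relay"], ["host", "srflx", "relay"]):
    print(pair)
```

Because the pair formula is dominated by the lower of the two candidate priorities, any pair involving a relayed candidate sorts below every pair that avoids one.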

ICE's complexity is real but unavoidable. There are too many NAT behaviors and too many corner cases for any simpler approach to be reliable. WebRTC, Tailscale's mesh connectivity, and most modern P2P apps all use ICE or close variants.

CGNAT: NAT inside the ISP

Carrier-Grade NAT (CGNAT) is what happens when an ISP runs out of public IPv4 addresses to assign to customers. The ISP gives each customer a private address from the RFC 6598 100.64.0.0/10 range, then NAPTs the entire customer base onto a small pool of real public IPs.

The customer's home router still NATs their LAN onto the assigned 100.64.x.x address. So traffic from a device behind the home router goes through two NATs:

Device (10.0.0.5) → home NAT → 100.64.x.x → carrier NAT → 203.0.113.x

Two layers of NAT compound every problem of one. Mapping behaviors interact unpredictably. Timeouts get the minimum of the two. Inbound reachability is impossible without coordination from both NATs (which doesn't exist for residential customers). Hole-punching gets harder. STUN reveals the carrier NAT's mapping, not anything actionable for inbound.
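When you are diagnosing a double-NAT path, the first step is usually telling the address classes apart. A small sketch using Python's ipaddress module is below; the classify function and its labels are hypothetical, and "other" simply means "none of the ranges checked here", which includes public addresses but also loopback, link-local, and documentation ranges.

```python
import ipaddress

RFC1918 = [ipaddress.ip_network(n) for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")]
SHARED_CGNAT = ipaddress.ip_network("100.64.0.0/10")  # RFC 6598 shared address space


def classify(addr: str) -> str:
    """Label an address as RFC 1918 private, CGNAT shared space, IPv6, or other."""
    ip = ipaddress.ip_address(addr)
    if ip.version != 4:
        return "ipv6"
    if any(ip in net for net in RFC1918):
        return "rfc1918-private"
    if ip in SHARED_CGNAT:
        return "cgnat-shared"
    return "other"


for address in ("10.0.0.5", "100.64.12.34", "100.128.0.1", "2001:db8::1"):
    print(address, "->", classify(address))
```

A router whose WAN interface classifies as cgnat-shared is sitting behind a carrier NAT, so port forwarding configured on that router alone will not make anything reachable from the internet.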

CGNAT is rolling out across mobile carriers and many residential ISPs in 2026. It's a workaround for a workaround. The right fix — IPv6 — coexists with CGNAT on most modern ISPs but is rarely the sole solution.

NAT64 and IPv6 transition

NAT64 is something different: it lets an IPv6-only host reach IPv4-only services. A NAT64 box at the network edge translates IPv6 packets to IPv4, mapping IPv6 destination prefixes to IPv4 addresses on the way out and back.

This is part of the IPv6 transition story. The pure form would be: every host runs IPv6, every server is IPv6-reachable, the IPv4 era ends. The reality is messier: some services are still IPv4-only, some client networks are IPv6-only, and NAT64 lets those populations talk.

It is not a replacement for IPv6 deployment. It's a bridge. The end state is everyone-IPv6 and NAT64 boxes idle. Whether we get there — and how long — is a deployment-incentives question, not a technical one.
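The prefix-to-address mapping that NAT64 relies on is simple arithmetic: the IPv4 address is embedded in the low 32 bits of an IPv6 prefix. The sketch below uses the well-known prefix 64:ff9b::/96 from RFC 6052; operators can use their own prefixes and other prefix lengths embed the address differently, so this only covers the /96 case, and the function names are hypothetical.

```python
import ipaddress

WELL_KNOWN_PREFIX = ipaddress.ip_network("64:ff9b::/96")  # RFC 6052


def synthesize(ipv4: str, prefix: ipaddress.IPv6Network = WELL_KNOWN_PREFIX) -> ipaddress.IPv6Address:
    """Embed an IPv4 address in the low 32 bits of a /96 NAT64 prefix."""
    if prefix.prefixlen != 96:
        raise ValueError("this sketch only handles /96 prefixes")
    return ipaddress.IPv6Address(int(prefix.network_address) | int(ipaddress.IPv4Address(ipv4)))


def extract(ipv6: str, prefix: ipaddress.IPv6Network = WELL_KNOWN_PREFIX) -> ipaddress.IPv4Address:
    """Recover the IPv4 destination a NAT64 box would send to."""
    addr = ipaddress.IPv6Address(ipv6)
    if prefix.prefixlen != 96 or addr not in prefix:
        raise ValueError("address is not inside the given /96 NAT64 prefix")
    return ipaddress.IPv4Address(int(addr) & 0xFFFFFFFF)


mapped = synthesize("93.184.216.34")
print(mapped)                # 64:ff9b::5db8:d822
print(extract(str(mapped)))  # 93.184.216.34
```

In deployments, the IPv6-only client typically never does this arithmetic itself; a DNS64 resolver synthesizes the AAAA record and the NAT64 box reverses it.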

IPv6 as the architectural fix

IPv6's address space (2^128) ends scarcity. Every device, every interface, every container can have a unique global address forever. NAT becomes optional rather than required. The end-to-end model — every host directly addressable — is restored.

In practice:

  • Most consumer ISPs offer IPv6. Major mobile carriers (T-Mobile, Verizon, Vodafone) deploy IPv6-first.
  • Most major sites have IPv6 AAAA records. Google, Facebook, Cloudflare, Apple all serve IPv6 transparently.
  • Most home routers support IPv6. Often as a parallel stack alongside IPv4 NAT.

But:

  • Many enterprise networks remain IPv4-only. Years of accumulated tooling, policy, and operational habit.
  • Some applications break under IPv6. Bugs that nobody noticed under IPv4-only deployment surface when both stacks are active.
  • NAT's "implicit firewall" reputation persists. Operators who treated NAT as a security feature are reluctant to lose it, even though stateful firewalls do the actual work much better.

The honest deployment picture for RouteHarden's audience in 2026: dual stack on consumer networks, IPv4-only on most enterprise networks, and an unending parade of NAT-traversal pain for any application that needs peer-to-peer connectivity.

Hands-on exercise

Exercise 1 — Compare local and public addresses

# Your local addresses
ip addr show | grep -oE 'inet [0-9.]+'

# Your public address as seen by the internet
curl -s https://ifconfig.co

Compare them. If they differ, you're behind a NAT. The most common pattern: local address in 192.168.x.x or 10.x.x.x (RFC 1918 private space), public address in some real-world IP your ISP assigned.

The mapping between them is the NAT's state. You can't see the NAT's mapping table directly (it's inside your router), but you can verify that something between you and ifconfig.co is rewriting your source address.

Stretch: repeat on a tethered cellular connection. Carriers very often use CGNAT for cellular data. In that case the address assigned to your device's cellular interface (or to your router's WAN side) may be a 100.64.x.x carrier-private address, while ifconfig.co reports a public address that the carrier NAT shares among many subscribers.

Exercise 2 — ICE candidates in a WebRTC demo

Open a browser and visit a WebRTC ICE test page (search for "WebRTC ICE candidate test"). Click "gather candidates."

The page will list candidate lines like:

candidate:1 1 udp 2113937151 192.168.1.42 54321 typ host
candidate:2 1 udp 1845501695 203.0.113.5 62000 typ srflx raddr 192.168.1.42 rport 54321
candidate:3 1 udp 33562367 198.51.100.7 30000 typ relay raddr 203.0.113.5 rport 62000

typ host is your local address. typ srflx is your STUN-discovered reflexive address. typ relay is a TURN-relayed address.
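If you want to pull these fields apart programmatically, the lines split cleanly on whitespace. A minimal parser sketch is below; parse_candidate is a hypothetical helper, it accepts an optional "a=" prefix as seen in SDP, and it assumes the trailing extensions always come in key/value pairs.

```python
from typing import Dict, Union


def parse_candidate(line: str) -> Dict[str, Union[str, int]]:
    """Parse one candidate line into its fixed fields plus trailing key/value extensions."""
    line = line.strip()
    if line.startswith("a="):
        line = line[2:]
    tokens = line.split()
    if len(tokens) < 8 or not tokens[0].startswith("candidate:") or tokens[6] != "typ":
        raise ValueError(f"not a candidate line: {line!r}")
    parsed: Dict[str, Union[str, int]] = {
        "foundation": tokens[0][len("candidate:"):],
        "component": int(tokens[1]),
        "transport": tokens[2].lower(),
        "priority": int(tokens[3]),
        "address": tokens[4],
        "port": int(tokens[5]),
        "type": tokens[7],
    }
    rest = tokens[8:]
    for key, value in zip(rest[0::2], rest[1::2]):  # e.g. raddr / rport
        parsed[key] = value
    return parsed


lines = [
    "candidate:1 1 udp 2113937151 192.168.1.42 54321 typ host",
    "candidate:2 1 udp 1845501695 203.0.113.5 62000 typ srflx raddr 192.168.1.42 rport 54321",
    "candidate:3 1 udp 33562367 198.51.100.7 30000 typ relay raddr 203.0.113.5 rport 62000",
]
for text in lines:
    cand = parse_candidate(text)
    # The top 8 bits of the priority carry the type preference.
    print(cand["type"], cand["address"], cand["port"], "type preference:", int(cand["priority"]) >> 24)
```

For the three sample lines the type preferences come out as 126, 110, and 2, which shows the host candidate ranked highest and the relay candidate lowest.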

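If you see srflx candidates, a STUN server was reachable and reported your NAT's external mapping; whether a peer can actually connect to it depends on your NAT's mapping and filtering behavior. If you also see relay, a TURN server is configured and the system has a fallback for cases where direct fails.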

A WebRTC connection between two such hosts is then ICE checking all pairs of these candidates — host-to-host, host-to-srflx, srflx-to-srflx, and so on — until one combination succeeds. Most home-to-home connections succeed at srflx-to-srflx. CGNAT-to-CGNAT often falls back to relay.

Common misconceptions

"NAT is a firewall." NAT and firewalling are different operations that often live in the same device. NAT translates addresses; firewalling enforces policy. The implicit "no inbound" property of most NAT deployments is a side effect, not the design intent. A real stateful firewall with explicit policy is more robust and clearer than relying on NAT to drop unwanted traffic.

"Private address space caused NAT." Address scarcity caused NAT. RFC 1918 private space made NAT administratively practical: without an agreed set of reusable, non-routable private ranges, you couldn't even talk about "rewriting from private to public." But NAT exists because we ran out of public IPv4, not because we wanted private addressing.

"STUN alone solves NAT traversal." STUN reveals reflexive address mappings, which only enable direct connectivity for some NAT types (Endpoint-Independent Mapping). For more restrictive NATs, you need STUN + ICE + sometimes TURN. STUN is one tool in the box, not the whole thing.

"IPv6 removed every operational problem NAT ever addressed." IPv6 removes scarcity-driven NAT. It doesn't remove the operational habits that grew up around NAT — explicit firewalls, stateful policy, address translation for renumbering convenience. Some IPv6 deployments still use NAT-like translation (NPTv6) for site-local renumbering reasons.

"'Symmetric NAT' is enough vocabulary for debugging." "Symmetric" is too vague. RFC 4787's behavior-oriented terms — Endpoint-Independent / Address-Dependent / Address-and-Port-Dependent for both mapping and filtering — describe what's actually happening with enough precision to diagnose. Use them.

Further reading

  1. RFC 4787 — NAT Behavioral Requirements for Unicast UDP. The authoritative vocabulary for NAT behaviors. Read this before any traversal work.
  2. RFC 8489 — Session Traversal Utilities for NAT (STUN). The STUN protocol.
  3. RFC 8656 — Traversal Using Relays around NAT (TURN). The TURN protocol.
  4. RFC 8445 — Interactive Connectivity Establishment (ICE). The framework that ties STUN and TURN into a connectivity-establishment system.
  5. RFC 6598 — IANA-Reserved IPv4 Prefix for Shared Address Space. The CGNAT range and rationale.
  6. RFC 6146 — Stateful NAT64. The IPv6-to-IPv4 translation mechanism.
  7. Saltzer, Reed, Clark, End-to-End Arguments in System Design, ACM Trans. Comput. Syst., 1984. The architectural argument NAT broke. Worth reading once for historical perspective.

This is the last module of Track 1 (Networking Fundamentals). Track 2 (Cryptography Foundations) picks up by digging into the cryptographic primitives that the protocols above — TLS 1.3, WireGuard, QUIC, SSH — all assume their readers already understand.