The IP forwarding plane
How a router actually forwards a packet: longest-prefix match, FIB lookup, adjacency resolution, TTL/Hop Limit, fragmentation, ICMP feedback, and the data/control/management plane split.
A router is not a magic internet box. Every router on every path your packets traverse is doing the same small thing repeatedly: receive a packet, decide which interface to send it out, decrement a lifetime field, and transmit. The decisions look complicated from the outside because the internet's routing tables are large and the corner cases are well-documented in operational lore. The decisions themselves are not complicated. This module covers what a router actually does on the per-packet fast path, why longest-prefix match exists, how loops are bounded, and the practical debugging toolkit that turns forwarding from "magic" into something you can traceroute your way through.
Prerequisites
- Module 1.3 — IPv4 addressing and subnetting deep dive. The whole forwarding decision is a search over IPv4 prefixes (and IPv6 prefixes, which work the same way at this level).
- Module 1.4 — IPv6 fundamentals. Either IP family is fine for the discussion below; the differences are noted where they matter.
Learning objectives
By the end of this module you should be able to:
- Trace the per-packet forwarding decision a router makes from ingress interface to egress next hop, and identify which steps live in the data plane versus the control plane.
- Apply longest-prefix match to determine which route wins when multiple entries in the routing table cover the same destination.
- Explain TTL / Hop Limit, path MTU discovery, and fragmentation as forwarding-plane concerns rather than application trivia.
- Use ip route get, traceroute, tracepath, and packet captures to debug forwarding behavior on a real Linux box.
- Recognize the most common router-side failure modes (unusual headers on the slow path, PMTUD black holes, default-route races) well enough to spot them in a war room.
Control plane vs data plane vs management plane
A router does three categories of work, and pulling them apart is the cleanest way to understand what's happening when something goes wrong.
The data plane (also called the forwarding plane) is the per-packet path. A packet arrives, the router looks up its destination, decrements the lifetime field, and hands it off to an outbound interface. This must run at line rate: millions of packets per second on a small router, billions per second on a backbone box. It runs in hardware (ASIC pipelines) or on carefully tuned software paths.
The control plane is what computes the routing table. Routing protocols (BGP, OSPF, IS-IS), static configuration, and connected-route discovery all feed entries into the routing information base (RIB). The RIB is then compiled into a forwarding information base (FIB), a fast lookup structure the data plane consults for every packet. The control plane works on timescales of seconds or longer (human timescales, for configuration changes). The data plane applies its decisions on the per-packet timescale.
The management plane is everything else: SSH, SNMP, telemetry, the CLI you log into. It's where humans and management systems talk to the router. It typically runs on the router's general-purpose CPU rather than the data-plane hardware, and is the slowest path of the three.
The split matters operationally. A control-plane failure (e.g., a routing daemon that hangs) means routes stop updating, but packets matching existing FIB entries keep flowing. A data-plane failure (e.g., a corrupted FIB) means packets stop. A management-plane failure (e.g., SSH dies) means you can't log in, but the data plane keeps doing its job. Knowing which plane has failed determines what to fix and how urgently.
The other consequence is that routes can be wrong. If your control plane installs a route to 10.0.0.0/8 via the wrong next-hop, every packet matching that prefix gets sent to the wrong place. The data plane is just executing the lookup; it's the control plane that decides what's in the table. Most "the router is broken" problems are actually "the control plane installed the wrong route" problems.
Receiving a packet — sanity checks before lookup
Before a router forwards anything, it does a small set of sanity checks at ingress. The exact list varies by implementation but conceptually:
- Layer-2 arrival. Check that the destination MAC matches the router's own MAC on this interface (or is broadcast/multicast). If not, drop — the frame wasn't for us.
- Frame integrity. Verify the Ethernet FCS. The NIC has usually already done this and dropped the frame on mismatch; a packet that reaches the IP layer has a good FCS.
- IP header sanity. Version field, header length, and (for IPv4) header checksum are all checked. Malformed headers get dropped here.
- TTL / Hop Limit. If the packet is to be forwarded (not addressed to the router itself) and the field is 1 or less, the packet is about to expire. Drop it and return ICMP Time Exceeded to the source. Otherwise, decrement by 1.
- Source address sanity. Some routers run anti-spoofing checks here (RPF, reverse path forwarding) to drop packets whose source address shouldn't be arriving on this interface. This is a configurable policy, not always on.
After all of those pass, the router is ready to actually look up the destination.
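The IPv4 part of these checks is easy to sketch in software. This is a minimal sketch assuming a bare 20-byte header with no options. It verifies the RFC 1071 one's-complement header checksum, applies the TTL rule, and recomputes the checksum after the decrement. The function names and the sample header are hypothetical. A real router does this in hardware, often updating the checksum incrementally instead of recomputing it.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit words, inverted."""
    if len(header) % 2:
        header += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def ingress_checks(header: bytearray) -> str:
    """Hypothetical ingress checks, assuming the packet is to be forwarded."""
    version, ihl = header[0] >> 4, (header[0] & 0x0F) * 4
    if version != 4 or ihl < 20 or len(header) < ihl:
        return "drop: malformed header"
    # Summing a valid header *including* its checksum field gives 0xFFFF,
    # so the inverted result is 0.
    if ipv4_checksum(bytes(header[:ihl])) != 0:
        return "drop: bad checksum"
    if header[8] <= 1:  # byte 8 is the TTL
        return "drop: send ICMP Time Exceeded"
    header[8] -= 1
    header[10:12] = b"\x00\x00"  # zero the checksum field, then recompute
    header[10:12] = struct.pack("!H", ipv4_checksum(bytes(header[:ihl])))
    return "forward"

def make_header(ttl: int) -> bytearray:
    """Hypothetical sample: 20-byte header, DF set, protocol 6 (TCP)."""
    hdr = bytearray(struct.pack(
        "!BBHHHBBH4s4s", 0x45, 0, 20, 0x1234, 0x4000, ttl, 6, 0,
        bytes([192, 0, 2, 10]), bytes([198, 51, 100, 140])))
    hdr[10:12] = struct.pack("!H", ipv4_checksum(bytes(hdr)))
    return hdr
```

Here make_header(64) passes and leaves with TTL 63 and a fresh checksum, while make_header(1) is dropped with a Time Exceeded verdict.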
Longest-prefix match
Forwarding looks up the destination address in the FIB. The FIB is a collection of entries, each of the form <prefix> → <next-hop, egress-interface>. The lookup finds the most specific matching prefix — the one with the longest prefix length whose bits agree with the destination address.
That's the whole rule. There is no insertion-order priority, no protocol-source priority for the lookup itself (priorities apply only when building the table; once built, the table is queried purely by length). If the destination is 198.51.100.140 and the table contains:
0.0.0.0/0 via 192.0.2.1 (default route)
198.51.100.0/24 via 192.0.2.2
198.51.100.128/25 via 192.0.2.3
All three match. The default route matches everything (length 0). The /24 matches the first 24 bits of 198.51.100.140. The /25 matches the first 25 bits. Longest-prefix match picks the /25, so the next hop is 192.0.2.3.
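Below is a direct transcription of the rule, using Python's standard ipaddress module and the three-entry table above. It is a linear scan: fine for illustration, far too slow for a real FIB. The function name is hypothetical.

```python
import ipaddress
from typing import Optional

# The example FIB: (prefix, next hop)
FIB = [
    (ipaddress.ip_network("0.0.0.0/0"), "192.0.2.1"),          # default route
    (ipaddress.ip_network("198.51.100.0/24"), "192.0.2.2"),
    (ipaddress.ip_network("198.51.100.128/25"), "192.0.2.3"),
]

def lookup(dest: str) -> Optional[str]:
    """Return the next hop of the longest matching prefix, or None if no route."""
    addr = ipaddress.ip_address(dest)
    matches = [(net.prefixlen, hop) for net, hop in FIB if addr in net]
    if not matches:
        return None
    # Only prefix length decides; table order plays no part.
    return max(matches, key=lambda m: m[0])[1]

print(lookup("198.51.100.140"))  # 192.0.2.3 (the /25 wins)
```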
Why this rule? It lets you have aggregate routes and more-specific overrides simultaneously. An ISP can advertise 203.0.112.0/22 to BGP peers (the aggregate) while internally routing 203.0.113.0/24 and 203.0.113.128/27 to specific customers. The aggregate covers everyone; the more-specifics steer exceptions. Without longest-prefix match, this would be a contradiction. With it, it's just how the table works.
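An aggregate only works if it is a properly aligned prefix, with every host bit zero. Python's ipaddress module checks this for you. Strict parsing rejects a prefix with host bits set, and strict=False shows what that prefix actually covers. This is a small sketch, and the helper names are illustrative.

```python
import ipaddress
from typing import Iterable

def is_aligned(prefix: str) -> bool:
    """True if the prefix has no host bits set, i.e. is a valid route prefix."""
    try:
        ipaddress.ip_network(prefix)  # strict=True by default
        return True
    except ValueError:
        return False

def covering_prefix(prefix: str) -> str:
    """Clear the host bits to see what a mistyped aggregate really covers."""
    return str(ipaddress.ip_network(prefix, strict=False))

def covers(aggregate: str, specifics: Iterable[str]) -> bool:
    """True if every more-specific sits inside the aggregate."""
    agg = ipaddress.ip_network(aggregate)
    return all(ipaddress.ip_network(s).subnet_of(agg) for s in specifics)

print(covering_prefix("203.0.113.0/22"))  # 203.0.112.0/22
```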
The default route is the catch-all. 0.0.0.0/0 matches every IPv4 address; ::/0 matches every IPv6 address. It has prefix length zero, so it's always the shortest match in the table. It's used only when nothing else matches — exactly the catch-all role you want.
How a real FIB implements longest-prefix match efficiently is its own discipline. Software FIBs use trie variants — radix trees, multibit tries, level-compressed tries (LC-tries on Linux). Hardware FIBs use TCAM (ternary content-addressable memory), which can do the match in a single clock cycle. Modern routers with millions of routes and 100 Gbps interfaces depend on this hardware acceleration.
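The simplest software structure for longest-prefix match is an uncompressed binary trie. Each level consumes one bit, and each node may hold a next hop. A lookup walks the destination's bits from the top and remembers the deepest next hop it passes. The sketch below is that textbook structure, not Linux's LC-trie, which builds on the same idea with path and level compression. The class names are hypothetical.

```python
import ipaddress
from typing import Optional

class Node:
    __slots__ = ("children", "next_hop")

    def __init__(self) -> None:
        self.children = [None, None]  # child for bit 0, child for bit 1
        self.next_hop = None          # set if a prefix ends at this node

class BinaryTrie:
    """One-bit-per-level IPv4 trie for longest-prefix match."""

    def __init__(self) -> None:
        self.root = Node()  # a /0 (default route) lives here

    def insert(self, prefix: str, next_hop: str) -> None:
        net = ipaddress.IPv4Network(prefix)
        bits = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (31 - i)) & 1
            if node.children[bit] is None:
                node.children[bit] = Node()
            node = node.children[bit]
        node.next_hop = next_hop

    def lookup(self, dest: str) -> Optional[str]:
        bits = int(ipaddress.IPv4Address(dest))
        node, best = self.root, self.root.next_hop
        for i in range(32):
            node = node.children[(bits >> (31 - i)) & 1]
            if node is None:
                break                 # no longer prefix exists down this path
            if node.next_hop is not None:
                best = node.next_hop  # deeper match = longer prefix
        return best
```

Loading the three example routes and looking up 198.51.100.140 walks 25 levels and returns 192.0.2.3. Lookup cost is bounded by the address length (32 steps), not by the number of routes.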
Forwarding entries vs adjacency entries
A FIB entry alone is not enough to forward a packet. The entry says "send packets matching this prefix toward next-hop X out interface Y." But to put bytes on the wire, the router needs the layer-2 address of next-hop X. That's a separate lookup — the adjacency table — that resolves a next-hop IP to a destination MAC, populated by ARP (IPv4) or NDP (IPv6) on the local segment.
So a complete forwarding decision is:
- Look up destination IP in the FIB → get next-hop IP and egress interface.
- Look up next-hop IP in the adjacency table for the egress interface → get destination MAC.
- Build the new Ethernet header with that destination MAC and the router's source MAC.
- Decrement TTL, recompute the IP header checksum (IPv4 only — IPv6 doesn't have one), and push the frame out.
If the adjacency lookup misses — the next-hop's MAC isn't cached — the router triggers ARP/NDP and either holds the packet briefly or drops it. Modern routers usually hold a small queue of "punted" packets per next-hop and resume forwarding when ARP/NDP completes. Excessive punts to the slow path under load are a common debugging signal — if they happen frequently to one next-hop, look for cache thrashing or aggressive aging.
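The two-table split can be sketched in a few lines. This is a hypothetical model, not any router's code:

- The FIB lookup yields a next hop and an egress interface.
- The adjacency table maps that pair to a MAC.
- A miss parks the packet in a small per-next-hop queue until the ARP/NDP reply arrives.

The queue limit, message strings, interface name, and MAC (taken from the documentation range) are assumptions. Header rewrite and TTL handling are left out.

```python
import ipaddress
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple

class Forwarder:
    """Hypothetical two-table forwarder: FIB for routes, adjacency table for MACs."""

    def __init__(self, routes: List[Tuple[str, str, str]], queue_limit: int = 3):
        # routes: (prefix, next-hop IP, egress interface)
        self.fib = [(ipaddress.ip_network(p), nh, dev) for p, nh, dev in routes]
        self.adjacency: Dict[Tuple[str, str], str] = {}  # (dev, next hop) -> MAC
        self.pending: Dict[Tuple[str, str], Deque[bytes]] = defaultdict(deque)
        self.queue_limit = queue_limit  # assumed per-next-hop hold limit

    def forward(self, dest: str, packet: bytes) -> str:
        addr = ipaddress.ip_address(dest)
        # Step 1: longest-prefix match in the FIB -> next hop + egress interface
        matches = [(net.prefixlen, nh, dev) for net, nh, dev in self.fib if addr in net]
        if not matches:
            return "drop: no route"
        _, next_hop, dev = max(matches, key=lambda m: m[0])
        # Step 2: adjacency lookup -> destination MAC
        mac = self.adjacency.get((dev, next_hop))
        if mac is None:
            held = self.pending[(dev, next_hop)]
            if len(held) >= self.queue_limit:
                return "drop: resolution queue full"
            held.append(packet)  # a real router would send ARP/NDP here
            return f"queued: resolving {next_hop} on {dev}"
        # Steps 3-4 (rewrite Ethernet header, decrement TTL) are omitted.
        return f"send on {dev} to {mac}"

    def resolved(self, dev: str, next_hop: str, mac: str) -> int:
        """ARP/NDP answered: cache the MAC and release the held packets."""
        self.adjacency[(dev, next_hop)] = mac
        return len(self.pending.pop((dev, next_hop), deque()))
```

Keeping the MAC out of the route entry is what lets an adjacency change (a new MAC for the same next hop) take effect without touching the FIB.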
The split between routes and adjacencies also means that layer-2 changes on a next hop never touch the routing table. The route says "send to gateway 192.0.2.1"; the adjacency resolves the gateway's MAC. If the gateway changes its MAC (NIC swap, failover), the adjacency cache repopulates and forwarding resumes; the route table doesn't change.
TTL and Hop Limit as loop breakers
The IP header carries an 8-bit field (TTL in IPv4, Hop Limit in IPv6) that every router decrements by 1. When the field reaches zero, the packet is dropped and the router sends an ICMP Time Exceeded message (ICMPv4 type 11, ICMPv6 type 3) back to the source.
This is the data plane's defense against loops. Loops happen through misconfiguration, transient routing convergence, or deliberate redirection. Without TTL, a packet caught in a loop would circulate indefinitely. Every retransmission from the sender would add another looping copy, so the wasted load would keep growing. With TTL, every packet has a finite lifetime measured in hops: a packet caught in a loop is discarded after at most TTL_initial hops.
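A toy simulation makes the bound concrete. It assumes every router applies the ingress rule: drop when TTL is 1 or less, otherwise decrement. One packet bounces around a loop of hypothetical routers.

```python
from itertools import cycle
from typing import List, Tuple

def run_loop(initial_ttl: int, routers: List[str]) -> Tuple[int, str]:
    """Bounce one packet around a routing loop.

    Returns (times forwarded, router that dropped it and sent Time Exceeded).
    """
    if not routers:
        raise ValueError("a loop needs at least one router")
    ttl, forwards = initial_ttl, 0
    for router in cycle(routers):
        if ttl <= 1:
            return forwards, router
        ttl -= 1
        forwards += 1
    raise AssertionError("unreachable: cycle() never ends")

print(run_loop(64, ["A", "B"]))  # (63, 'B')
```

With an initial TTL of 64 and a two-router loop, the packet is forwarded 63 times. The 64th router to receive it discards it.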
The default initial values:
- IPv4: 64 (Linux), 128 (Windows). Linux's choice is the more common setting on the public internet.
- IPv6: 64 (most stacks).
The internet typically has 10–25 router hops between any two endpoints, so 64 is a comfortable margin. Because each router subtracts exactly 1, a received TTL tells you how far a packet has come. A packet that left its source at 64 and arrives with 50 has crossed 14 routers. A packet that reaches a router with TTL 1 has no hops left and can't be passed on.
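Because every router subtracts exactly one, a received TTL encodes how many routers a packet has crossed, if you can guess where it started. A common heuristic, sketched here, assumes the sender used the smallest typical default that is still at least the observed value. The set (64, 128, 255) is an assumption; 255 is common on network equipment. The guess is wrong for any sender with a non-default setting.

```python
from typing import Tuple

# Assumed set of common initial TTLs; not exhaustive.
COMMON_INITIAL_TTLS = (64, 128, 255)

def infer_hops(observed_ttl: int) -> Tuple[int, int]:
    """Heuristic guess of (initial TTL, routers crossed) from a received TTL."""
    if not 0 <= observed_ttl <= 255:
        raise ValueError("TTL is an 8-bit field")
    for initial in COMMON_INITIAL_TTLS:
        if observed_ttl <= initial:
            return initial, initial - observed_ttl
    raise AssertionError("unreachable: 255 covers every valid TTL")

print(infer_hops(50))  # (64, 14)
```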
A common misconception: TTL is not a time field. It used to be — RFC 791 specified TTL in seconds, with each router decrementing by at least 1 even if forwarded faster than 1 second. Modern implementations have abandoned that interpretation entirely. TTL is now exclusively a hop counter, and the standard treats "decrement by 1 per hop" as universal.
The ICMP feedback when TTL expires is also what makes traceroute work, which we'll get to.
Path MTU and fragmentation
Different links along a path can have different MTUs. A packet sized for the source's link might be too big for an intermediate link. There are three ways the network can handle this:
Pre-segment at the transport layer. The sending TCP stack picks an MSS small enough that the resulting IP packet fits the smallest known link MTU. If MSS clamping is in play, an intermediate router rewrites the SYN's MSS option to a smaller value; otherwise the source's PMTUD picks the right size. This is by far the dominant mode in 2026.
Fragment in flight (IPv4 only). If the IPv4 packet has the DF (Don't Fragment) bit clear and is too large for an outbound interface, the IPv4 router fragments it: splits the original packet into multiple smaller packets, each with the same identification number and an offset. The receiver reassembles. IPv4 fragmentation works but has well-known operational problems — fragmented traffic is more expensive to route, fragments are easier to lose, and reassembly state introduces DoS vectors. By 2026, almost no traffic on the public internet is using router-side fragmentation; almost everything sets DF, and PMTUD does the work.
Fragment at the source (IPv6). IPv6 routers don't fragment in flight at all. If the packet is too big, the router drops it and sends ICMPv6 "Packet Too Big" (type 2) back to the source. The source then has two options. It can reduce its packet size for that destination, at the application layer or by re-segmenting at the transport layer. Or it can fragment the packet itself using the IPv6 Fragment extension header.
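The arithmetic behind the first two modes is short. This sketch assumes fixed-size headers with no IP options, IPv6 extension headers, or TCP options, and the function names are hypothetical:

- The MSS is the MTU minus the IP and TCP headers.
- A clamping router keeps the smaller of the advertised MSS and what its egress link allows.
- IPv4 fragmentation (DF clear) cuts the payload into pieces whose offsets are counted in 8-byte units, with the More Fragments (MF) flag set on all but the last.

```python
from typing import List, Tuple

IP_HEADER = {4: 20, 6: 40}  # fixed headers only (no options / extension headers)
TCP_HEADER = 20             # no TCP options

def mss_for_mtu(mtu: int, family: int = 4) -> int:
    """Largest TCP payload that fits one IP packet on a link with this MTU."""
    return mtu - IP_HEADER[family] - TCP_HEADER

def clamp_syn_mss(advertised: int, egress_mtu: int, family: int = 4) -> int:
    """The MSS an MSS-clamping router would leave in a passing SYN."""
    return min(advertised, mss_for_mtu(egress_mtu, family))

def fragment(total_length: int, mtu: int, ihl: int = 20) -> List[Tuple[int, int, int]]:
    """Split an IPv4 datagram (DF clear) into (offset in 8-byte units, payload bytes, MF)."""
    max_chunk = (mtu - ihl) // 8 * 8  # all but the last fragment: multiple of 8
    if max_chunk < 8:
        raise ValueError("MTU too small to carry any payload")
    remaining, offset, frags = total_length - ihl, 0, []
    while remaining > 0:
        size = min(max_chunk, remaining)
        remaining -= size
        frags.append((offset // 8, size, 1 if remaining > 0 else 0))
        offset += size
    return frags

print(fragment(4000, 1500))  # [(0, 1480, 1), (185, 1480, 1), (370, 1020, 0)]
```

A 4000-byte datagram crossing a 1500-byte link becomes three fragments. On a 1492-byte PPPoE link, a clamping router would cut a 1460-byte MSS to 1452.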
Path MTU Discovery (PMTUD; RFC 1191 for IPv4, RFC 8201 for IPv6) is the protocol that makes this work. The source sends large packets, sees what bounces back, caches the resulting per-destination MTU, and uses that ceiling for future traffic.
PMTUD breaks when ICMPv6 "Packet Too Big" or ICMPv4 "Fragmentation Needed" messages can't get back to the source. The most common cause is firewalls along the path that drop ICMP indiscriminately. The result is a black hole: large packets disappear, small packets get through, and the application sees inexplicable connectivity issues that vary by content size. The fix is to allow specific ICMP types — at minimum, the ones PMTUD needs — through any firewall on the forwarding path.
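The PMTUD loop and its black-hole failure fit in a toy model. It rests on three assumptions:

- The path is a list of link MTUs.
- Each Packet Too Big or Fragmentation Needed message carries the MTU of the link that refused the packet.
- A firewall that filters ICMP simply means the sender never hears anything.

Real stacks also cache the result per destination and eventually time out and retry; that is omitted.

```python
from typing import List, Optional, Tuple

def pmtud(link_mtus: List[int], size: int, icmp_filtered: bool = False,
          max_probes: int = 10) -> Tuple[Optional[int], int]:
    """Toy PMTUD. Returns (path MTU found, or None if black-holed; packets sent)."""
    for sent in range(1, max_probes + 1):
        # First link on the path that is too small for this packet, if any
        refusing = next((mtu for mtu in link_mtus if mtu < size), None)
        if refusing is None:
            return size, sent  # delivered end to end
        if icmp_filtered:
            return None, sent  # the error never reaches the sender: silence
        size = refusing        # resend at the MTU the ICMP message reported
    return None, max_probes

print(pmtud([1500, 1480, 1400, 1500], 1500))                      # (1400, 3)
print(pmtud([1500, 1480, 1400, 1500], 1500, icmp_filtered=True))  # (None, 1)
```

The second call is the black hole: a packet that fits every link would have gone through, but the oversized one vanishes without a word.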
The IPv6 minimum MTU is 1280 bytes. Hosts may always send packets up to 1280 without worrying about path MTU. For larger packets, they must do PMTUD or accept the risk of black holes.
ICMP as forwarding-plane feedback
ICMP is the protocol routers use to tell endpoints things. The messages you should know:
- Time Exceeded (IPv4 type 11; ICMPv6 type 3). TTL/Hop Limit hit zero. Returned to the source. The basis of traceroute.
- Destination Unreachable (IPv4 type 3; ICMPv6 type 1). With sub-codes for "no route to host," "host administratively prohibited," "port unreachable," etc. Tells the source why the packet didn't make it.
- Fragmentation Needed (IPv4 type 3, code 4) / Packet Too Big (ICMPv6 type 2). A large packet was dropped because the path MTU is smaller than the packet. Carries the path MTU in the message body so the source can resize.
- Echo Request / Echo Reply (IPv4 types 8/0; ICMPv6 types 128/129). What ping uses.
- Redirect (IPv4 type 5; ICMPv6 type 137). The router is telling the host "you sent this packet to me, but the optimal next hop for that destination is actually X on this same link." Hosts may update their cache to use the suggested next hop directly. Modern security guidance is to ignore redirects (net.ipv4.conf.*.accept_redirects=0 on Linux) because they're trivial to spoof.
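The two messages PMTUD depends on carry the MTU at fixed positions:

- ICMPv4 Fragmentation Needed: bytes 6–7 (RFC 1191).
- ICMPv6 Packet Too Big: bytes 4–7 (RFC 4443).

Below is a minimal decoder sketch over the ICMP header only. It does not validate checksums or parse the quoted original packet, and its name table covers just the types listed above. The function and table names are hypothetical.

```python
import struct
from typing import Optional, Tuple

# (IP family, ICMP type) -> name, for the messages listed above
ICMP_NAMES = {
    (4, 11): "Time Exceeded", (6, 3): "Time Exceeded",
    (4, 3): "Destination Unreachable", (6, 1): "Destination Unreachable",
    (4, 8): "Echo Request", (4, 0): "Echo Reply",
    (6, 128): "Echo Request", (6, 129): "Echo Reply",
    (4, 5): "Redirect", (6, 137): "Redirect",
}

def decode_icmp(family: int, msg: bytes) -> Tuple[str, Optional[int]]:
    """Name an ICMP message; also return the MTU if it is a PMTUD signal."""
    icmp_type, code = msg[0], msg[1]
    if family == 4 and icmp_type == 3 and code == 4:
        # ICMPv4 Fragmentation Needed: next-hop MTU in bytes 6-7 (RFC 1191)
        return "Fragmentation Needed", struct.unpack_from("!H", msg, 6)[0]
    if family == 6 and icmp_type == 2:
        # ICMPv6 Packet Too Big: 32-bit MTU in bytes 4-7 (RFC 4443)
        return "Packet Too Big", struct.unpack_from("!I", msg, 4)[0]
    return ICMP_NAMES.get((family, icmp_type), "other"), None
```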
Most of these don't affect everyday traffic — they're feedback for unusual cases. But they're how the network layer admits errors. A network with all ICMP blocked is a network where errors are silent: no PMTUD, no traceroute, no clear failure signals. Don't blanket-block ICMP. Ever.
traceroute works by exploiting Time Exceeded. The implementation:
- Send a packet to the final destination with TTL = 1.
- The first router decrements to 0, drops, returns Time Exceeded. The source learns the first hop's address.
- Send a packet with TTL = 2. Second router does the same.
- Continue, incrementing TTL, until packets reach the destination. It answers according to the probe type: an Echo Reply for ICMP probes, a Port Unreachable for UDP probes, or a SYN-ACK or RST for TCP probes.
Each line of traceroute output is one router along the path. Hops that don't return ICMP — because they're configured not to, because firewalls drop the response, or because they're a stealth load balancer — show up as * * *. A * * * row is data, not failure: it tells you a hop is there but isn't volunteering its identity.
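The probing loop can be simulated without a network. In this sketch a path is a hypothetical list of (address, answers probes?) hops ending at the destination. Each probe is forwarded hop by hop, with routers decrementing TTL. The hop where TTL reaches zero answers, or the destination if the probe survives that far. Silent hops print as * * *, the way traceroute shows them.

```python
from typing import List, Tuple

Hop = Tuple[str, bool]  # (address, does it answer probes?)

def probe(path: List[Hop], ttl: int) -> int:
    """Forward one probe hop by hop; return the index of the hop that answers."""
    last = len(path) - 1
    for i in range(last):
        ttl -= 1       # router i decrements...
        if ttl == 0:
            return i   # ...hits zero, drops, and sends Time Exceeded
    return last        # probe survived to the destination

def traceroute(path: List[Hop], probes_per_hop: int = 3, max_ttl: int = 30) -> List[str]:
    lines = []
    for ttl in range(1, max_ttl + 1):
        hop = probe(path, ttl)
        addr, answers = path[hop]
        lines.append(f"{ttl}  {addr if answers else ' '.join('*' * probes_per_hop)}")
        if hop == len(path) - 1:
            break      # the destination has been reached
    return lines

path = [("192.0.2.1", True), ("198.51.100.1", False), ("203.0.113.9", True)]
print("\n".join(traceroute(path)))
```

The silent second hop still occupies its row. The trace continues past it because TTL 3 reaches the destination regardless of whether hop 2 answers.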
Fast path, slow path, and what makes a packet "weird"
Routers process most packets on a hardware fast path that handles the common case extremely quickly: standard IPv4 or IPv6 header, valid TTL, FIB hit, adjacency hit. The fast path skips features that aren't enabled and exits to the egress interface in a few clock cycles per packet.
Packets that don't fit the fast path get punted to the slow path: a software handler running on the router's general-purpose CPU. Punts happen when:
- The packet has IP options or IPv6 extension headers the fast path doesn't handle.
- The packet's destination doesn't have a cached adjacency.
- The packet matches a specific FIB entry flagged for software processing.
- The packet triggers ICMP generation (TTL expired, port unreachable, etc.).
- The packet is destined to the router itself (control- or management-plane traffic).
The slow path is orders of magnitude slower than the fast path. A heavily-punted router can melt under traffic the fast path would have shrugged off. This is also why unusual packets — IPv4 with options, IPv6 with weird extension headers, fragmented anything — get poor performance and are sometimes dropped silently as a load-shedding measure. If your application depends on extension headers or strange options, you're betting on the slow path of every router on the way.
The pragmatic deployment guidance: don't rely on extension headers other than IPv6 Fragment for paths that traverse the public internet. Don't use IPv4 options at all. If you need to do something fancy, do it at the transport layer (where end-to-end semantics are easier to control) or in an overlay (where you control both endpoints).
Operational debugging workflow
When forwarding misbehaves, here's the toolkit on a Linux box:
ip route get <dest> asks the kernel exactly which route would be selected for that destination. It returns the egress interface, the next hop, and the source address that would be used.
ip route get 8.8.8.8
ip -6 route get 2606:4700:4700::1111
This is the first thing to run when "I can't reach X." If ip route get returns unreachable, the kernel has no route. If it returns a route via the wrong interface or next hop, the routing table is wrong. Either way, you've localized the problem to the routing table before touching the wire.
traceroute and tracepath show the packet path hop by hop. tracepath is friendlier on modern Linux: it needs no privileges and reports the path MTU as it goes. traceroute -T sends TCP probes, which sometimes succeed where the default UDP traceroute fails because UDP gets blocked. TCP mode needs root.
tracepath 8.8.8.8
sudo traceroute -T -p 443 8.8.8.8
A traceroute that shows the right hops and reaches the destination tells you forwarding is fine; the issue is in the application or transport layer. A traceroute that stops mid-path tells you a router along the way is dropping or not responding to the probes.
tcpdump on the egress interface lets you see whether packets are leaving at all, and on the ingress interface lets you see whether replies are coming back. Asymmetric routing (outbound packets fine, inbound packets gone) is a classic capture finding that is hard to see with any other tool.
sudo tcpdump -ni eth0 host 8.8.8.8
Counters. Linux exposes per-interface counters:
ip -s link show eth0
Look for RX errors, RX dropped, TX errors. A non-zero counter that's increasing is a yellow flag. RX dropped typically means the kernel ran out of buffer space or the packet failed a sanity check; TX errors typically means the link or driver had problems.
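Watching whether counters rise is easy to script. On Linux the kernel exposes each interface's statistics as files under /sys/class/net/<iface>/statistics/; these are the same counters ip -s link reports. Below is a minimal sketch that reads them once per call and reports only the counters that increased between two samples. The function names are illustrative.

```python
from pathlib import Path
from typing import Dict

COUNTERS = ("rx_errors", "rx_dropped", "tx_errors")

def read_counters(iface: str) -> Dict[str, int]:
    """Read one sample of Linux per-interface counters from sysfs."""
    base = Path("/sys/class/net") / iface / "statistics"
    return {name: int((base / name).read_text()) for name in COUNTERS}

def rising(before: Dict[str, int], after: Dict[str, int]) -> Dict[str, int]:
    """Counters that grew between two samples, with how much they grew."""
    return {k: after[k] - before[k] for k in COUNTERS if after[k] > before[k]}
```

For example, take before = read_counters("eth0"), wait ten seconds, take a second sample, and print rising(before, after). An empty result means nothing is climbing.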
ss -tan shows TCP connection states. Combined with the forwarding-side tools, this lets you tell whether a TCP connection is failing to establish (forwarding is the prime suspect) or is established but not making progress (transport / application layer is the prime suspect).
A typical investigation sequence: ip route get to confirm the route, tracepath to check the path, tcpdump on the egress interface to confirm packets leave, tcpdump on ingress to confirm replies arrive, then ss to check transport state. If all five say "fine," the issue is above the network stack.
Hands-on exercise
Exercise 1 — Ask the kernel how it would forward a packet
# Where would a packet to Google's DNS go?
ip route get 8.8.8.8
# Same for Cloudflare's IPv6 DNS.
ip -6 route get 2606:4700:4700::1111
# Same for the default gateway itself (directly attached, so no next hop needed).
ip route get $(ip -4 route | awk '/default/ {print $3; exit}')
For each destination, the output will include:
- via <next-hop>: the IP address of the next router on the path. Missing for directly-attached destinations (the kernel will ARP/NDP for them directly).
- dev <iface>: the egress interface.
- src <source>: the address the kernel will set as the IP source for an outbound packet to this destination.
The src field is what source-address selection picks for connections originating on this host. If you have multiple addresses on an interface, the kernel uses an algorithm (RFC 6724 for IPv6, similar logic for IPv4) to choose. Knowing which src will be used matters for any service whose policy depends on source address.
Stretch: create a network namespace with a custom route table:
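sudo ip netns add testns
sudo ip -n testns link set lo up
# Give the next hops a connected subnet: the kernel refuses a "via" gateway
# it cannot reach directly, and lo has no 192.0.2.0/24 route.
sudo ip -n testns link add dummy0 type dummy
sudo ip -n testns addr add 192.0.2.254/24 dev dummy0
sudo ip -n testns link set dummy0 up
# Add a more-specific route to test longest-prefix match
sudo ip -n testns route add 198.51.100.128/25 via 192.0.2.3 dev dummy0
sudo ip -n testns route add 0.0.0.0/0 via 192.0.2.1 dev dummy0
sudo ip netns exec testns ip route get 198.51.100.140 # → 192.0.2.3
sudo ip netns exec testns ip route get 198.51.100.5 # → 192.0.2.1 (default)
# Cleanup
sudo ip netns del testns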
Confirm the longest-prefix-match behavior matches your prediction.
Exercise 2 — Observe hop-limited forwarding feedback
tracepath -n routeharden.com
-n skips DNS resolution so the output is purely IP-based.
You'll see one line per hop. Each line was generated by a router on the path returning Time Exceeded to a probe with successively-larger TTL. The MTU sometimes drops along the path; tracepath calls those out explicitly.
Hops that don't answer (tracepath prints "no reply"; traceroute prints * * *) are routers on the path that didn't respond to the probe. This is data, not "the network is broken." Some carriers configure their core routers to suppress ICMP for security or capacity reasons. The path is still complete; the trace just doesn't show its full identity.
Run sudo traceroute -T -p 443 routeharden.com for comparison (TCP mode needs root). TCP-based traceroute sometimes succeeds where UDP fails because some networks block outbound UDP that doesn't match an established session.
Common misconceptions
"Routers read application payload to decide where packets go." Forwarding is a destination-prefix lookup. The application payload is irrelevant. Some special cases (DPI middleboxes, deep policy enforcement) do read payload, but they're not routers in the classic sense; they're inline policy boxes that happen to live on the network path.
"Default route means the router stops checking specific routes." No: the default is the least specific match (length 0). The router still checks every route; longest-prefix match means more-specific routes always win when they match. The default is what handles "nothing else matches."
"TTL is a time field in practice." Modern stacks treat it as a hop counter. RFC 791's seconds-based interpretation (decrement by the time spent holding the packet, and by at least 1 at every hop) has been universally abandoned; only the per-hop decrement survives.
"Fragmentation is just an old detail." PMTUD failures still break real applications and tunnels every day. The pattern is "small packets work, large packets disappear" and it's almost always a firewall blocking ICMP "Packet Too Big" or "Fragmentation Needed" messages somewhere on the path.
"Control plane and data plane are the same thing." The control plane decides what's in the routing table. The data plane uses the table to forward packets. They run at different speeds, on different hardware, and fail in different ways. Conflating them produces confused incident-response narratives.
Further reading
- RFC 791 — Internet Protocol. The original IPv4 forwarding semantics, including TTL and fragmentation. Read it once for the historical baseline.
- RFC 8200 — Internet Protocol, Version 6 Specification. IPv6's cleaner forwarding-path model and endpoint-only fragmentation rules.
- RFC 1812 — Requirements for IPv4 Routers. Old but still relevant for understanding what router behavior the standards expect, especially around ICMP generation.
- RFC 8201 — Path MTU Discovery for IP version 6. The modern PMTUD story and what to do when ICMPv6 is broken on the path.
- Larry Peterson and Bruce Davie, Computer Networks: A Systems Approach, book.systemsapproach.org. The forwarding chapters are the cleanest systems-level bridge from routing concepts to forwarding behavior.
The next module — UDP, the simplest transport — moves up to the transport layer with a deliberately minimal protocol that does almost nothing TCP does, and explains why "almost nothing" is sometimes exactly the right answer.