Anonymity Engineering · 22 min read · advanced

Padding strategies and cover traffic

Constant-rate padding, adaptive padding, dummy traffic, and why hiding packet shape is harder than appending zeros.

The previous module (traffic-analysis-fundamentals) established the attack surface: encrypted traffic leaks information through its shape — sizes, directions, timings, burst structure. This module is about the defenses. Padding and cover traffic are how anonymity systems try to obscure that shape.

The naive view of padding is "appending zero bytes to make packets the same size." This is wrong, or rather: it's a small fraction of the actual defense space. Real defenses against traffic analysis include record padding, link padding, flow shaping, dummy packets, constant-rate transmission, adaptive timing-gap injection, and combinations thereof. Each addresses different leakage; each costs different resources; none is uniformly applicable.

This module is the costed, engineering-honest treatment. We'll separate the defense families that the literature often conflates, walk through the bandwidth/latency/anonymity trade-off triangle, look at why constant-rate cover traffic is theoretically attractive but operationally punishing, examine adaptive padding (WTF-PAD and successors) and why it's the practical compromise, see what Tor's production padding actually does (proposals 254 and 302), and end with what padding cannot fix. The honest summary: padding is a costed defense that improves things on the margin; nothing in the toolkit makes a low-latency overlay safe against a determined GPA. The deeper anonymity properties require accepting high latency (next module: mix-networks-loopix-nym).

Learning objectives

  1. Distinguish message padding, link padding, flow shaping, and cover traffic instead of treating them as one idea.
  2. Explain why effective defenses target observable distributions, not just individual packet sizes.
  3. Compare constant-rate, adaptive, and burst-based defenses in terms of latency cost, bandwidth cost, and deployment complexity.
  4. Evaluate why partial padding can still help while full unobservability remains expensive.

Why padding exists at all

Padding exists because the previous module's lesson is true: encrypted traffic leaks shape. An adversary who can see packet sizes, directions, and timings can fingerprint websites, correlate end-to-end flows through anonymity overlays, classify application types, and identify users by behavioral patterns. Padding is the family of defenses that tries to distort or hide that shape.

The framing matters. Padding is not about cryptographic confidentiality — the cryptography is already in place. It's about metadata privacy: making the visible features of encrypted traffic less informative. Even a perfect cryptographic suite leaves a trace whose shape is visible; padding is what addresses the trace, not the cipher.

A useful mental model: encryption shrinks the attacker's view from "I see everything" to "I see only metadata." Padding tries to shrink the view further: from "I see metadata" toward "I see less informative metadata." The amount of shrinkage you can buy with padding depends on how much bandwidth and latency overhead you're willing to pay. There's no free lunch.

The design space: record padding, packet padding, flow shaping, dummy traffic

The literature uses "padding" loosely to mean any of several distinct defenses. Untangling them:

Record padding (within a single message). Append bytes to the encrypted record so that the record's length doesn't reveal the plaintext length. TLS 1.3 supports this by appending zeros to the inner plaintext before encryption; you can pad every TLS record to a multiple of, say, 1024 bytes. EDNS(0) padding for DNS (RFC 8467) is a similar idea: pad DNS queries to a uniform length to hide which query is being sent. Record padding addresses length-of-this-message leakage but not timing or count.
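
The arithmetic behind pad-to-a-multiple schemes is worth seeing concretely. A minimal sketch, generic rather than tied to any particular TLS or DNS library:

```python
def pad_to_multiple(plaintext_len: int, block: int = 128) -> int:
    """Smallest multiple of `block` that is >= plaintext_len.
    RFC 8467 recommends block=128 for DNS queries."""
    return ((plaintext_len + block - 1) // block) * block

# Two queries of different lengths become indistinguishable by size:
assert pad_to_multiple(37) == pad_to_multiple(100) == 128
print(pad_to_multiple(37), pad_to_multiple(300))  # 128 384
```

Everything between two multiples of the block maps to the same wire length, so the observer learns only which 128-byte bucket the message falls in, not its exact size.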

Packet padding (within a network packet). Sometimes synonymous with record padding when the record is one packet, but can also include adding bytes at the link layer (Ethernet padding to minimum frame size, IP fragmentation/reassembly nuances, padding at the encapsulation layer in tunneling protocols). Tor's fixed 514-byte cells are a form of mandatory packet padding — every cell is exactly 514 bytes regardless of payload, so size-based features at the cell layer are uniform.

Flow shaping. Modify the timing of packet emissions, not just the sizes. Constant-rate flow shaping sends one packet every X milliseconds regardless of whether real data is queued (sending dummy packets when no real data is available; queuing real packets if the schedule slot is missed). Burst-shaping aggregates packets into fixed-size bursts emitted at uniform intervals. Shaping addresses timing-based leakage by replacing the natural application timing with an artificial schedule.

Cover traffic / dummy traffic. Send entirely synthetic traffic — packets with no real payload, just dummies — to make "communicating" indistinguishable from "not communicating." Cover traffic can be sent on real circuits (filling gaps in real traffic) or as separate flows. The intent is unobservability: an adversary observing the link can't tell whether the user is actively communicating because there's always traffic.

These aren't mutually exclusive. A robust defense often combines packet padding (uniform sizes), flow shaping (uniform timing), and cover traffic (unobservability). Each addresses a different attack vector; each costs a different resource:

Defense        | Addresses                  | Bandwidth cost                   | Latency cost
Record padding | Per-message length         | Modest (10-50% overhead)         | None
Packet padding | Per-packet length          | Modest to high (depends on MTU)  | None
Flow shaping   | Timing structure           | Moderate (depends on schedule)   | Significant (queuing delays)
Cover traffic  | Whether communication occurs | Heavy (constant background cost) | None directly, but congestion

The cost columns are illustrative; actual numbers depend on traffic patterns and design specifics. A bursty user might pay 50% bandwidth overhead for moderate cover traffic; a continuously-active user might pay 5%.

Constant-rate defenses

Constant-rate transmission is the theoretically clean defense: emit one packet every X milliseconds, with constant size, regardless of whether real data is queued. If real data is available, a real packet is sent; if not, a dummy is sent. From an external observer's perspective, the link is indistinguishable from a "constant background hiss" of identical packets — no information about real content, real timing, or even whether the user is doing anything at all.
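
The slot logic can be sketched in a few lines. This is a toy model, not any production implementation: a real packet rides the next free slot, and every other slot carries a dummy.

```python
from collections import deque

def constant_rate_schedule(arrivals, duration_s, interval_s=0.05):
    """One fixed-size cell per slot: real data if queued, else a dummy.
    arrivals: packet arrival times in seconds. Returns ('real'|'dummy', t)."""
    queue = deque(sorted(arrivals))
    out = []
    for i in range(int(round(duration_s / interval_s))):
        t = i * interval_s                 # slot emission time
        if queue and queue[0] <= t:
            queue.popleft()
            out.append(("real", t))
        else:
            out.append(("dummy", t))
    return out

sched = constant_rate_schedule([0.01, 0.012, 0.30], duration_s=0.5)
print(sum(1 for kind, _ in sched if kind == "dummy"), "dummies of", len(sched))
```

Three real packets occupy 3 of 10 slots; the rest are dummies. Note also that the packet arriving at 10 ms is not emitted until the 50 ms slot: the latency floor described below.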

This achieves something close to unobservability for the link. Combined with similar discipline at every hop in an anonymity path, it would defeat both end-to-end correlation and website fingerprinting at the link layer.

Why nobody uses it for general-purpose internet traffic:

Bandwidth cost. A user idle most of the time still pays for continuous transmission. At 1 Mbps constant rate, a user who's actually using 50 KB/day pays 10.8 GB of dummy traffic per day. The ratio is 200,000:1 dummy-to-real for a light user. For mobile users on metered connections, this is unacceptable.
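
A quick sanity check of those numbers:

```python
rate_bps = 1_000_000                 # 1 Mbps constant-rate link
seconds_per_day = 86_400
bytes_per_day = rate_bps / 8 * seconds_per_day
real_bytes = 50 * 1024               # a light user's ~50 KB of real traffic

print(f"{bytes_per_day / 1e9:.1f} GB transmitted per day")      # 10.8 GB
print(f"dummy-to-real ratio roughly {bytes_per_day / real_bytes:,.0f}:1")
```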

Latency cost. Real packets that arrive between schedule slots have to wait. If the schedule is one packet per 50ms and a real packet arrives just after a slot, it waits up to 50ms to be sent. For interactive applications (SSH typing, voice calls), this floor on latency is noticeable.
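
The queueing delay is easy to quantify: for arrivals uniformly distributed within a slot, the expected wait is half the slot length. A toy simulation:

```python
import random

random.seed(1)
slot = 0.050                         # one emission every 50 ms
waits = [slot - random.uniform(0, slot) for _ in range(100_000)]
mean_ms = 1000 * sum(waits) / len(waits)
print(f"mean queueing wait: {mean_ms:.1f} ms")   # close to 25 ms
```

That's 25 ms on average (50 ms worst case) added at a single hop; a path with shaping at every link pays it repeatedly.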

Power cost. On mobile devices, the radio has to stay active to transmit and receive constantly. Battery life suffers significantly.

Asymmetry doesn't fit. Most internet traffic is asymmetric — clients send small requests, servers send large responses. Constant-rate transmission imposes a symmetric pattern that doesn't match how the application actually communicates. Asymmetric constant-rate (different rates per direction) helps but breaks the symmetry of the unobservability claim.

Multiplexed flows interact poorly. A user with multiple ongoing flows gets only the schedule's bandwidth divided among them. If the schedule is provisioned for the heaviest expected load, light users overpay; if provisioned for typical load, heavy users get throttled.

Constant-rate is used in some specific contexts: high-stakes covert channels, theoretical mixnet designs (where the overhead is accepted as the price of anonymity), military communication systems where bandwidth is bought and paid for in advance. For general anonymity overlays usable by ordinary users on ordinary connections, constant-rate is too expensive.

The approximation that is used: padding selectively, in a way that defeats specific attacks at lower overhead. This is adaptive padding.

Adaptive padding and why WTF-PAD mattered

Adaptive padding (the canonical reference is the WTF-PAD paper, Juarez et al. 2016) works on the insight that not all moments in a traffic trace are equally informative. Some gaps in the natural traffic are highly identifying (a long pause before a particular burst of activity); others are uninformative (a short pause inside an ongoing burst). If a defense can identify the informative gaps and inject dummy packets specifically there, it can neutralize the most discriminative features while paying much less overhead than constant-rate transmission.

The WTF-PAD architecture: a state machine that observes the natural traffic timing and probabilistically emits padding packets to disrupt informative gap patterns. Roughly:

  1. Maintain a histogram of "expected gap distributions" — the times between packets that would be normal in non-monitored traffic.
  2. As real traffic flows, observe each inter-arrival gap.
  3. If the gap looks informative (it lasts longer than typical, or it has a distinctive pattern), emit a padding packet to break the pattern.
  4. The padding packets have the same size as real packets (uniform packet size also assumed) and are encrypted such that the receiver can recognize and discard them.

The result: the observable traffic shape is closer to "generic background traffic" than the original distinctive shape. Bandwidth overhead is moderate (often 30-50% rather than 1000%+); latency overhead is minimal because no real packet is delayed (only dummies are added).
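
The core decision can be caricatured in a few lines. This is a toy sketch of the idea only, not the WTF-PAD algorithm itself (the real system uses per-state histograms trained on live traffic, token depletion, and a receive/send state machine):

```python
import random

random.seed(7)

# Toy histogram of "normal" inter-packet gaps (seconds) with weights.
# In WTF-PAD these distributions come from real traffic traces.
GAP_BINS    = [0.005, 0.010, 0.025, 0.050, 0.100]
GAP_WEIGHTS = [40, 30, 15, 10, 5]

def sample_expected_gap():
    return random.choices(GAP_BINS, weights=GAP_WEIGHTS, k=1)[0]

def pad_trace(real_gaps):
    """Inject a dummy whenever an observed gap outlives the sampled
    'expected' gap, i.e. whenever the silence looks informative."""
    dummies = 0
    for gap in real_gaps:
        if gap > sample_expected_gap():
            dummies += 1
    return dummies

bursty = [0.004] * 20 + [0.8]   # tight burst, then one long tell-tale pause
print("dummies injected:", pad_trace(bursty))  # dummies injected: 1
```

The tight burst triggers nothing (its gaps look normal), while the single long pause, the most identifying feature of the trace, is the one that draws a dummy.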

WTF-PAD was effective against the classical website-fingerprinting attacks of the time. It was widely studied and considered a strong defense. Then in 2018, the Deep Fingerprinting paper (Sirinam et al.) demonstrated that deep convolutional networks could classify WTF-PAD-protected traces with substantially higher accuracy than classical attacks against undefended traces. The defense had been designed against features that humans had identified; the deep model learned different features that the defense didn't address.

This is the recurring pattern in adaptive padding: a defense is designed against specific attacks, the attacks improve, the defense is rendered insufficient. WTF-PAD wasn't useless after Deep Fingerprinting; it still raised the cost of attack and degraded accuracy somewhat. But it wasn't the protective ceiling it had once seemed to be.

Other defenses, some predating Deep Fingerprinting (Tamaraw, Walkie-Talkie) and some designed in response to it (FRONT, RegulaTor), address the deep-learning attack vector with various combinations of:

  • Burst-level padding. Pad bursts to uniform sizes and uniform inter-burst gaps (defeats burst-pattern features deep models learn).
  • Trace molding. Make every trace look like a chosen "decoy" trace from a fixed set, by padding selectively to match the decoy's shape.
  • Constant-rate fallback. Switch to constant-rate transmission for the highest-stakes portions of a session.
  • Direction-balancing. Pad to make incoming and outgoing byte counts more uniform.

Each defense's effectiveness is in flux as attacks evolve. The current state-of-the-art is a moving target; no single defense is uniformly successful.
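
Burst-level padding, the first idea in the list above, reduces the number of distinguishable burst sizes to a small set. A hedged sketch; the allowed-size set here is illustrative, not taken from any specific defense:

```python
import bisect

ALLOWED_BURST_SIZES = [4, 8, 16, 32, 64]   # cells; illustrative set

def pad_burst(cells: int) -> int:
    """Round a burst up to the next allowed size; dummies fill the gap.
    (Bursts above the largest class are clamped in this toy version.)"""
    i = bisect.bisect_left(ALLOWED_BURST_SIZES, cells)
    return ALLOWED_BURST_SIZES[min(i, len(ALLOWED_BURST_SIZES) - 1)]

bursts = [3, 5, 9, 20, 64]
print([pad_burst(b) for b in bursts])  # [4, 8, 16, 32, 64]
```

Five distinct burst sizes collapse onto the allowed classes: an observer counting cells per burst now sees at most five values instead of a continuum, at roughly 1.23x overhead on this toy input.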

Tor's padding machinery

Tor's actual deployed padding has evolved through several proposals and is documented in the spec corpus. The key proposals:

Proposal 254 — Padding Negotiation (2015): Established the idea that two relays could negotiate padding parameters between themselves. Before this, Tor padding was either hardcoded or absent. Negotiation lets clients request specific padding behavior from their guard, and lets onion services request padding on their circuits.

Proposal 302 — Hiding onion service clients using padding (2019): Generalized the negotiation idea into a framework where padding is described as a finite-state machine. Each state defines what events cause padding to be emitted (a real cell sent, a real cell received, a timer expiry, etc.) and what events cause state transitions. The framework is general enough to express WTF-PAD, constant-rate, burst-shaping, and many other strategies.

The key insight in proposal 302 is that production padding becomes a control system, not a one-line rule. Each padding machine has:

  • States with transitions based on cell events.
  • Distributions describing inter-padding-cell delays, sampled to introduce timing variance.
  • Counters and budgets to limit total overhead per machine.
  • Decision logic for when to start, when to stop, and when to emit dummy cells.

A simplified pseudocode sketch of an adaptive padding machine:

state = IDLE
gap_distribution = histogram from labeled training data
budget = MAX_PADDING_CELLS

while connection is open:
  event = wait for next event (cell sent, cell received, timer expiry)

  match event:
    case CELL_RECEIVED:
      if state == IDLE:
        state = ACTIVE
        schedule_next_padding_check(sample_from(gap_distribution))

    case CELL_SENT:
      reset_idle_timer()

    case PADDING_TIMER_EXPIRY:
      if state == ACTIVE and budget > 0:
        emit_padding_cell()
        budget -= 1
        schedule_next_padding_check(sample_from(gap_distribution))

    case IDLE_TIMEOUT:
      state = IDLE

Real padding machines are more elaborate — they track multiple states, handle burst boundaries, include hysteresis to avoid pathological oscillation, and have hooks for circuit-level events (rekey, close, error). The machinery is general enough that researchers can propose new defenses by writing new state machines without changing the Tor core.

In production, Tor enables modest padding for onion-service circuits (proposal 302's primary deployment target) but is more conservative about padding for general circuits. The reasoning: the bandwidth overhead is borne by the relays (every padding cell consumes relay bandwidth), and relay capacity is the network's bottleneck. Aggressive padding helps individual users but reduces total network throughput; the deployment decisions balance both concerns.

Protocol examples outside Tor

Padding isn't only for anonymity overlays. Several mainstream protocols include padding mechanisms with various motivations:

TLS 1.3 record padding. RFC 8446 added optional padding to records: zero bytes are appended to the inner plaintext before encryption, so the ciphertext length need not reveal the plaintext length. Use cases: hiding the length of TLS messages from passive observers, preventing length-based cryptographic side channels in some constructions. Many TLS libraries support setting padding policies; the default in most clients is no padding (because the bandwidth cost is non-trivial and the default threat model doesn't require it).

EDNS(0) padding for DNS (RFC 8467). DNS queries are short, and their length closely tracks the length of the query name, so length alone can narrow down which domain is being resolved. RFC 8467 recommends padding queries to a multiple of 128 bytes and responses to a multiple of 468 bytes to hide which query is being sent. Useful when DNS travels over an encrypted transport (DoH, DoT) where length is the only remaining leakage. See doh-vs-dot-leak-comparison for the broader DNS-over-encryption story.

SSH packet padding. SSH pads its packets to a multiple of the cipher block size. The padding is mandatory, at least four bytes of random data chosen by the sender, and the padding-length field travels inside the encrypted packet. SSH padding addresses cryptographic alignment, not traffic analysis specifically.
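
The SSH padding rule (RFC 4253) is mechanical enough to compute directly. A sketch of the sender-side calculation, assuming a 16-byte cipher block:

```python
def ssh_padding_len(payload_len: int, block: int = 16) -> int:
    """RFC 4253: packet_length(4) + padding_length(1) + payload + padding
    must be a multiple of the cipher block size, with >= 4 bytes padding."""
    base = 4 + 1 + payload_len
    pad = block - (base % block)
    if pad < 4:                 # minimum padding is 4 bytes
        pad += block
    return pad

pad = ssh_padding_len(32)
print(pad, (4 + 1 + 32 + pad) % 16)  # 11 0
```

The padding amount is fully determined by payload length and block size, which is exactly why it hides alignment but not traffic shape: an observer still recovers the payload length to within one block.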

HTTP/2 frame padding. HTTP/2 frames can include padding bytes. Used (rarely) to obscure the length of HTTP/2 messages from observers.

IPsec padding. ESP packets include padding to align to cipher block size and optionally to obscure inner packet length. Implementation-defined how aggressive padding can be.

These protocol-level padding mechanisms generally address single-message length leakage. They don't address timing, count, or burst patterns. They're not full anonymity defenses; they're features that compose with other defenses.

What padding cannot fix

Padding is one layer in a defense stack. Other layers fail in ways padding doesn't address:

Long-lived application identifiers. Cookies, session tokens, account credentials. If the user logs in, the application knows who they are regardless of how the transport is padded. Padding the network shape doesn't prevent the application from identifying the user.

Browser fingerprints. TLS JA3/JA4 fingerprints, HTTP/2 settings, browser-API characteristics. The application server can identify the browser instance from these even when the network shape is well-padded. See ja3-ja4-tls-fingerprinting and browser-fingerprint-hardening.

Distinctive application behavior. Loading a page with a unique structure of subresource requests, in a specific order, with specific timing — the higher-layer behavior of the application can be more identifying than the lower-layer traffic shape. Padding addresses the latter; it can't address the former without also restructuring application behavior.

End-to-end correlation against a global adversary. Even with strong padding at every link in a Tor circuit, an adversary observing both endpoints sees both the user's encrypted traffic to the guard and the destination's incoming traffic from the exit. If the relationship between input and output is preserved (i.e., real traffic still has correlated timing patterns at both ends, because padding can't perfectly hide the timing of real bursts without becoming constant-rate), correlation can succeed.

Route-level metadata. Which guard a user picked, which exit, which path through the network — visible from above the link layer. Padding doesn't change which relay sees which user; it only changes what each relay's traffic looks like.

Operational mistakes. A user accessing the same account across an anonymous and an identified context links the contexts at the application layer. No amount of transport padding helps.

The summary: padding is a defense at the network-traffic-shape layer. Leaks at other layers (application, identity, behavioral, operational) are unaddressed. A complete defense requires attention to all layers; padding is necessary but insufficient.

When the defense hurts more than the attack

A defense that costs more than the protection is worth is anti-engineering. Some scenarios where padding's cost exceeds its benefit:

Threat model doesn't include traffic-analysis adversaries. A user worried only about local-WiFi snooping doesn't benefit much from padding. The local observer already can't see destinations or content; padding makes their observations slightly less informative but doesn't change the basic threat picture. The bandwidth cost is paid for marginal gain.

Mobile / metered connections. Cover traffic on a 4G connection costs real money. A user paying $10/GB pays for every dummy packet. Heavy padding can multiply data usage 5-10x; for many users that's a deal-breaker.

Real-time applications. Voice and video calls have strict latency budgets. Flow shaping with even modest queueing delays (50ms) can degrade call quality. Padding that delays real packets is incompatible with real-time use.

Low-throughput links. A link operating near saturation has no headroom for cover traffic. Adding constant-rate padding may consume so much bandwidth that real traffic becomes congested. The defense reduces overall usefulness.

Wrong layer of defense. An organization worried about compliance and audit doesn't benefit from padding if the actual threat is application-layer logging at the destination. Padding is a network-layer defense; some threats need application-layer or operational defenses.

The honest engineering practice: enumerate the threat model first, then evaluate which defense layers are worth investing in. Padding for the sake of padding (because it's mentioned in security checklists) is sometimes counterproductive.

Hands-on exercise

Simulate padding overhead on a toy trace.

Tools: python3. Runtime: 20 minutes.

Define a toy packet trace and compare overhead under three strategies: no padding, fixed-cell padding, and constant-rate dummy padding.

from typing import List, Tuple

# Toy trace: list of (time, size_bytes) for outbound packets
trace: List[Tuple[float, int]] = [
    (0.000,  517),   # TLS ClientHello
    (0.140,   98),   # HTTP request
    (0.500, 1418),   # response chunk
    (0.501, 1418),
    (0.502,  812),
    (1.200,  100),   # follow-up request
    (1.500, 1418),
    (1.501,  340),
]

def total_real_bytes(trace):
    return sum(size for _, size in trace)

def fixed_cell_overhead(trace, cell_size=514):
    """Pad every packet to cell_size; large packets split into multiple cells."""
    total_padded_bytes = 0
    for _, size in trace:
        cells_needed = max(1, (size + cell_size - 1) // cell_size)
        total_padded_bytes += cells_needed * cell_size
    return total_padded_bytes

def constant_rate_overhead(trace, rate_packets_per_sec=20, cell_size=514, duration=2.0):
    """Send one cell every 1/rate seconds for the full duration; real
    packets are queued into available slots, dummies fill the rest."""
    total_slots = int(duration * rate_packets_per_sec)
    return total_slots * cell_size

real = total_real_bytes(trace)
fixed = fixed_cell_overhead(trace)
constant = constant_rate_overhead(trace)

print(f"Real bytes:                 {real:>8}")
print(f"Fixed-cell padded:          {fixed:>8}  (overhead {fixed/real:.2f}x)")
print(f"Constant-rate (20 pps, 2s): {constant:>8}  (overhead {constant/real:.2f}x)")

Expected output:

Real bytes:                     6121
Fixed-cell padded:              8224  (overhead 1.34x)
Constant-rate (20 pps, 2s):    20560  (overhead 3.36x)

Now vary the parameters:

  • Set the constant rate to 100 packets per second (high-rate cover). Overhead jumps dramatically.
  • Set the duration to 10 seconds (long idle time). Overhead jumps because most slots send dummies.
  • Make the trace bursty (load all the bytes at time 0). Fixed-cell padding stays cheap; constant-rate stays expensive.

The exercise shows the cost relationship: fixed-cell padding is cheap on light traffic but doesn't address timing; constant-rate is expensive but addresses both size and timing.

Stretch: implement a simple adaptive-padding routine that emits a dummy cell only when there's been no real traffic for >100ms. Measure the overhead; compare to constant-rate.
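
One possible shape of that stretch routine, using the trace above (a sketch, not the only answer: one dummy per informative gap, with a 100 ms idle threshold):

```python
def adaptive_overhead(trace, cell_size=514, idle_threshold=0.100):
    """Emit one dummy cell whenever an inter-packet gap exceeds the idle
    threshold. Real packets are never delayed; only dummies are added."""
    dummies = 0
    for (t0, _), (t1, _) in zip(trace, trace[1:]):
        if t1 - t0 > idle_threshold:
            dummies += 1
    return dummies * cell_size

trace = [(0.000, 517), (0.140, 98), (0.500, 1418), (0.501, 1418),
         (0.502, 812), (1.200, 100), (1.500, 1418), (1.501, 340)]
print(adaptive_overhead(trace), "dummy bytes")  # 2056 dummy bytes
```

Four informative gaps get one dummy each: about 2 KB of overhead versus roughly 20 KB for constant-rate on the same trace, at the cost of only disrupting, not hiding, the gap pattern.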

Read a Tor padding proposal as an engineer.

Tools: web browser, notes. Runtime: 15 minutes.

Read Tor proposal 302. Specifically:

  • What state objects does a padding machine carry?
  • What events transition the state machine?
  • How are padding inter-emit delays determined?
  • What is the budget mechanism, and what does it protect against?
  • Why is the negotiation between client and guard necessary (vs. unilateral padding)?

Write a one-paragraph summary of why production padding became a control system rather than a one-rule defense. (Hint: the rule "send dummies during idle gaps" doesn't specify how to detect idle gaps, how to choose dummy frequencies, how to bound total overhead, how to handle bursts, or how to interact with circuit lifecycle. The state-machine framing is what makes all those decisions explicit and configurable.)

Common misconceptions and traps

"Padding means appending zero bytes." That's record padding only — one specific defense. Real defenses include record padding, packet padding, flow shaping, dummy traffic, and combinations. The "zero bytes" view misses the timing, burst, and unobservability defenses that constitute most of the actual padding literature.

"More padding always means better anonymity." Padding addresses specific observable features. A defense that pads sizes but not timing leaves timing-based attacks undefended; a defense that pads timing but leaves size patterns intact leaves size-based attacks undefended. Adding more padding to features that aren't the discriminative ones costs bandwidth without buying defense. The right padding is targeted, not maximal.

"Cover traffic is free if bandwidth is cheap." Even setting aside metered connections (where it isn't free at all), cover traffic costs power on mobile devices, congests networks at scale, and can interact with QoS policies in ways that make real traffic slower. The "free" view ignores second-order costs.

"Protocol-level padding and anonymity padding are the same." TLS record padding hides per-message length but doesn't address timing, count, or burst patterns. EDNS(0) padding hides DNS-query length but the query still happens at a particular time and produces a particular response. These protocol-level paddings are incremental improvements; they're not full traffic-analysis defenses.

"Because Tor does some padding, website fingerprinting is solved." Tor's deployed padding (as of 2026) is modest for general circuits and more aggressive for onion services, and the academic literature has demonstrated that website fingerprinting attacks remain feasible against Tor traffic with current defenses. The Tor Project is open about this in proposal 302 and elsewhere; treating Tor's padding as a complete solution is overconfident.

"Constant-rate cover traffic gives unobservability." It approximates unobservability for the link layer, but only if every link in the path uses constant-rate. Most production deployments don't, because the bandwidth cost is unacceptable. Even with constant-rate at every link, application-layer behavior (when does the user log in, when do they post, when do they receive responses) can leak observability.

"Padding overhead doesn't matter at modern bandwidth." It does for many users. Mobile, metered, low-throughput, and battery-constrained users pay real costs for padding. Designers who assume "everyone has gigabit fiber" miss the deployment reality of much of the world.

"My defense was tested against attack X, so it works." Defense evaluations should target the strongest available attacks. A defense tested only against classical attacks is not validated against deep-learning attacks. A defense tested only against single-tab traces is not validated against multi-tab realistic traces. Evaluation rigor matters; the academic literature has many examples of defenses that worked against one attack class and failed against another.

Wrapping up

Padding and cover traffic are the family of defenses that address the metadata leakage encryption leaves unprotected. The defense space is broader than "appending zero bytes" — record padding, packet padding, flow shaping, and dummy traffic each address different observable features at different costs. Constant-rate transmission is theoretically clean and operationally punishing; adaptive padding is the practical compromise; deep-learning attacks have raised the ceiling on what defenses must accomplish.

In production, Tor implements modest padding via the proposal-302 padding-machine framework, with negotiation between clients and guards, state-machine descriptions of when to emit dummy cells, and budget mechanisms to bound overhead. Other protocols include narrower padding mechanisms (TLS record padding, EDNS(0)) that address single-message length leakage without claiming to be full traffic-analysis defenses.

The honest engineering reality: padding raises the cost of attack and reduces the accuracy of classifiers, but no current defense makes a low-latency overlay safe against a determined GPA. The deeper anonymity properties — unobservability, GPA resistance, statistically-strong unlinkability — require accepting high latency. The next module (mix-networks-loopix-nym — coming soon) covers high-latency mix networks where the latency cost is paid in exchange for the stronger anonymity guarantees padding alone cannot provide.

Further reading