Traffic analysis fundamentals
How timing, size, and burst structure leak information from encrypted traffic, from end-to-end correlation to website fingerprinting.
Encryption hides what you're saying. It does not hide that you're saying something, or roughly how much of it, or to whom, or with what timing. Once payloads are opaque, the shape of the traffic — packet sizes, directions, inter-arrival timings, burst structure, total volume, flow durations — is still visible to anyone who can observe the network. That shape is the attack surface.
Traffic analysis is the discipline of reading information from traffic shape. It's the most expensive class of attack against well-encrypted modern protocols precisely because the cryptography forces it: there's no point trying to break TLS 1.3, but the encrypted bytes form recognizable patterns that often suffice to answer the questions an adversary actually wants answered. "Did Alice load this specific webpage?" "Which Tor circuit corresponds to which user?" "Is this VPN tunnel carrying video, or interactive shell, or torrenting?" Traffic analysis answers all of those without ever decrypting a single packet.
This module is the systematic treatment. We'll work through what features are actually leaked by encrypted traffic, what the major attack families look like (end-to-end correlation, website fingerprinting, circuit fingerprinting, flow-shape classification), why low-latency anonymity systems are especially vulnerable, why deep learning changed the field after 2018, why most published attack accuracy numbers significantly overstate operational reality, and what defense families exist (padding, shaping, batching, mixing, route diversity) — though we leave the deep dives on those defenses for the next two modules. The honest conclusion isn't "Tor is broken" or "padding solves it"; it's that anonymity claims must always specify latency tolerance, observer position, and what residual leakage you're willing to accept.
Prerequisites
- tcp-at-the-wire-level — for understanding what shapes TCP traffic naturally has and what features are intrinsic to the protocol versus the application.
- tor-onion-routing-and-circuit-anonymity — Tor is the canonical low-latency anonymity system that most traffic-analysis literature targets.
- threat-models-for-network-anonymity — the adversary-first thinking traffic-analysis evaluation requires.
Learning objectives
- Explain how timing, packet size, burst structure, and flow direction leak information even when payloads are encrypted.
- Distinguish end-to-end correlation, website fingerprinting, circuit fingerprinting, and flow-shape classification, with example attacks for each.
- Evaluate why low-latency anonymity systems are especially vulnerable to traffic analysis, and why mixnets accept latency cost in exchange.
- Critique laboratory attack accuracy claims using closed-world versus open-world thinking, and identify the dataset assumptions that often inflate reported numbers.
Metadata is the attack surface left after encryption
Once payloads are encrypted with a modern AEAD cipher (AES-GCM, ChaCha20-Poly1305), the content is computationally indistinguishable from random noise to anyone without the key. The plaintext is gone. What remains visible to a network observer:
- Source and destination IP addresses at the IP layer.
- Source and destination ports at the TCP/UDP layer.
- Packet sizes, including the encrypted payload size plus protocol headers.
- Packet directions — which endpoint sent which packet.
- Packet timestamps — the precise time each packet was observed.
- Flow boundaries — when each TCP/UDP flow began and ended.
- Packet ordering — what came before what.
- TLS metadata — the SNI field is plaintext until ECH is widely deployed; the cipher suite negotiation is visible; the certificate chain is observable to anyone watching the handshake.
- DNS lookups, unless DoH/DoT is used.
- Higher-layer protocol patterns — even with payloads encrypted, framing leaves structure: TLS record boundaries, QUIC's distinctive handshake packets, HTTP/2's request-multiplexing rhythm.
Each of these features carries information. Source/destination IPs identify endpoints. Ports identify services (HTTPS = 443, SSH = 22, DNS = 53). Sizes correlate with content type — a large GET request looks different from a large POST upload, a video stream's burst pattern is distinct from a chat session's. Timing reflects the application's behavior — typing-then-waiting cycles vs. continuous streaming vs. bursty page loads.
The traffic-analyst's job is to take these visible features and reconstruct properties of the encrypted-but-unobservable content: which page was loaded, which application is running, which user this is, which destination is being contacted through an anonymity overlay. The cryptography determines that the analyst cannot recover the actual bytes; it does not determine that the analyst cannot recover useful inferences about what those bytes represent.
A useful framing: encryption converts a strong-information-leakage problem (full plaintext visibility) into a weak-information-leakage problem (metadata visibility). For many attacker goals, the weak-information-leakage problem is still solvable — sometimes with surprisingly high confidence.
The primitive features: size, direction, timing, and bursts
Once you accept that the visible features are sizes, directions, timestamps, and ordering, the attacker's feature vector becomes concrete. A typical packet trace looks like this when payloads are stripped:
time direction size flags
0.000 -> 60 SYN
0.045 <- 60 SYN/ACK
0.046 -> 52 ACK
0.046 -> 517 PSH/ACK (TLS ClientHello)
0.092 <- 1418 PSH/ACK (TLS ServerHello + Certificate part 1)
0.092 <- 1418 PSH/ACK (Certificate part 2)
0.092 <- 803 PSH/ACK (Certificate part 3)
0.094 -> 126 PSH/ACK (TLS handshake completion)
0.140 <- 52 ACK
0.142 -> 98 PSH/ACK (HTTP/2 GET)
0.187 <- 1418 PSH/ACK (response chunk 1)
0.187 <- 1418 PSH/ACK (response chunk 2)
0.188 <- 847 PSH/ACK (response chunk 3)
0.230 -> 52 ACK
...
What's encoded in this trace:
- Sizes: The 517-byte initial outbound packet is consistent with a TLS ClientHello carrying a typical browser's set of extensions and ALPN options. Three large inbound packets summing to ~3.6 KB are likely a certificate chain (typical certificate-bundle sizes range from 2-5 KB). Subsequent ~1400-byte chunks are limited by the TCP MSS (1500-byte MTU minus 40 bytes of TCP and IP headers).
- Directions: Initiator sends short requests, server sends large responses. Classic browser-load asymmetry.
- Timing: ~45ms for the initial RTT (the SYN/SYN-ACK gap), then a gap before the response data — consistent with a server-side processing delay. The burst of inbound packets at 0.187 is consistent with TCP delivering a window's worth of data.
- Burst structure: Three inbound packets in a burst at 0.092, then three more at 0.187-0.188, and so on. Each burst pattern reflects window sizes, server-side response chunking, and network delivery.
Even without seeing a single byte of payload, an analyst with this trace can confidently say: "This is a typical browser loading an HTTPS page over a normal-RTT link." With many traces from the same user against many destinations, the analyst can start distinguishing destinations by their characteristic burst patterns. With a labeled training set, the analyst can build a classifier.
The four primitive features — size, direction, timing, burst — are the universe of inputs to all traffic-analysis attacks against encrypted traffic. Different attacks weight them differently and feed them into different machine-learning architectures, but they're all working from the same observable surface.
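The reduction from a raw trace to these primitives can be sketched in a few lines. The trace representation here ((timestamp, signed size) pairs, positive for outbound) and the 10 ms burst-gap threshold are illustrative assumptions, not a standard format:

```python
# Sketch: extract the four primitive feature families from a packet trace.
# A trace is a list of (timestamp_seconds, signed_size) pairs, where positive
# sizes are outbound and negative sizes are inbound. This representation and
# the burst threshold are assumptions for illustration; real tools derive
# traces from pcap files.

def extract_features(trace, burst_gap=0.01):
    times = [t for t, _ in trace]
    sizes = [s for _, s in trace]

    # Size and direction features
    out_bytes = sum(s for s in sizes if s > 0)
    in_bytes = sum(-s for s in sizes if s < 0)
    n_out = sum(1 for s in sizes if s > 0)
    n_in = len(sizes) - n_out

    # Timing features: inter-arrival gaps
    gaps = [b - a for a, b in zip(times, times[1:])]

    # Burst features: consecutive packets closer than burst_gap form one burst
    bursts = 1
    for g in gaps:
        if g > burst_gap:
            bursts += 1

    return {
        "duration": times[-1] - times[0],
        "out_bytes": out_bytes, "in_bytes": in_bytes,
        "n_out": n_out, "n_in": n_in,
        "mean_gap": sum(gaps) / len(gaps) if gaps else 0.0,
        "n_bursts": bursts,
    }

# The opening of the handshake trace from the section above
trace = [(0.000, 60), (0.045, -60), (0.046, 52), (0.046, 517),
         (0.092, -1418), (0.092, -1418), (0.092, -803), (0.094, 126)]
print(extract_features(trace))
```

Every attack discussed below is, at bottom, a function of a vector like this one, differing only in how fine-grained the features are and what model consumes them.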
End-to-end correlation and its variants
The simplest and most fundamental traffic-analysis attack against anonymity overlays is end-to-end correlation. The setup: an adversary observes traffic at two points — typically the user's network connection and the destination's network connection — and tries to determine whether the same flow appears at both.
For VPNs, end-to-end correlation is trivial. The adversary observing both your home connection and the VPN exit's connection sees:
- Outbound from home: encrypted UDP/TCP to VPN-provider IP, sizes S₁, S₂, S₃, ... at times T₁, T₂, T₃, ...
- Outbound from VPN exit to destination: cleartext or differently-encrypted bytes to destination, sizes roughly equal to S₁, S₂, S₃, ... (modulo VPN-protocol overhead), at times roughly T₁ + δ, T₂ + δ, T₃ + δ, ...
The δ is the latency through the VPN tunnel — small and consistent for a given path. Matching the timing patterns is straightforward; matching the sizes (after accounting for VPN overhead) confirms the link. A few seconds of observation usually suffices.
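A minimal sketch of this matching, with invented timestamps and an assumed 2 ms tolerance (real correlators are more robust to jitter, loss, and reordering):

```python
# Sketch: timing correlation across a VPN tunnel. Exit-side packet times
# should equal home-side times shifted by a near-constant delta. All
# timestamps and the tolerance are illustrative assumptions.

def correlate(home_times, exit_times, tolerance=0.002):
    """Estimate the tunnel latency delta and the fraction of home-side
    packets with a matching exit-side packet near time + delta."""
    # Estimate delta as the median pairwise offset of position-aligned packets
    offsets = sorted(e - h for h, e in zip(home_times, exit_times))
    delta = offsets[len(offsets) // 2]
    matched = sum(
        1 for h in home_times
        if any(abs((h + delta) - e) <= tolerance for e in exit_times)
    )
    return delta, matched / len(home_times)

home = [0.000, 0.031, 0.210, 0.455, 0.458, 0.900]
vpn_exit = [t + 0.020 for t in home]          # same flow, 20 ms later
unrelated = [0.010, 0.300, 0.350, 0.600, 0.880, 0.950]

d, score = correlate(home, vpn_exit)
print(f"delta={d:.3f}s match={score:.0%}")    # high match for the true pair
_, score_bad = correlate(home, unrelated)
print(f"unrelated match={score_bad:.0%}")     # low match for a decoy flow
```

The true pair matches completely; an unrelated flow matches only by coincidence. With size matching layered on top, a few seconds of traffic suffices to confirm the link.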
For Tor, end-to-end correlation is harder because the cells are encrypted multiple times (so payload sizes don't match across hops in a way that's directly visible) and traffic is multiplexed onto fixed-size cells. But it's still possible:
- The number of cells flowing in each direction over a circuit is observable.
- The timing of cells passing through any single relay is observable to that relay's operator.
- An adversary observing the user-to-guard traffic sees N₁ cells going out at times T₁, ..., Tₙ.
- An adversary observing the exit-to-destination traffic sees N₂ packets going out at times T₁ + δ, ..., Tₙ + δ.
- N₁ and N₂ aren't equal — Tor cells are 514 bytes, but the unwrapped packets at the exit are arbitrary application bytes. However, the timing pattern of cell arrivals and packet emissions is correlated. If a user loads a page that produces a distinctive burst (a sudden 1MB image embedded in a small HTML response), the adversary sees that burst in both observation windows.
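One way to see why unequal packet counts don't block correlation: bin both observation points into fixed time windows and correlate the per-window counts. A sketch with invented millisecond timestamps (the 100 ms window and the burst times are assumptions):

```python
# Sketch: even when packet counts differ across hops (cells vs. application
# packets), per-window activity rates stay correlated. Bin both observation
# points into fixed windows and compare the count series. Times are integer
# milliseconds; all numbers are illustrative.

def binned_counts(times_ms, window_ms=100, horizon_ms=1000):
    counts = [0] * (horizon_ms // window_ms)
    for t in times_ms:
        if t < horizon_ms:
            counts[t // window_ms] += 1
    return counts

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

# A burst around t=200ms and another around t=700ms, seen at both ends;
# the exit emits a different *number* of packets but the same rhythm.
guard_cells = [200, 210, 220, 230, 700, 710, 720]   # 7 cells at the guard
exit_pkts = [250, 260, 750, 760, 770]               # 5 packets at the exit

r = pearson(binned_counts(guard_cells), binned_counts(exit_pkts))
print(f"rate correlation r={r:.2f}")
```

Seven cells on one side, five packets on the other, yet the binned rate series correlate strongly because the bursts land in the same windows (offset by the tunnel latency).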
Multiple correlation techniques exist, from straightforward cross-correlation of binned packet-count and timing series to machine-learning correlators such as DeepCorr that take both timing and size as input. With enough observation time and enough simultaneous flows to disambiguate, accurate correlation against Tor users by an adversary observing both ends is well-documented in the academic literature.
Tor's design paper (Section 4.6) explicitly says Tor does not protect against this attack. The argument is utilitarian: defending against end-to-end correlation requires either dramatically increasing latency (the mixnet model) or sending dummy traffic continuously, both of which destroy interactive usability. Tor accepts the global-passive-adversary (GPA) threat to remain usable.
The adversary required for end-to-end correlation isn't necessarily a true global passive observer. They need to see both observation points for the specific flow they want to correlate. Realistic deployments:
- A nation-state observing all traffic into and out of its country (sees the user-to-guard side if the user is domestic, and the exit-to-destination side if the destination is domestic — much of the modern internet is geographically routed in ways that put both observation points within one country's view).
- A large CDN provider seeing many destination-side traffic flows, paired with a partner who sees user-side traffic.
- A guard relay operator (sees the user side) colluding with an exit relay operator (sees the destination side) — which is why Tor's path selection actively avoids picking guard and exit from the same operator family.
The defense Tor implements: path diversity. Try to ensure no single observer is in both positions for a given circuit. This works against many adversaries but cannot defeat a true GPA who is in everyone's traffic.
Website fingerprinting as supervised classification
A different attack class doesn't require observing two points. Website fingerprinting asks: given a single observation of an encrypted browsing session (typically the user-to-guard traffic on a Tor circuit), can we determine which website the user visited?
The intuition: each website produces a characteristic traffic shape when loaded. The HTML structure, the number and size of embedded resources (images, JS, CSS), the script-loading order, the third-party domains contacted — all produce a distinct burst pattern. Even encrypted, these patterns are observable.
The methodology is supervised machine learning:
- Training data collection. The attacker visits a set of monitored websites — the monitored set — many times, capturing a packet trace each time. For each site, they accumulate dozens to hundreds of traces.
- Feature engineering (in classical attacks) or feature learning (in deep-learning attacks). Classical features included: total trace duration, number of incoming packets, number of outgoing packets, total bytes per direction, packet-size histograms, burst structure, inter-arrival timing distributions. Modern deep-learning attacks let the model learn features from the raw trace.
- Classifier training. A standard ML model (random forest, SVM, k-nearest neighbors, or a deep neural network) is trained to map traces to labels.
- Attack-time classification. The attacker observes a target trace they don't have a label for. They feed it through the classifier; the classifier outputs a predicted label or a confidence distribution over labels.
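A toy version of this pipeline, using a nearest-centroid classifier over three hand-picked features (bytes in, bytes out, packet count). All feature values and site labels are invented for illustration; real attacks use hundreds of features or raw traces and far larger training sets:

```python
# Sketch: website fingerprinting as supervised classification, reduced to a
# toy nearest-centroid model. All numbers and labels are invented.

def centroid(vectors):
    n = len(vectors)
    return tuple(sum(v[i] for v in vectors) / n for i in range(len(vectors[0])))

def train(labeled_traces):
    """labeled_traces: {site_label: [feature_vector, ...]} -> per-site centroids."""
    return {label: centroid(vs) for label, vs in labeled_traces.items()}

def classify(model, trace):
    """Return the label whose centroid is nearest to the observed trace."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return min(model, key=lambda label: dist(model[label], trace))

# Step 1-2: collected traces reduced to (bytes_in, bytes_out, n_packets)
training = {
    "news-site": [(820_000, 41_000, 900), (790_000, 39_000, 870)],
    "chat-app":  [(12_000, 9_000, 150), (14_000, 11_000, 180)],
    "video":     [(5_100_000, 60_000, 4_100), (4_900_000, 58_000, 3_900)],
}
# Step 3: "train" the model
model = train(training)

# Step 4: classify an unlabeled trace observed on the wire
unknown = (805_000, 40_500, 885)
print(classify(model, unknown))   # prints "news-site"
```

The closed-world assumption is baked into `classify`: it always returns one of the monitored labels, no matter how unlike all of them the trace is. Open-world attacks must add a rejection threshold, which is where the precision problems discussed below come from.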
Two evaluation models matter:
Closed-world evaluation. The attacker assumes all observed traces come from the monitored set. The classifier's job is just to pick which monitored site each trace corresponds to. Reported accuracy in closed-world settings has reached >95% in some published attacks.
Open-world evaluation. The attacker accepts that most observed traces come from non-monitored sites — the long tail of the web. The classifier's job is to flag traces that look like one of the monitored sites and reject everything else. This is much harder because the false-positive cost is high: a classifier that says "this trace is the monitored site example.com" needs to be confident enough to overcome the prior that most traffic isn't to monitored sites. Open-world accuracy numbers are usually substantially lower than closed-world.
The practical implications:
- A targeted attacker with a small monitored set (e.g., "is this user reading any of these 20 specific dissident websites?") and a strong classifier can achieve high precision against high-stakes targets. The closed-world setup is roughly accurate for this case.
- A general surveillance attacker trying to classify arbitrary user activity into broad categories has more open-world false-positive burden. Their classifier is useful as a triage tool but not as a deterministic identifier.
- Both classes of attack scale with computational resources but not with cryptographic strength. Stronger TLS doesn't help.
Deep-learning attacks and what they changed
Before 2018, website fingerprinting attacks relied heavily on hand-engineered features. Researchers spent significant effort identifying which trace properties carried the most signal: cumulative bytes per direction, packet-count ratios, timing histograms, and burst-pattern descriptors. The best classical attacks used these features with classical ML algorithms (k-NN, random forests, SVMs) and achieved respectable accuracy.
The 2018 paper "Deep Fingerprinting: Undermining Website Fingerprinting Defenses with Deep Learning" by Sirinam et al. demonstrated that a deep convolutional neural network, fed nothing more than the raw sequence of packet directions and sizes, could outperform classical attacks substantially — even against defended traffic. The key insight was that automatic feature learning could discover patterns humans had missed, and that those patterns survived defenses (like WTF-PAD adaptive padding) that had been considered effective against classical attacks.
The implications:
- The attack ceiling rose. Previously-effective defenses needed to be reevaluated against deep-learning classifiers, and many were found inadequate.
- Defense design got harder. A defense that defeats hand-engineered features may not defeat features a CNN learns automatically. Robust defenses now have to consider what a deep model could potentially extract from the raw trace, not just what humans have currently identified.
- Attack reproducibility improved. Deep-learning attacks need less domain expertise to deploy; the model architecture handles the feature-engineering work.
- The arms race intensified. Defenses now need to be tested against deep-learning attacks; this is a higher bar than the classical-attack bar.
Subsequent research has continued in both directions. Defenses like Walkie-Talkie and FRONT (Front-padding) try to make the burst structure harder for deep models to learn. Attacks have continued to improve with attention mechanisms, transformer architectures, and adversarial training. The current state of the art is a frequently-shifting target.
The honest summary: deep-learning attacks demonstrated that classical defenses underestimated the attack surface. The race is ongoing; no current defense is uniformly successful against all current attacks. Engineers building anonymity systems should expect that any specific defense will eventually be defeated by a sufficiently good attack, and design accordingly — using defense in depth rather than betting on any single mechanism.
Onion services and circuit fingerprinting
Tor's onion services provide an interesting variant of the fingerprinting problem. A user accessing an onion service:
- Doesn't go through a Tor exit (the path is client → 3 hops → rendezvous → 3 hops → service).
- Has different traffic patterns from regular Tor browsing because the service is itself on Tor and the rendezvous-introduction protocol introduces specific message patterns.
- Often uses long-lived circuits to a small set of onion services, which can themselves be fingerprintable.
Research like "How Unique is Your .onion?" (Overdorf et al., 2017) showed that onion services' fingerprintability varies widely — some sites are easy to identify by their traffic shape; others are much harder. The variation depends on:
- Site size and complexity. Sites with distinctive resource profiles (specific large images, specific JS bundles) are easier to identify than minimalist sites.
- Server response timing. Sites with predictable server-side processing latencies create timing fingerprints; sites with high variance are harder to fingerprint.
- Access patterns. Sites the user visits frequently produce more accumulated trace data than rarely-visited ones, helping the classifier.
The attack model for onion services: an adversary observing the client side (specifically the guard relay) and trying to determine which onion service the client is visiting. The adversary doesn't see the rendezvous selection or the service-side traffic; they see only the client-side circuit traffic. From that one-sided observation, traffic-shape features can identify the site with substantial accuracy for some sites.
This is one of the more discomforting traffic-analysis findings, because users of onion services often have specific reasons to need anonymity and may underestimate how identifiable their browsing patterns are.
The Tor Project's response includes:
- Vanguards — additional layers of guard nodes specifically for onion services to defend against guard-discovery attacks.
- Padding negotiation — onion-service circuits can request and apply circuit-level padding more aggressively than regular browsing.
- Version 3 onion addressing — its blinded descriptor keys make onion-service addresses far harder to enumerate from the directory system than the earlier v2 design allowed.
These help against specific attack variants but don't fundamentally solve the website-fingerprinting problem for onion services.
Why attack results are easy to overstate
Academic traffic-analysis attacks typically report accuracy numbers in the 80-99% range. These numbers are real for the experimental setups they describe but often substantially overstate operational reality. Several recurring assumptions inflate the numbers:
Single-tab assumption. Most attack datasets are collected by browsing one site at a time, with a clean browser and no other traffic from the user's machine. Real users have multiple tabs open, background apps making requests, system updates downloading. The combined traffic is much harder to classify than a single isolated trace.
Stable network assumption. Datasets are collected on stable home or lab connections. Real users roam between WiFi, cellular, and wired networks; experience packet loss; have variable RTTs. Classifier performance generally degrades when the test traffic has different network characteristics from the training data.
Static page assumption. Pages change. CDNs serve different versions. Personalization, A/B tests, dynamic content all change the traffic shape between visits. A classifier trained on yesterday's traces of example.com may not work on today's traces if the site changed.
Fresh-cache assumption. Many trace-collection methodologies clear the browser cache before each visit to get clean traces. Real users' browsers have caches; cached pages produce dramatically different traffic shapes (much shorter, fewer requests).
Closed-world setup. As discussed, closed-world numbers are not directly applicable to open-world deployment. The base rate of the monitored sites within all traffic matters enormously.
Dataset drift. Models trained in 2020 may not perform on 2024 traffic because protocols have changed (HTTP/3 deployment, ECH rollout, TLS 1.3 ubiquity), browsers have changed (different TLS fingerprints, different request prioritization), and sites have changed.
Implementation gap from simulation to deployment. Many defense evaluations are done in simulation by post-processing traces. The "WFDefProxy" paper (Gong et al., 2021) showed that defenses tested in simulation often have implementation issues that change their effectiveness when actually deployed in a Tor pluggable transport.
These caveats don't mean the attacks aren't real. They mean that the gap between "this attack achieved 95% accuracy in this paper" and "this attack will reliably identify users of this onion service in the wild" is larger than the headline number suggests. A practical engineer reading attack literature should ask:
- What was the open-world accuracy?
- How were the traces collected?
- Was caching disabled?
- Was the user multitasking?
- Was the defense tested in simulation or in deployment?
- How recently was the dataset collected?
- Has the network ecosystem changed since collection in ways that would affect the attack?
Even after all caveats, the attacks are useful to a determined adversary against high-value targets. The accuracy is "good enough" for triage — narrowing 1 million users down to 1,000 plausible candidates for further investigation — even if it's not "good enough" for direct identification with no false positives.
Defense families preview
This module establishes the attack surface; the next two modules (padding-strategies-and-cover-traffic — coming soon, and mix-networks-loopix-nym — coming soon) cover defenses in depth. Brief preview:
Padding adds dummy bytes to packets (or sends entirely dummy packets) to obscure size patterns. Padding can be:
- Per-packet (each packet padded to a fixed size — defeats size-based features but doesn't help with timing or count).
- Per-burst (bursts padded to a fixed total size).
- Adaptive (padding decisions made based on traffic patterns to defeat specific attacks).
- Continuous cover traffic (sending dummy packets even when no real traffic is queued).
The cost is bandwidth. Per-packet padding to the maximum cell size doubles bandwidth in many cases. Continuous cover traffic can multiply bandwidth use 5-10x.
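The bandwidth claim can be checked with a quick sketch: pad each packet up to a multiple of Tor's 514-byte cell size over a web-like packet mix. The packet sizes are invented (borrowed from the trace style earlier in this module):

```python
# Sketch: bandwidth cost of per-packet padding to fixed-size cells.
# The packet mix is an illustrative assumption, not measured data.

def padding_overhead(packet_sizes, cell_size=514):
    """Bytes on the wire if every packet is padded up to a multiple of
    cell_size, versus the unpadded total."""
    raw = sum(packet_sizes)
    padded = sum(-(-s // cell_size) * cell_size for s in packet_sizes)  # ceil
    return raw, padded, padded / raw

# A bursty web-like mix of small requests and MTU-sized responses
sizes = [60, 52, 517, 1418, 1418, 803, 126, 98, 1418, 1418, 847]
raw, padded, ratio = padding_overhead(sizes)
print(f"raw={raw} padded={padded} overhead={ratio:.2f}x")
```

Even this response-heavy mix costs roughly 1.4x; traffic dominated by small packets (interactive shells, chat) pushes the ratio much closer to the worst case, which is how per-packet padding ends up near doubling bandwidth in practice.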
Shaping smooths out timing patterns by buffering and re-sending traffic at fixed intervals (constant bit rate) or with deliberate jitter. The cost is latency.
Batching groups multiple packets together and emits them simultaneously, breaking the one-packet-in-one-packet-out timing pattern. The cost is latency and possibly throughput.
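The latency cost of shaping is easy to quantify in a sketch. Suppose a shaper releases at most one packet per fixed tick: a burst that arrives nearly at once drains slowly, and queuing delay grows linearly down the queue. Times are integer milliseconds and all values are illustrative:

```python
# Sketch: per-packet queuing delay under constant-rate shaping.
# One packet is released per tick, in arrival order, never before it arrived.
# Tick interval and arrival times are illustrative assumptions.

def shape_constant_rate(arrival_ms, tick_ms=50):
    delays, next_slot = [], 0
    for t in arrival_ms:
        release = max(t, next_slot)
        release = -(-release // tick_ms) * tick_ms   # round up to a tick boundary
        delays.append(release - t)
        next_slot = release + tick_ms
    return delays

burst = [0, 1, 2, 3, 4]            # five packets arriving within 4 ms
delays = shape_constant_rate(burst)
print(delays)                      # delay grows linearly down the queue
```

The fifth packet of a 4 ms burst waits nearly 200 ms at a 20-packets-per-second release rate. That is the trade: the observer sees a featureless constant rate, and the user eats the queuing delay.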
Mixing is what high-latency mix networks do: at each hop, batch many users' messages, reorder them, delay them, and emit them in random order. The combination of batching, reordering, and delay makes timing correlation statistically very hard. The cost is high latency (minutes to hours) and unsuitability for interactive use.
Cover traffic sends dummy traffic to make "communicating" indistinguishable from "not communicating." Provides unobservability properties at the cost of substantial continuous bandwidth.
Route diversity changes paths frequently or uses multiple parallel paths to make it harder for any single observer to see the full picture. Cost: complexity, potentially worse performance.
The pattern: every defense costs latency, bandwidth, or both. There is no free lunch. Anonymity engineering is fundamentally a budget-allocation problem: how much latency and bandwidth are users willing to pay for how much defense against which attacks?
What a practical engineer should conclude
Synthesizing the above:
- Encryption is necessary but not sufficient for anonymity. Modern AEAD ciphers protect content; metadata attack surface remains.
- Low-latency anonymity systems are vulnerable to traffic analysis. Tor accepts this; mixnets pay latency to defeat it. Choose based on threat model.
- End-to-end correlation works against any low-latency overlay if the adversary observes both ends. Path diversity helps against most adversaries; against a true GPA, no defense exists at low latency.
- Website fingerprinting is real and effective in many scenarios but its operational accuracy is often substantially lower than published numbers suggest. The threat is real for high-value targets; routine surveillance attacks face significant noise.
- Deep-learning attacks raised the ceiling. Defenses that worked against classical attacks may not work against deep models. Defense evaluation should include modern attack architectures.
- Defenses are about trade-off engineering. Padding costs bandwidth; shaping costs latency; cover traffic costs both. The engineering question is which costs are acceptable for which threat model.
- Implementation reality often differs from simulation. Test defenses in deployment, not just in research papers.
The honest conclusion isn't doomism ("Tor is broken, anonymity is impossible") nor complacency ("padding solves it, we're fine"). It's a structured understanding: the attack surface is well-understood; the defenses are well-understood; the threat model determines which defenses are warranted; the cost-benefit calculation is explicit and engineering-tractable.
Hands-on exercise
Compare two encrypted flows by shape.
Tools: tcpdump or tshark, python3 with matplotlib.
Runtime: 25 minutes.
Capture two short browsing sessions to qualitatively different sites:
# Session 1: visit a static-content site
sudo tcpdump -nn -i any -w /tmp/static.pcap "host static-site.example.com" &
TCPDUMP_PID=$!
sleep 1   # give tcpdump a moment to start capturing
curl -o /dev/null -s "https://static-site.example.com"
sleep 1   # let trailing packets (FIN/ACK) arrive
sudo kill $TCPDUMP_PID
wait $TCPDUMP_PID 2>/dev/null
# Session 2: visit a dynamic site with many embedded resources
sudo tcpdump -nn -i any -w /tmp/dynamic.pcap "host dynamic-site.example.com" &
TCPDUMP_PID=$!
sleep 1
curl -o /dev/null -s "https://dynamic-site.example.com"
sleep 1
sudo kill $TCPDUMP_PID
wait $TCPDUMP_PID 2>/dev/null
Convert pcap to a CSV of (timestamp, direction, size):
tshark -r /tmp/static.pcap -T fields \
-e frame.time_relative -e ip.src -e ip.dst -e frame.len \
-E header=y -E separator=, > /tmp/static.csv
tshark -r /tmp/dynamic.pcap -T fields \
-e frame.time_relative -e ip.src -e ip.dst -e frame.len \
-E header=y -E separator=, > /tmp/dynamic.csv
Then plot in Python:
import pandas as pd
import matplotlib.pyplot as plt

def plot_trace(csv_path, title, my_ip):
    df = pd.read_csv(csv_path)
    # Outbound if we sent it; anything else (including non-IP rows) counts as inbound
    df["direction"] = df["ip.src"].apply(lambda src: "out" if src == my_ip else "in")
    # Signed size: bars above the axis are outbound, below are inbound
    df["signed_size"] = df.apply(
        lambda r: r["frame.len"] if r["direction"] == "out" else -r["frame.len"],
        axis=1,
    )
    plt.figure(figsize=(10, 4))
    plt.bar(df["frame.time_relative"], df["signed_size"], width=0.005)
    plt.title(title)
    plt.xlabel("time (s)")
    plt.ylabel("packet size (bytes; +out/-in)")
    plt.grid(True)
    plt.show()

plot_trace("/tmp/static.csv", "static site", "192.168.1.42")    # substitute your own local IP
plot_trace("/tmp/dynamic.csv", "dynamic site", "192.168.1.42")
Look at the resulting plots. Differences you'll likely see:
- The static site likely has one or two response bursts.
- The dynamic site likely has many bursts as embedded resources are fetched.
- The total trace duration differs.
- The size distribution differs.
Now ask yourself: if you saw an unlabeled trace and wanted to classify it as "this site or that site," what features would you compute? Total bytes? Number of bursts? Inter-burst gaps? You're doing the same feature-engineering work that the early attack literature did.
Stretch: visit the same site twice and compare the two traces. They should be similar but not identical. Then visit it with a fresh browser cache vs. a warm cache. The difference is dramatic — caching is one of the assumptions that simulation-based attacks often miss.
Closed-world versus open-world thought experiment.
Tools: plain text. Runtime: 10 minutes.
Suppose an adversary wants to detect when users access any of 20 specific monitored websites among the millions of websites users could visit.
Closed-world setup: assume every observed trace is from one of the 20 monitored sites. The classifier picks the most likely monitored site. With reasonable models, accuracy can be 95%+.
Open-world setup: most traces are from the broader web. The classifier needs to distinguish "this trace is from one of the 20 monitored sites" from "this trace is from something else." For each trace, it outputs a confidence; if confidence exceeds a threshold, it flags the trace as monitored.
Suppose:
- The classifier's true-positive rate is 90% (90% of monitored traces are correctly flagged).
- The classifier's false-positive rate is 1% (1% of unmonitored traces are wrongly flagged).
- The base rate of monitored traffic in the wild is 0.01% (one in 10,000 traces is to a monitored site).
What's the precision of the classifier (probability that a flagged trace is actually monitored)?
Out of 1,000,000 traces:
- 100 are actually monitored (0.01% of 1M).
- Of those 100, 90 are correctly flagged.
- 999,900 are unmonitored.
- Of those 999,900, 9,999 are wrongly flagged (1% false-positive rate).
- Total flagged: 10,089.
- Precision: 90 / 10,089 = 0.89%.
The classifier's flag is correct less than 1% of the time, even though it rejects 99% of unmonitored traffic and catches 90% of monitored traffic. This is the open-world reality: a low base rate of the monitored class crushes precision unless the false-positive rate is extremely low.
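The worked arithmetic generalizes to a one-line precision formula (Bayes' rule in frequency form); a sketch:

```python
# Sketch: open-world flag precision from true-positive rate, false-positive
# rate, and base rate. The specific rates below mirror the thought experiment.

def flag_precision(tpr, fpr, base_rate, total=1_000_000):
    monitored = total * base_rate
    true_flags = monitored * tpr          # monitored traces correctly flagged
    false_flags = (total - monitored) * fpr  # unmonitored traces wrongly flagged
    return true_flags / (true_flags + false_flags)

p = flag_precision(tpr=0.90, fpr=0.01, base_rate=0.0001)
print(f"precision={p:.2%}")               # under 1%, matching the worked example

# A 10x better false-positive rate changes the picture substantially
print(f"{flag_precision(0.90, 0.001, 0.0001):.2%}")
```

Playing with the parameters makes the lesson concrete: precision is far more sensitive to the false-positive rate and the base rate than to the true-positive rate.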
For an attacker doing fully-automated mass surveillance, this precision is too low to act on. For a triage attacker (human review of every flag), roughly 112 flags investigated per actual hit may be acceptable, especially if the cost per investigation is small (subpoena ISP records for the user's IP). Whether the attack is "useful" depends on the adversary's process and resources.
Stretch: explain how this changes if the monitored set is one specific site that the user visits frequently (so the base rate within that user's traffic is higher, even though the base rate across all users is low).
Common misconceptions and traps
"If TLS is strong, traffic analysis is impossible." TLS protects content. Traffic analysis works on metadata that TLS leaves visible: packet sizes, directions, timings, flow boundaries. No improvement to TLS's cryptographic strength helps against traffic analysis, because the attacker isn't trying to break the cryptography.
"Attack accuracy in a paper transfers directly to the Internet." Lab attacks make many assumptions: single-tab, fresh cache, stable network, recent dataset, closed-world. Real-world attacks face base-rate problems, dataset drift, multitasking noise, and protocol evolution. Real-world accuracy is usually substantially lower than lab numbers, though still meaningful for high-value targets.
"Traffic analysis only matters to nation-state adversaries." Enterprises run traffic-analysis infrastructure to classify employee traffic. CDNs and ad networks use it for behavioral classification. Local network observers (corporate IT, ISP) can do basic flow-shape inference. The class of adversaries with traffic-analysis capability is much broader than just intelligence agencies.
"Website fingerprinting means identifying exact pages only." The same techniques apply to broader categories (is this user reading news? Is this user gaming? Is this video traffic from streaming service A or B?), to circuit identification (is this circuit carrying sensitive traffic?), and to application classification.
"Poor attack generalization means the risk is zero." A noisy classifier can still be useful if the target set is small (a few high-value users) or the stakes are high (one identified user equals a major intelligence win). "Imperfect classifier" doesn't mean "unusable classifier"; it means "useful with caveats."
"Padding solves it." Padding addresses some attack vectors but introduces bandwidth cost. Adaptive padding has been defeated by deep-learning attacks. Constant-rate padding is bandwidth-prohibitive for most users. There is no perfect padding scheme; the question is which costs you can accept for which defenses.
"My anonymity tool already does this." Most anonymity tools deployed in 2026 do not implement strong traffic-analysis defenses. Tor implements modest circuit-level padding for onion services and basic per-cell uniformity. Most VPNs do nothing. WireGuard does nothing — it's not in scope. If your tool's documentation doesn't explicitly describe traffic-analysis defenses, assume it has none.
"This is paranoid theory." Traffic-analysis attacks against Tor have been demonstrated in academic literature for over a decade and have very likely been deployed by intelligence services for as long. They aren't theoretical; they're a known and ongoing risk. The question is whether your threat model includes the adversaries who can deploy them.
"If I rotate circuits, I'm safe." Circuit rotation provides limited defense. Within a single circuit's lifetime, traffic analysis can succeed. Across circuit rotations, behavioral linkability (same user logging in to the same accounts, same browsing patterns) often defeats the rotation.
Wrapping up
Traffic analysis is the attack surface left after encryption removes content visibility. Sizes, directions, timings, and burst structures of encrypted packets carry enough information to support end-to-end correlation against low-latency anonymity overlays, supervised classification of encrypted browsing sessions (website fingerprinting), and circuit identification for onion services. Deep-learning attacks have raised the ceiling on what's achievable; classical defenses are no longer sufficient against modern attacks.
The honest engineering picture: encryption is necessary but not sufficient for anonymity. Low-latency systems trade traffic-analysis vulnerability for usability; high-latency systems trade usability for defense. Defenses are bandwidth-and-latency-cost tradeoffs, not free.
The next module (padding-strategies-and-cover-traffic — coming soon) goes into the defense-side detail: which padding schemes actually work, what bandwidth cost they impose, and which attacks they defeat. The mixnet module after that (mix-networks-loopix-nym — coming soon) covers high-latency systems where the latency cost is paid in exchange for traffic-analysis resistance.
Further reading
- Toward an Efficient Website Fingerprinting Defense — Juarez et al., 2016 — foundational paper for the modern defense literature, with adaptive padding (WTF-PAD) as the central proposal.
- Deep Fingerprinting: Undermining Website Fingerprinting Defenses with Deep Learning — Sirinam et al., 2018 — the paper that demonstrated deep-learning attacks could defeat previously-strong defenses.
- How Unique is Your .onion? — Overdorf et al., 2017 — onion-service fingerprintability and why average accuracy can mislead.
- WFDefProxy: Modularly Implementing and Empirically Evaluating Website Fingerprinting Defenses — Gong et al., 2021 — separates simulation results from implementation reality.
- Tor: The Second-Generation Onion Router — Dingledine, Mathewson, Syverson, 2004 — the canonical explanation of why low-latency anonymity accepts the traffic-analysis attack surface.
Related reading
Decoy routing and refraction networking
Telex, TapDance, Slitheen, and Conjure: how cooperative infrastructure on ordinary network paths changes the evasion game.
Hysteria and QUIC-based transports
Why QUIC became an evasive substrate, how Hysteria uses it, and what QUIC-based camouflage still leaks to modern detectors.
Operational anonymity for engineers
Compartmentation, browser discipline, transport choice, telemetry minimization, and how to turn anonymity theory into a survivable daily operating model.