Active probing methodology
How detectors confirm suspicious endpoints with chosen inputs, from state-machine exploration to probe-resistant proxy design.
The previous module described how passive classification turns observed traffic into probabilistic guesses about what protocol is running. This module covers the next stage: active probing, where the detector turns probabilistic guesses into protocol confirmations by sending chosen inputs to the suspicious endpoint and observing how it responds. Active probing is the technique nation-state firewalls use to defeat protocol mimicry; it raises the cost of evasion substantially because it can confirm a guess that passive classification could only suspect.
The fundamental shift: passive observers are limited to what natural traffic reveals; active probers can manipulate the conversation to elicit distinguishing responses. A Tor bridge that's pretending to be a generic HTTPS server might survive passive classification — its TLS fingerprint is normal-looking, its traffic shape is unremarkable. But when an active prober connects to it with a Tor client preface, the bridge responds like a Tor relay; when an active prober connects with a real HTTPS request, the bridge fails to behave like a real web server. The difference, observable only through interaction, gives the bridge away.
This module is the methodology and architecture treatment. We'll cover what an active probe actually does (destination selection, payload choice, response interpretation), how state-machine exploration generalizes the technique, the historical lessons from Tor bridge probing campaigns, the design pattern for probe-resistant proxies, why even "ignore unknown input" can become its own fingerprint, and the measurement-ethics considerations that constrain Internet-scale probing research. Specific defensive constructions (REALITY's particular tricks, naïveproxy's HTTP-server emulation) belong to Track 6 and are flagged here without going deep.
Prerequisites
- tor-onion-routing-and-circuit-anonymity — Tor bridges are the canonical active-probe target.
- threat-models-for-network-anonymity — for the adversary-first thinking probing-resistance evaluation requires.
- os-and-tcpip-stack-fingerprinting — passive fingerprinting is what generates the suspicions probing confirms.
- deep-packet-inspection-pattern-statistical-behavioral — the previous module on how passive classification works.
Learning objectives
- Explain why censors and detectors use active probing to confirm protocol guesses generated by passive observation.
- Distinguish banner grabbing, state-machine exploration, probe replay, and secret-dependent probing as different probing strategies.
- Analyze what makes a proxy or bridge probe-resistant rather than merely non-obvious in passive traces.
- Evaluate the measurement ethics and false-positive risks of active Internet probing.
Why passive detection is often only stage one
Passive classification — looking at flow shape, TLS fingerprints, packet timing — produces probabilistic guesses. The classifier might say "this destination on port 443 has a TLS fingerprint that looks like Tor's, and its traffic pattern is consistent with a Tor bridge, with 60% confidence." For many operators, 60% confidence isn't enough to act on: blocking at that threshold means a 40% false-positive rate, which would destroy huge amounts of legitimate traffic.
Active probing turns the 60% confidence into near-certainty. The prober sends a Tor client handshake initiation to the suspicious endpoint. If the endpoint responds with a Tor-style response (acceptable cipher, valid Tor protocol behavior), confidence goes to 99%+. If the endpoint responds like a generic HTTPS server (TLS handshake completes but Tor protocol fails), confidence collapses — the endpoint is almost certainly not a Tor bridge.
The two-stage pattern:
- Passive observation. Identify suspicious endpoints from traffic patterns. This is cheap (no new traffic generated) and broad (you observe everything passively).
- Active confirmation. For each suspicious endpoint, send specific probes designed to elicit distinguishing responses. This is expensive (requires actually connecting to each endpoint) but precise.
Why operators do both:
- Pure passive classification. Cheap but probabilistic. False-positive rates are inherent to the classifier; can't be reduced below the algorithm's limit.
- Pure active probing. Precise but expensive. Probing every IP on the Internet would take significant infrastructure and would itself be detectable.
- Combined. Use passive classification to narrow the search space; use active probing to confirm the survivors. Cheap aggregate cost, near-certain confirmation.
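The two-stage economics can be sketched as a toy pipeline. Everything here — the threshold, the feature names, and the probe stub — is an illustrative assumption, not a description of any deployed system:

```python
# Toy two-stage detection pipeline: cheap passive scoring narrows the
# suspect list, expensive active probing confirms the survivors.
PASSIVE_THRESHOLD = 0.6  # illustrative; real operators tune this

def passive_score(features: dict) -> float:
    """Stand-in for a passive classifier: P(endpoint is a bridge)."""
    score = 0.0
    if features.get("tls_fingerprint_matches_tor"):
        score += 0.4
    if features.get("traffic_shape_suspicious"):
        score += 0.3
    return min(score, 1.0)

def run_pipeline(endpoints: dict, probe) -> list:
    """Return endpoints confirmed by the (expensive) probe stage."""
    suspects = [ip for ip, feats in endpoints.items()
                if passive_score(feats) >= PASSIVE_THRESHOLD]
    return [ip for ip in suspects if probe(ip)]

endpoints = {
    "192.0.2.1": {"tls_fingerprint_matches_tor": True,
                  "traffic_shape_suspicious": True},    # suspect
    "192.0.2.2": {"tls_fingerprint_matches_tor": True}, # below threshold
    "192.0.2.3": {},                                    # clean
}
confirmed = run_pipeline(endpoints, probe=lambda ip: ip == "192.0.2.1")
print(confirmed)  # → ['192.0.2.1']
```

Injecting `probe` as a parameter reflects the economics: the expensive stage can be rate-limited, scheduled, or swapped out independently of the cheap one.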
The Great Firewall is the largest deployed example. It does passive classification of all traffic crossing its borders, generating a list of suspicious-IP candidates daily; it then runs active probes against those candidates from probing infrastructure inside China. Confirmed bridges get blocked; unconfirmed candidates get re-checked periodically.
The anatomy of a probe
An active probe has several engineering decisions:
Destination selection. Which IP addresses to probe? The suspect list comes from passive classification, but probing every suspect immediately may be too expensive or too detectable. Common patterns: probe newly-suspicious addresses with high priority; re-probe already-confirmed addresses periodically (in case they've gone offline or rotated); randomize timing to avoid being a pattern itself.
Trigger event. When to probe? Often: when a new suspicious endpoint is observed; when the endpoint is in a known-suspect list; when traffic to the endpoint exceeds a threshold. The trigger determines how many probes happen and when.
Payload choice. What protocol prefix to send? This is where the methodology gets interesting. Common probes:
- TLS ClientHello with a generic SNI. Tests whether the endpoint speaks TLS at all and how its handshake responds.
- TLS ClientHello with a known-Tor SNI pattern. Tests for Tor-specific TLS configurations.
- Tor client preface. Sends what a real Tor client would send; observes whether the endpoint responds Tor-like.
- HTTP request to the root. Tests whether the endpoint serves a real-looking HTTP response.
- Random/garbage data. Tests how the endpoint handles unexpected input.
- obfs4-style probe. Sends what an obfs4 pluggable transport client would send.
- shadowsocks/V2Ray client preface. Tests for those proxy protocols.
A sophisticated prober sends several different probes in sequence to characterize the endpoint comprehensively.
Timing. How quickly to probe? Too fast: looks like a port scanner, attracts attention. Too slow: suspects may rotate addresses or go offline before they're confirmed. Realistic probing schedules space probes by seconds to minutes.
Retry logic. What if the probe doesn't get a response? Retry once or twice; if still no response, mark the endpoint as down or unreachable.
Response interpretation. How to classify the response? Specific positive responses (a valid Tor handshake completion) confirm the suspicion. Specific negative responses (a normal HTTP response) reject it. Ambiguous responses (timeout, connection reset, generic TLS error) require further investigation or a second probe.
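The interpretation step can be made concrete as a small classifier over raw probe outcomes. The byte patterns and category names below are illustrative stand-ins, not real signatures:

```python
# Classify a raw probe outcome into confirm / reject / ambiguous.
# The byte prefixes are illustrative stand-ins, not real signatures.
CONFIRM, REJECT, AMBIGUOUS = "confirm", "reject", "ambiguous"

def interpret_response(outcome: str, data: bytes = b"") -> str:
    """outcome: 'response', 'timeout', or 'reset'; data: bytes received."""
    if outcome in ("timeout", "reset"):
        # Neither a positive nor a negative signal: re-probe later.
        return AMBIGUOUS
    if data.startswith((b"HTTP/1.1 ", b"HTTP/1.0 ")):
        # Endpoint behaves like an ordinary web server: reject suspicion.
        return REJECT
    if data.startswith(b"\x16\x03"):
        # A TLS handshake record answered; TLS alone confirms nothing,
        # so a second, protocol-specific probe is needed.
        return AMBIGUOUS
    if data.startswith(b"SUSPECT-PROTOCOL"):  # placeholder for a real match
        # The endpoint completed the suspected protocol's handshake.
        return CONFIRM
    return AMBIGUOUS
```

In practice most first-round results land in the ambiguous bucket, which is why sophisticated probers send follow-up probes rather than deciding on a single exchange.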
The probe-set design is iterative. As proxy designs evolve, probes evolve to detect them. As probes become more sophisticated, proxies adapt. The arms race is well-documented in the academic literature.
Probing as state-machine exploration
A useful framing: every protocol implementation is a state machine. The state machine accepts certain inputs and transitions to certain states; it rejects inputs that don't fit. The valid traversals through the state machine define what the protocol does. The implementation's response to invalid traversals is a fingerprint.
Generic HTTPS server state machine (simplified):
INITIAL → TCP_ACCEPT → TLS_CLIENTHELLO_RECEIVED →
TLS_HANDSHAKE_COMPLETE → HTTP_REQUEST_RECEIVED →
HTTP_RESPONSE_SENT → HTTP_REQUEST_RECEIVED → ...
The probe asks: what happens at each state? Specifically:
- After TCP_ACCEPT, send malformed TLS bytes. Does the server immediately reset, or does it time out, or does it send a TLS Alert?
- After TLS_HANDSHAKE_COMPLETE, send malformed HTTP. Does it return a 400 Bad Request, or does it close, or does it just not respond?
- After HTTP_REQUEST_RECEIVED for a known-bad URL, what does it serve?
A normal Apache server has well-defined responses to all of these. An obfs4 bridge running on the same port has different responses because its state machine is different. The probe is exploring the state machine to find the differences.
A probe-resistant proxy tries to mimic a normal server's state-machine responses for all "expected" interactions and only reveal proxy functionality for very-specific secret-bearing requests. We'll cover this in the design section below.
The methodology generalizes:
- Banner grabbing. Probe TCP_ACCEPT with no initial bytes; observe whatever banner the server sends back. SSH servers send "SSH-2.0-OpenSSH_8.x" first; FTP servers send "220 (vsFTPd 3.0.x)". The banner is a fingerprint.
- State-machine exploration. Probe each state with various inputs; map the response to each.
- Probe replay. Capture probes and responses from one source; replay them from a different source and see if the responses differ. Helps detect proxies that use source-IP filtering.
- Secret-dependent probing. Send a probe that requires a secret (e.g., a key) to elicit a specific response. The presence or absence of the response distinguishes "knows the secret" from "doesn't know."
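Banner grabbing in particular is easy to sketch. Here the network read is separated from the pure classification step, and the pattern table is a tiny illustrative subset of what real scanners carry:

```python
import socket

# Map a leading banner to a protocol guess. The table is a small
# illustrative subset; real scanners carry hundreds of patterns.
BANNER_PATTERNS = {
    b"SSH-2.0-": "ssh",
    b"SSH-1.99-": "ssh",
    b"220 ": "ftp-or-smtp",  # both greet with code 220; needs a follow-up
}

def classify_banner(banner: bytes) -> str:
    for prefix, proto in BANNER_PATTERNS.items():
        if banner.startswith(prefix):
            return proto
    return "unknown" if banner else "silent"

def grab_banner(host: str, port: int, timeout: float = 5.0) -> bytes:
    """Connect, send nothing, and read whatever the server volunteers."""
    with socket.create_connection((host, port), timeout=timeout) as s:
        s.settimeout(timeout)
        try:
            return s.recv(256)
        except socket.timeout:
            return b""
```

A silent server is itself a weak signal — most well-known plaintext protocols volunteer a banner, so saying nothing narrows the field.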
Historical bridge probing lessons
Tor bridges have been the canonical active-probing target for over a decade. The history teaches several lessons:
2011: First active-probing campaigns. China's Great Firewall began testing suspicious endpoints on port 443 by initiating Tor handshakes. Bridges that responded like Tor relays got blocked. Tor's response: develop pluggable transports (obfs2, then obfs3, obfs4) that hide the Tor handshake inside opaque bytes.
2012: Probing extends to obfs2/obfs3. As bridges adopted obfs2 and obfs3, GFW probes evolved to test for those protocols. Both were susceptible to passive identification (their handshakes had structural patterns) plus active confirmation (probes triggered specific responses).
2014: obfs4 with secret keys. obfs4 added a per-bridge secret key required for the handshake; without the key, the bridge wouldn't reveal Tor functionality. This was the first design specifically resistant to active probing — without the secret, the prober gets no useful response.
2014-2018: GFW continues probing. Even with obfs4, the GFW could observe that obfs4 bridges had distinctive behavioral patterns (for example, no plausible public HTTP service answering on port 443 at the same address). The arms race continued.
2017+: HTTPS-mimicking transports (HTTPT, naïveproxy, REALITY). Newer designs tried to make bridges indistinguishable from real HTTPS servers — same TLS responses, same HTTP responses, same connection patterns. These work much better against active probing, at the cost of complexity.
Key lesson: probe resistance is harder than passive resistance. A bridge that's invisible passively can still be revealed actively. Active probing forces designers to think about every possible state-machine input and how to respond consistently.
The 2018 Dunna et al. paper "Analyzing China's Blocking of Unpublished Tor Bridges" documented the GFW's probing infrastructure: many probing IPs, distributed across China, generating ~100 probes per second of various protocols against suspect endpoints. The paper also showed that many "unpublished" bridges (private bridges not in any public list) were nevertheless discovered via traffic-pattern analysis followed by active confirmation.
Probe-resistant proxy design
The core probe-resistance design pattern:
Respond identically to ordinary clients and to invalid input. Don't have a distinguishable error mode for "wrong protocol" vs. "no input."
Reveal special behavior only to clients that prove out-of-band knowledge. A real client carries a secret (a key, a password, a specific URL parameter). The proxy serves an HTTPS response to anyone without the secret; only after seeing the secret does the proxy switch to proxy mode.
Mimic a real HTTPS server in every observable behavior. TLS handshake is real (matching a real cert from a real CA, ideally piggybacking on a real domain). HTTP responses match what a normal site at that domain would serve.
Avoid distinguishable failure modes. When invalid clients fail, they should fail in ways indistinguishable from how a real HTTPS server would fail.
Avoid distinguishable timing. A proxy that takes 200ms to respond to invalid input while a real server takes 50ms is fingerprintable. Constant-time response is hard but important.
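One concrete corner of the timing problem is the secret check itself. A naive equality comparison exits at the first mismatched byte, which leaks where the mismatch occurred; a minimal fix, assuming a Python implementation and an illustrative token value, looks like this:

```python
import hashlib
import hmac

# Illustrative token; real deployments distribute per-user secrets
# out of band.
SERVER_TOKEN = b"example-token-distributed-out-of-band"

def token_is_valid(presented: bytes) -> bool:
    # Hash both sides first so the comparison operates on fixed-length
    # inputs (token length doesn't leak), then compare in constant time:
    # hmac.compare_digest's runtime doesn't depend on where bytes differ.
    a = hashlib.sha256(presented).digest()
    b = hashlib.sha256(SERVER_TOKEN).digest()
    return hmac.compare_digest(a, b)

print(token_is_valid(b"example-token-distributed-out-of-band"))  # → True
print(token_is_valid(b"wrong"))                                  # → False
```

This closes only the comparison-timing channel; the surrounding read, parse, and dispatch paths still need matching latencies.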
A simplified probe-resistant proxy state machine in pseudocode:
on accept(connection):
    read(initial_bytes, timeout=30s)
    if looks_like_tls_clienthello(initial_bytes):
        cert, key = load_cert_for_my_domain()
        do_normal_tls_handshake(connection, cert, key)
        on tls_handshake_complete:
            first_request = read_http_request(connection, timeout=10s)
            if first_request_contains_secret(first_request):
                # This is a real proxy client.
                # Switch to proxy mode.
                switch_to_proxy_mode(connection)
                return
            else:
                # This is either a probe or an honest accidental visitor.
                # Serve the normal-looking website that lives at this domain.
                serve_legitimate_http_response(connection, first_request)
                close_connection_normally(connection)
                return
    else:
        # Initial bytes don't look like TLS at all (bare TCP, garbage, etc.)
        # Behave like a normal HTTPS server: close the connection cleanly.
        close_connection(connection)
        return
The properties:
- Passive observers see TLS connections to a normal-looking website. The TLS fingerprint matches whatever TLS stack the server uses (often a standard, widely deployed one).
- Active probers without the secret get a normal-looking HTTPS website response. They can't distinguish this from a legitimate HTTPS server.
- Active probers with the secret get the proxy behavior. But they had to know the secret out of band.
- Probers sending random garbage get a clean connection close, just like a normal server would do for malformed input.
The HTTPT paper (Frolov and Wustrow, USENIX FOCI 2020) demonstrated this design pattern in a working system. The crucial design principle: probe-resistance requires that "indistinguishability from a normal server" holds across the full range of possible probes, not just the obvious ones.
HTTPT and the camouflage problem
HTTPT (Frolov and Wustrow's "Probe-Resistant HTTPS Proxy") is a clean reference implementation of the design pattern. The key contributions:
Co-hosted real website. HTTPT runs alongside a real web application on the same domain. The proxy and the website share TLS infrastructure; from outside, they look like one server. A probe that retrieves the homepage gets the real homepage; only specific URLs trigger proxy behavior.
Secret-bearing first request. The proxy is gated by a URL containing a per-user authentication token. Without the token, the request is treated as a regular HTTPS request to the website.
Realistic fallback behavior. When invalid requests come in, HTTPT serves them like the real website would, including realistic 404s, redirects, etc.
TLS fingerprint matching. HTTPT uses standard TLS libraries and configurations matching what real web servers run.
The result: a probe that doesn't know the secret cannot distinguish HTTPT from a legitimate website. Probing the IP shows a real domain with real content; deeper probing finds nothing unusual; only the secret URL reveals the proxy.
The cost: complexity. The proxy operator must run a real website (or co-host on a real one) to provide convincing fallback behavior. Token distribution is an out-of-band problem. Maintaining the website over time means actually maintaining it, not letting it rot.
For the modern censorship-evasion ecosystem, HTTPT-style designs are the gold standard. REALITY (covered in Track 6) takes a related but different approach: instead of running the proxy alongside a real website, REALITY hijacks a real website's TLS handshake and only switches to proxy behavior after authenticating the client. The hijacking has its own subtleties; we'll cover REALITY's specific tricks in xray-reality-vs-wireguard and the Track 6 modules.
Fingerprinting the probe-resistant system itself
Even with careful probe-resistant design, "ignore invalid input and serve a normal website" can become a fingerprint if the implementation differs subtly from what a real website would do. Specific concerns:
Timeout behavior. A real HTTP server has specific timeout values for slow client headers, slow request bodies, idle keep-alive connections. A proxy that uses different timeouts is identifiable by timing analysis.
Byte thresholds. A real server has specific limits on header sizes, URL lengths, request body sizes. A proxy with different limits responds differently to clients pushing those limits.
Co-hosting patterns. A real website on a domain typically serves many subdomains, has DNS records consistent with a real organization, has reverse DNS pointing back. A proxy on a fresh domain with no other associated infrastructure looks unusual.
Response sequencing. Real web pages contain JavaScript that makes additional requests (to CDNs, analytics, ads). A "real website" with no JavaScript activity may look unusual to behavior-based detection.
Content size and structure. A real website has specific content (pages, images, etc.) that doesn't change on every request. A proxy that serves dynamic placeholder content has unusual size/timing patterns.
Server certificate. A real domain has a certificate from one of a few CAs, issued for a typical period, with typical metadata. A certificate that's freshly issued, from an unusual CA, or for an unusual domain length stands out.
The principle: every detail of "looking like a normal website" matters. Designers who match TLS configuration but not the website content, or who co-host with a real site but use different timeouts, leave fingerprints.
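To make the timeout concern concrete, a detector can compare an endpoint's observed slow-header timeout against a table of common server defaults. The profile values below are illustrative assumptions, not authoritative defaults:

```python
def matching_server_profiles(observed_timeout: float,
                             profiles: dict,
                             tol: float = 2.0) -> list:
    """Match an observed slow-header timeout (seconds) against known
    server defaults. Matching none of them is itself a signal."""
    matches = [name for name, t in profiles.items()
               if abs(t - observed_timeout) <= tol]
    return matches or ["no-common-profile"]

# Illustrative profile table — treat these numbers as assumptions,
# not authoritative server defaults.
PROFILES = {
    "nginx-like": 60.0,
    "apache-like": 60.0,
    "haproxy-example": 10.0,
}

print(matching_server_profiles(60.3, PROFILES))
# → ['nginx-like', 'apache-like']
print(matching_server_profiles(30.0, PROFILES))
# → ['no-common-profile']
```

An endpoint whose timeout matches no common profile hasn't been confirmed as a proxy — but it has earned a follow-up probe.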
Measurement ethics and operator risk
Active probing as a research technique raises ethical questions. When a researcher probes "all IPs in /16 X to find proxies," the probes:
- Generate traffic to non-consenting destinations.
- May trigger rate-limiting or blocklisting at the targets.
- May be misinterpreted as scanning attacks by network operators.
- Can identify private services the operators preferred to keep private.
The Internet measurement community has developed norms:
- Minimization. Probe only as much as the research question requires. Don't scan when you don't need to.
- Disclosure. Operate from IPs with WHOIS records identifying the research; provide an opt-out mechanism (a webpage at the source IP explaining the research and how to be excluded).
- Limit damage. Use payloads that don't trigger destructive responses; don't authenticate against unknown services; don't consume substantial resources at the target.
- Aggregate reporting. Report findings in ways that don't deanonymize individual operators.
These norms aren't legally binding but are widely respected by academic measurement researchers. Network operators' tolerance for measurement varies; some block known measurement infrastructure proactively; others tolerate disclosed research.
The asymmetry: nation-state probers (the GFW, etc.) don't follow these norms. They probe at scale with no opt-out, no disclosure, no minimization. The legal and political constraints that limit research probing don't constrain the operators with the strongest motivation to deploy probing infrastructure.
For proxy operators: assume your endpoints will be probed by sophisticated, well-resourced adversaries. Design as if probing is constant; treat any reduction in probing as a temporary bonus, not a permanent state.
Hands-on exercise
Model a simple handshake state machine.
Tools: plain text or paper. Runtime: 15 minutes.
Draw the state transitions for a benign HTTPS server:
[INITIAL]
|
v
[TCP_ACCEPTED] (listening for first byte)
| timeout (30s no input) → [CLOSED]
| bytes don't look like TLS → [CLOSED]
| bytes look like TLS ClientHello
v
[TLS_HANDSHAKE_IN_PROGRESS]
| handshake fails → [TLS_ALERT_SENT] → [CLOSED]
| handshake succeeds
v
[TLS_ESTABLISHED]
| timeout (10s no HTTP) → [CLOSED]
| bytes don't look like HTTP → [TLS_ALERT_SENT] → [CLOSED]
| HTTP request received
v
[HTTP_REQUEST_PROCESSED]
| serve response or 4xx/5xx
v
[KEEP_ALIVE_OR_CLOSE]
| timeout → [CLOSED]
| another request → [HTTP_REQUEST_PROCESSED]
| connection: close → [CLOSED]
Now mark where a probe-resistant proxy might branch on a secret. After [HTTP_REQUEST_PROCESSED]:
[HTTP_REQUEST_PROCESSED]
| request URL contains valid auth token → [PROXY_MODE_ACTIVATED]
| request URL is normal → continue normal HTTPS flow
v
[normal flow] or [PROXY_MODE_ACTIVATED]
The branch happens after authenticating the request. Without the token, the state machine looks identical to a normal HTTPS server. With the token, it switches to proxy behavior.
Stretch: list how timeout behavior or partial reads could leak the hidden branch:
- If the proxy mode handles connections differently after handshake (longer keep-alive, different timeouts), that's observable.
- If the proxy returns a 200 OK for the secret URL but generates a 404 for similar-but-not-quite-secret URLs, the difference in response timing might leak.
- If the proxy buffers the entire request before deciding (but a normal server processes streaming), the buffering delay is observable.
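If you prefer code to boxes and arrows, the same benign-server machine can be written as a transition table. Event names here are invented for the exercise:

```python
# Transition table for the benign HTTPS server from the exercise diagram.
# Probing = feeding (state, event) pairs and seeing where an
# implementation ends up compared with a reference server.
TRANSITIONS = {
    ("TCP_ACCEPTED", "timeout"):          "CLOSED",
    ("TCP_ACCEPTED", "non_tls_bytes"):    "CLOSED",
    ("TCP_ACCEPTED", "tls_clienthello"):  "TLS_HANDSHAKE_IN_PROGRESS",
    ("TLS_HANDSHAKE_IN_PROGRESS", "handshake_fail"): "CLOSED",
    ("TLS_HANDSHAKE_IN_PROGRESS", "handshake_ok"):   "TLS_ESTABLISHED",
    ("TLS_ESTABLISHED", "timeout"):        "CLOSED",
    ("TLS_ESTABLISHED", "non_http_bytes"): "CLOSED",
    ("TLS_ESTABLISHED", "http_request"):   "HTTP_REQUEST_PROCESSED",
    ("HTTP_REQUEST_PROCESSED", "served"):  "KEEP_ALIVE_OR_CLOSE",
    ("KEEP_ALIVE_OR_CLOSE", "timeout"):          "CLOSED",
    ("KEEP_ALIVE_OR_CLOSE", "http_request"):     "HTTP_REQUEST_PROCESSED",
    ("KEEP_ALIVE_OR_CLOSE", "connection_close"): "CLOSED",
}

def run(events, start="TCP_ACCEPTED"):
    """Drive the machine; an unhandled (state, event) pair is exactly
    the kind of corner an active probe goes looking for."""
    state = start
    for ev in events:
        state = TRANSITIONS.get((state, ev), "UNDEFINED")
        if state in ("CLOSED", "UNDEFINED"):
            break
    return state

print(run(["tls_clienthello", "handshake_ok", "http_request", "served"]))
# → KEEP_ALIVE_OR_CLOSE
print(run(["tls_clienthello", "handshake_fail"]))  # → CLOSED
```

Adding the secret-gated branch from the exercise is one extra entry — ("HTTP_REQUEST_PROCESSED", "valid_auth_token") mapping to a PROXY_MODE_ACTIVATED state — and the rest of the table stays byte-for-byte identical to the benign server's.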
Read an active-probing case study critically.
Tools: browser (read the Dunna et al. 2018 paper). Runtime: 15 minutes.
Read Analyzing China's Blocking of Unpublished Tor Bridges. Note specifically:
- What trigger caused probes to be sent? (Suspicion from passive classification of traffic patterns.)
- What probes did the researchers observe? (TLS handshakes, Tor protocol attempts, various pluggable-transport handshakes.)
- How did the researchers infer scanner behavior? (Setting up honeypots, watching what came in, correlating across honeypots.)
- What was the response time from suspicious-traffic-observation to active-probe? (Hours to days for some bridges; faster for others.)
Write a one-paragraph summary that distinguishes "what the researchers observed" from "what the researchers concluded."
Common misconceptions and traps
"Active probing is just port scanning." Port scanning asks whether something is listening on a port. Active probing asks what protocol logic lives behind a suspicious endpoint and how it responds to specific inputs. Port scanning is binary; active probing is exploratory.
"Silently dropping all unexpected input is always safe." Silence itself can become a signature if benign servers usually behave differently. A real HTTPS server typically sends a TLS Alert for malformed input, not silent close. A proxy that silently drops everything stands out.
"Probe resistance means perfect invisibility." It usually means shifting the detector's job from easy confirmation to harder statistical inference. Even probe-resistant proxies can be identified by behavioral patterns (no real-website JS activity, unusual co-hosting, certificate patterns); the question is the cost of identification.
"Once a shared secret gates access, the problem is solved." Timing, fallback, capacity limits, and co-hosting behavior can still betray the service. The secret gates access to the proxy functionality but doesn't prevent the prober from observing how the gate behaves.
"Any Internet-scale probing result is ethically acceptable if the payload is small." Measurement ethics still require minimization, disclosure, opt-out, and damage-limit considerations. Academic measurement researchers follow norms; nation-state operators generally don't.
"My proxy is undetectable because I haven't seen any probes yet." Absence of evidence isn't evidence of absence. Your endpoint may be on a list of suspects that haven't yet been actively probed, or probes may be happening below your detection threshold. Operate as if probing is constant.
"Active probing requires huge infrastructure." It requires some infrastructure, but a single probing host can probe thousands of endpoints per day. Nation-state probers have orders of magnitude more capacity, but small-scale probing (e.g., a researcher confirming whether a specific endpoint runs a specific protocol) is cheap.
"My obfs4 bridge is safe because it requires a secret." obfs4 with a secret key is much harder to confirm via probing than older bridge transports, but it's still distinguishable by behavioral patterns (no plausible cover-traffic on the same IP, unusual port usage, etc.). The secret raises the bar; it doesn't eliminate the threat.
Wrapping up
Active probing turns passive suspicion into active confirmation. It's the second stage in a two-stage detection pipeline: cheap passive classification narrows the suspect list; precise active probing confirms the survivors. The arms race between probing-evolution and probe-resistance has shaped pluggable transports, REALITY, naïveproxy, and HTTPT-style designs over the past decade.
The probe-resistant design pattern is straightforward to state — respond identically to ordinary clients and invalid input; reveal special behavior only to clients with out-of-band knowledge — and difficult to implement comprehensively. Every detail (TLS fingerprint, timing, co-hosting behavior, certificate metadata, response size, error handling) matters. Production probe-resistant proxies usually co-host with real websites to provide convincing fallback behavior.
Measurement ethics constrain academic probing research; they don't constrain nation-state operators. Proxy operators should assume probing is continuous and design for it.
The next module (tls-fingerprinting-in-production — coming soon) goes into the production reality of TLS fingerprinting at scale: how CDNs and large platforms classify clients by TLS fingerprint, how the JA3/JA4 ecosystem actually works in practice, and why TLS impersonation tools (uTLS, curl-impersonate) raise the cost of fingerprint-based detection while not eliminating it. Track 6 covers the specific defensive constructions (REALITY, naïveproxy) that compose probe resistance with other techniques.
Further reading
- HTTPT: A Probe-Resistant Proxy — Frolov and Wustrow, 2020 — the cleanest modern reference on active-probe resistance.
- Analyzing China's Blocking of Unpublished Tor Bridges — Dunna, O'Brien, Gill, 2018 — concrete measurement study of passive detection plus active probing in the wild.
- Learning more about the GFW's active probing system — The Tor Project — implementation-level observations from the Tor ecosystem.
- Knock Knock Knockin' on Bridges' Doors — The Tor Project — classic explanation of how bridge probing was inferred from observed behavior.
- Protecting bridge operators from probing attacks — The Tor Project — bridges from probing methodology to operator-risk thinking.
Related reading
Decoy routing and refraction networking
Telex, TapDance, Slitheen, and Conjure: how cooperative infrastructure on ordinary network paths changes the evasion game.
Hysteria and QUIC-based transports
Why QUIC became an evasive substrate, how Hysteria uses it, and what QUIC-based camouflage still leaks to modern detectors.
Operational anonymity for engineers
Compartmentation, browser discipline, transport choice, telemetry minimization, and how to turn anonymity theory into a survivable daily operating model.