Detection · Part 3 of 6 · Anonymity Engineering · 11 min read · advanced

TLS fingerprinting in production

ClientHello structure, JA3 versus JA4, drift, ambiguity, and how production detectors really use TLS fingerprints.

The previous two modules established the broader detection picture: passive classification followed by active probing, with TLS fingerprinting one of the most important features in both. This module looks specifically at how TLS fingerprinting works in production at scale — what CDNs and large platforms do with TLS fingerprints, why JA3 was insufficient and JA4 was designed to replace it, why production analysts treat fingerprints as probabilistic evidence rather than identity proof, and what GREASE and adversarial impersonation do to the entire ecosystem.

The deeper architectural treatment lives at ja3-ja4-tls-fingerprinting. This module focuses on the production angle: how working classifiers actually use these fingerprints, the operational constraints that shape their use, and why a TLS fingerprint match is a starting point for analysis rather than a conclusion.

Prerequisites

Learning objectives

  1. Explain what TLS ClientHello structure is captured by JA3 and JA4 fingerprints, and why the choices differ.
  2. Distinguish how production analysts use fingerprints — for client identification, anomaly detection, bot mitigation, censorship classification.
  3. Evaluate why a TLS fingerprint should be treated as probabilistic evidence rather than deterministic identity.
  4. Describe what GREASE, JA4's normalization, and uTLS-based impersonation do to the fingerprint ecosystem and how production systems adapt.

What's in a ClientHello

A TLS ClientHello is the first message a client sends in a TLS handshake. It contains:

  • TLS version (legacy field; TLS 1.3 negotiates via supported_versions extension).
  • Cipher suite list, ordered by client preference.
  • Compression methods (almost always [0]).
  • Extensions, ordered, each with type code and content. Common: server_name (SNI), supported_versions, supported_groups, signature_algorithms, ALPN, key_share (TLS 1.3), psk_key_exchange_modes, status_request, etc.

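The ClientHello is sent in cleartext in both TLS 1.2 and TLS 1.3. TLS 1.3 encrypts most of the server's side of the handshake (EncryptedExtensions, certificate) but not the ClientHello; only Encrypted Client Hello (ECH), where deployed, hides the inner ClientHello, and even then an outer ClientHello stays visible. Anyone observing the handshake sees the message structure.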

The fingerprintable observation: different TLS libraries make different choices about which extensions to include, in what order, with what cipher suites, with what supported groups. These choices are stable per library and version — Firefox 128 always sends the same ClientHello structure (modulo SNI and key shares which vary per connection).

JA3 and its limitations

JA3 (Salesforce, 2017) was the first widely-adopted standardized fingerprint. The construction:

JA3 = MD5(version,cipher_list,extension_list,supported_groups,ec_point_formats)

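Within each list, the values are written in decimal, in wire order, and joined with hyphens; the five fields are then joined with commas. An illustrative pre-hash string is 771,4865-4866-4867,0-23-65281-10-11,29-23-24,0. The MD5 hash of that string is the fingerprint.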
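The construction can be sketched in a few lines of Python. This is a minimal illustration, assuming the ClientHello has already been parsed into integer lists; the function names `ja3_string` and `ja3_hash` and the sample values are hypothetical, not taken from a real capture. Values inside a field are joined with hyphens, the five fields with commas, and GREASE values are dropped as the JA3 description specifies.

```python
import hashlib

def is_grease(value: int) -> bool:
    """True for the RFC 8701 reserved values 0x0A0A, 0x1A1A, ..., 0xFAFA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)

def ja3_string(version, ciphers, extensions, groups, point_formats):
    """Build the pre-hash JA3 string from already-parsed integer fields."""
    fields = [
        [version],
        [c for c in ciphers if not is_grease(c)],
        [e for e in extensions if not is_grease(e)],
        [g for g in groups if not is_grease(g)],
        list(point_formats),
    ]
    return ",".join("-".join(str(v) for v in field) for field in fields)

def ja3_hash(version, ciphers, extensions, groups, point_formats):
    """MD5 of the JA3 string, as a 32-character hex digest."""
    text = ja3_string(version, ciphers, extensions, groups, point_formats)
    return hashlib.md5(text.encode("ascii")).hexdigest()

# Illustrative values only, not a capture from a real client.
sample = dict(
    version=771,
    ciphers=[0x1A1A, 4865, 4866, 4867],
    extensions=[0, 23, 65281, 10, 11],
    groups=[0x2A2A, 29, 23, 24],
    point_formats=[0],
)
print(ja3_string(**sample))
print(ja3_hash(**sample))
```

Keeping the string and the hash as separate functions makes it easy to inspect what went into a fingerprint when two hashes unexpectedly differ.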

JA3 worked well in the 2017-2020 timeframe. Then several developments undermined it:

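GREASE. Chrome added GREASE (Generate Random Extensions And Sustain Extensibility, later published as RFC 8701). The idea: include reserved-value extensions, ciphers, and groups in ClientHellos to ensure servers actually ignore unknown values. The reserved values vary per connection, so any fingerprint that hashes them changes per connection even from the same client. The JA3 specification tells implementations to ignore GREASE values for exactly this reason, but tooling that skips that step produces unstable hashes.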

Extension shuffling (2023+). Chrome started randomizing the order of extensions in ClientHellos. JA3 hashes extension order, so randomization breaks the JA3 — every connection has a different hash even from the same browser.

MD5 weaknesses. Theoretical concern: MD5 is broken cryptographically. Practical concern: MD5 collisions can be constructed for specific attacks if needed.

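Result: a JA3 database collected in 2020 may not match 2026 traffic from the same browser families, and with extension shuffling a single browser build no longer maps to a single JA3 at all. What changed is how the ClientHello is presented on the wire, not which client is behind it.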
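The effect of extension shuffling on an order-sensitive hash can be seen with a toy example. The extension list below is illustrative rather than a real Chrome ClientHello, and `order_sensitive` and `order_insensitive` are hypothetical helper names.

```python
import hashlib
import random

def order_sensitive(extensions):
    """Hash the extension list exactly as ordered (JA3-style)."""
    text = "-".join(str(e) for e in extensions)
    return hashlib.md5(text.encode("ascii")).hexdigest()

def order_insensitive(extensions):
    """Hash the extension list after sorting, so order no longer matters."""
    text = "-".join(str(e) for e in sorted(extensions))
    return hashlib.md5(text.encode("ascii")).hexdigest()

# Illustrative extension type codes, not taken from a real capture.
extensions = [0, 23, 65281, 10, 11, 35, 16, 5, 13, 18, 51, 45, 43, 27, 17513]

rng = random.Random(7)  # fixed seed so the run is repeatable
connections = [rng.sample(extensions, len(extensions)) for _ in range(5)]

sensitive = {order_sensitive(c) for c in connections}
insensitive = {order_insensitive(c) for c in connections}
print("distinct order-sensitive hashes:", len(sensitive))
print("distinct order-insensitive hashes:", len(insensitive))
```

The sorted variant collapses every shuffled connection to one value, which is the property a fingerprint needs once clients randomize extension order.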

JA4 and the redesign

JA4 (FoxIO, 2023) was designed to address JA3's limitations. Key changes:

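Normalization for shuffling. JA4 sorts cipher suite and extension values before hashing and ignores GREASE values, producing a stable fingerprint despite randomized extension order. Two ClientHellos from the same Chrome version with shuffled extensions hash to the same JA4.

SHA-256 instead of MD5. Each hashed component is a SHA-256 digest truncated to its first 12 hex characters. The truncation keeps the fingerprint short; the hash acts as a compact identifier, not as a security boundary.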

Layered components. A JA4 fingerprint is a structured string with multiple components, not a single hash. Components include TLS version, ciphers hash, extensions hash, ALPN. Analysts can reason about each component independently.

Family of fingerprints. JA4+ extends to other protocols:

  • JA4 = TLS ClientHello
  • JA4S = TLS ServerHello
  • JA4H = HTTP client request
  • JA4L = light-distance (latency) measurement
  • JA4X = X.509 certificate
  • JA4T = TCP fingerprint

The JA4 design is documented at github.com/FoxIO-LLC/ja4. Cloudflare, several large CDNs, and security vendors now support JA4, and many production deployments have migrated to it from JA3.
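The normalization steps can be sketched as follows. This is a simplified illustration modelled on the published JA4 description, not a conformant implementation, and its output should not be matched against real JA4 databases. The function name `ja4_like` is hypothetical, and the inputs are assumed to be already parsed: the TLS version as a two-character string, the first ALPN value as a string, and the other fields as integer lists.

```python
import hashlib

def is_grease(value: int) -> bool:
    """True for the RFC 8701 reserved values 0x0A0A, 0x1A1A, ..., 0xFAFA."""
    return (value & 0x0F0F) == 0x0A0A and (value >> 8) == (value & 0xFF)

def short_hash(text: str) -> str:
    """First 12 hex characters of the SHA-256 digest."""
    return hashlib.sha256(text.encode("ascii")).hexdigest()[:12]

def ja4_like(version, has_sni, ciphers, extensions, alpn, sig_algs):
    # GREASE values are ignored everywhere.
    ciphers = [c for c in ciphers if not is_grease(c)]
    extensions = [e for e in extensions if not is_grease(e)]

    # Readable prefix: transport, version, SNI flag, counts, ALPN summary.
    alpn_tag = alpn[0] + alpn[-1] if alpn else "00"
    prefix = "t{}{}{:02d}{:02d}{}".format(
        version,
        "d" if has_sni else "i",
        min(len(ciphers), 99),
        min(len(extensions), 99),
        alpn_tag,
    )

    # Cipher hash: sorted, so wire order does not matter.
    cipher_part = short_hash(",".join(sorted(f"{c:04x}" for c in ciphers)))

    # Extension hash: sorted, with SNI (0x0000) and ALPN (0x0010) left out,
    # followed by the signature algorithms in their original order.
    ext_hex = sorted(f"{e:04x}" for e in extensions if e not in (0x0000, 0x0010))
    sig_hex = [f"{s:04x}" for s in sig_algs]
    ext_part = short_hash(",".join(ext_hex) + "_" + ",".join(sig_hex))

    return f"{prefix}_{cipher_part}_{ext_part}"

# Two illustrative ClientHellos: same contents, different order and GREASE.
first = ja4_like("13", True, [0x0A0A, 0x1301, 0x1302, 0x1303],
                 [0x2A2A, 0, 16, 43, 51, 13, 10], "h2", [0x0403, 0x0804])
second = ja4_like("13", True, [0x1303, 0x1301, 0x4A4A, 0x1302],
                  [10, 13, 0x6A6A, 51, 43, 16, 0], "h2", [0x0403, 0x0804])
print(first)
print(first == second)
```

Sorting and GREASE filtering happen before hashing, so the two inputs differ on the wire but agree in the fingerprint.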

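A JA4 string looks like: t13d1516h2_8daaf6152771_b1ff8ab2d16f. The first block encodes the transport (t for TCP, q for QUIC), the TLS version (13), whether SNI with a domain name is present (d, or i when it is absent), the cipher count (15), the extension count (16), and the first and last characters of the first ALPN value (h2). It is followed by two truncated SHA-256 hashes, one over the cipher list and one over the extension list. Analysts can read this as "TLS 1.3 client with these characteristics" rather than as an opaque hash.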
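Because the first block is positional, a JA4 string can be split into named parts with very little code. The sketch below assumes the three-part layout described above; `JA4Parts` and `parse_ja4` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class JA4Parts:
    transport: str
    tls_version: str
    sni: str
    cipher_count: int
    extension_count: int
    alpn: str
    cipher_hash: str
    extension_hash: str

def parse_ja4(fingerprint: str) -> JA4Parts:
    """Split a JA4 string into its readable prefix and two hashes."""
    prefix, cipher_hash, extension_hash = fingerprint.split("_")
    if len(prefix) != 10:
        raise ValueError(f"unexpected JA4 prefix: {prefix!r}")
    return JA4Parts(
        transport={"t": "TCP", "q": "QUIC"}.get(prefix[0], prefix[0]),
        tls_version=prefix[1:3],
        sni="domain" if prefix[3] == "d" else "none",
        cipher_count=int(prefix[4:6]),
        extension_count=int(prefix[6:8]),
        alpn=prefix[8:10],
        cipher_hash=cipher_hash,
        extension_hash=extension_hash,
    )

parts = parse_ja4("t13d1516h2_8daaf6152771_b1ff8ab2d16f")
print(parts)
```

Comparing parts rather than whole strings lets an analyst notice, for example, that two fingerprints share a cipher hash and differ only in their extensions.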

How production analysts actually use TLS fingerprints

The textbook description is "fingerprint identifies client." The production reality is more nuanced.

Bot detection. A common use: distinguish browser traffic from bot traffic. Real browsers have known JA4 fingerprints (Chrome, Firefox, Safari, Edge, each across its versions). Bots often use HTTP libraries (Go's net/http, Python's requests, axios on Node.js) whose underlying TLS stacks produce distinctive JA4 fingerprints. Cloudflare and similar services classify incoming requests partly by JA4.

Client identification. A site can identify which browser version is hitting them. Useful for compatibility-tracking ("Firefox 128 users are still showing up"), for serving optimized assets, for analytics.

Anomaly detection. A user account that normally connects from JA4 X suddenly connecting from JA4 Y is suspicious — possible account takeover or session hijacking. The fingerprint is one signal among many.

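Censorship. Nation-state firewalls use JA4-style fingerprints to identify circumvention tools. Tools that speak TLS with a non-browser stack tend to have a recognizable JA4 (Tor's TLS link handshake, sing-box with its default TLS stack), while uTLS-using clients impersonate browser JA4s. Transports such as obfs4 do not use TLS at all, so they present no ClientHello and have to be classified by other features.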

Threat intelligence. Specific JA4 hashes get associated with known malware families (Cobalt Strike, specific ransomware, various RATs). A connection from a suspicious source IP with a known-bad JA4 is high-confidence malicious.

API abuse mitigation. API services see many client implementations; legitimate ones have a few known JA4s, abuse traffic often uses different JA4s. Combined with rate limiting, JA4 helps identify abusive accounts.
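The anomaly-detection use described above can be sketched as a per-account history of fingerprints. This is a minimal in-memory illustration; the class name `FingerprintHistory`, the account identifiers, and the fingerprint strings are hypothetical, and a real system would persist the history and expire old entries.

```python
from collections import defaultdict

class FingerprintHistory:
    """Tracks which JA4 fingerprints each account has used before."""

    def __init__(self):
        self._seen = defaultdict(set)

    def observe(self, account: str, ja4: str) -> bool:
        """Record the fingerprint; return True if it is new for an
        account that already has some history."""
        known = self._seen[account]
        is_new = bool(known) and ja4 not in known
        known.add(ja4)
        return is_new

history = FingerprintHistory()
print(history.observe("alice", "fp-placeholder-1"))  # first sighting, no history yet
print(history.observe("alice", "fp-placeholder-1"))  # same fingerprint again
print(history.observe("alice", "fp-placeholder-2"))  # different fingerprint
```

A True result is only a prompt for further checks, since a browser update or a second device produces the same pattern as an account takeover.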

Why fingerprints are probabilistic, not deterministic

Several reasons production analysts treat TLS fingerprints as evidence rather than identity:

Many clients share libraries. Hundreds of applications use the Go standard library's net/http. For a given Go version and default TLS configuration, they share the same JA4 fingerprint. A "Go net/http" JA4 doesn't tell you which application is connecting; it tells you the underlying library.

Updates change fingerprints. Each browser release potentially changes the ClientHello structure. A new fingerprint may indicate a malicious tool, or it may indicate a legitimate browser version that just rolled out.

Middleboxes can alter observations. TLS-terminating intermediaries (corporate MITM proxies, security gateways that re-establish TLS, inspecting CDN edges) replace the original ClientHello with their own. The observer sees the proxy's fingerprint, not the user's. Plain NAT or CGNAT, by contrast, rewrites addresses and ports but leaves the ClientHello untouched.

uTLS-based impersonation. Tools like uTLS in Go let an application produce a ClientHello that exactly matches Chrome's structure. The JA4 of the impersonating tool matches Chrome's. Without other features, the observer can't distinguish them.

Custom configuration. Power users with custom TLS settings can produce unusual fingerprints. The fingerprint is unusual, but the user is benign.

Test environments. Developers running tests with non-standard TLS configurations (e.g., Burp Suite, mitmproxy) generate unusual fingerprints that aren't malicious.

The right mental model: a JA4 fingerprint is one feature of one connection. Production analysts combine it with other signals (IP reputation, request rate, request content, account history, behavioral patterns) to make decisions. A match against a known Cobalt Strike JA4 is high signal; a failure to match any known browser JA4 is low signal because it could be many things.
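One way to picture "one feature among several" is an additive score. The sketch below is an assumption-laden illustration: the signal names and the weights are hypothetical and chosen only to show relative strength, and production systems typically learn such weights from labeled data.

```python
from dataclasses import dataclass

@dataclass
class Signals:
    ja4_known_bad: bool           # matches a threat-intelligence fingerprint
    ja4_known_browser: bool       # matches a catalogued browser build
    ip_reputation_bad: bool
    request_rate_anomalous: bool

# Hypothetical weights, for illustration only.
WEIGHTS = {
    "ja4_known_bad": 3.0,
    "ja4_unknown": 0.5,
    "ip_reputation_bad": 2.0,
    "request_rate_anomalous": 1.5,
}

def risk_score(s: Signals) -> float:
    """Add up the weights of the signals that fired."""
    score = 0.0
    if s.ja4_known_bad:
        score += WEIGHTS["ja4_known_bad"]
    elif not s.ja4_known_browser:
        # An unrecognized fingerprint is weak evidence on its own.
        score += WEIGHTS["ja4_unknown"]
    if s.ip_reputation_bad:
        score += WEIGHTS["ip_reputation_bad"]
    if s.request_rate_anomalous:
        score += WEIGHTS["request_rate_anomalous"]
    return score

unknown_only = risk_score(Signals(False, False, False, False))
unknown_plus_ip = risk_score(Signals(False, False, True, False))
known_bad = risk_score(Signals(True, False, False, False))
print(unknown_only, unknown_plus_ip, known_bad)
```

An unknown fingerprint barely moves the score by itself but adds to other evidence, which mirrors the low-signal versus high-signal distinction above.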

What GREASE means and why it matters

GREASE deserves a dedicated discussion. The mechanism: Chrome (and other browsers, increasingly) intentionally include reserved/random values in ClientHello fields to ensure protocol robustness:

  • Random GREASE cipher suites in the cipher list.
  • Random GREASE extension types in the extension list.
  • Random GREASE supported groups in the curves list.

The values rotate per connection. The intent is anti-ossification: by always sending unknown values, the protocol forces middleboxes to actually ignore unknown values rather than hardcoding which values are "known." If middleboxes break when they see unknown values, GREASE makes the breakage show up immediately, while it can still be fixed; without GREASE, the breakage might stay invisible until a future protocol revision adds a real new value that gets blocked.

For fingerprinting:

  • JA3 is fragile under GREASE. Implementations that hash the raw values produce different hashes for the same client; the JA3 specification avoids this by ignoring GREASE values.
  • JA4 normalizes GREASE. The JA4 spec recognizes GREASE values and strips them before hashing. Two JA4s from the same Chrome version (one with GREASE values, one notionally without) are identical.

GREASE is a real engineering complication for fingerprinting; it's not adversarial in intent but does require fingerprint schemes to handle it. Filtering GREASE values before hashing, as JA4 does, is the standard solution.
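GREASE filtering is simple because RFC 8701 reserves a fixed set of sixteen 16-bit values (0x0A0A, 0x1A1A, and so on up to 0xFAFA) for cipher suites, extensions, and groups. A minimal sketch, with hypothetical helper names and illustrative cipher lists:

```python
# The sixteen reserved 16-bit GREASE values: 0x0A0A, 0x1A1A, ..., 0xFAFA.
GREASE_VALUES = frozenset((b << 8) | b for b in range(0x0A, 0x100, 0x10))

def strip_grease(values):
    """Return the list without GREASE values, preserving order."""
    return [v for v in values if v not in GREASE_VALUES]

# Two illustrative cipher lists from "the same client": only the GREASE
# value at the front differs between connections.
connection_a = [0x3A3A, 0x1301, 0x1302, 0x1303]
connection_b = [0xDADA, 0x1301, 0x1302, 0x1303]

print(connection_a == connection_b)
print(strip_grease(connection_a) == strip_grease(connection_b))
```

Filtering before hashing makes the two connections indistinguishable to the fingerprint, which is the behavior a stable scheme needs.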

Hands-on exercise

Extract ClientHello fields with tshark.

Tools: tshark, a real network connection. Runtime: 15 minutes.

Capture a TLS handshake:

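# Cache sudo credentials first so the background capture does not stall on a password prompt.
sudo -v
# Capture for 10 seconds, then stop automatically (no need to signal a root-owned process).
sudo tshark -i any -a duration:10 -w /tmp/tls.pcap -f "host example.com and port 443" &
TSHARK_PID=$!
sleep 2   # give tshark time to start capturing before the request goes out
curl -s https://example.com > /dev/null
wait $TSHARK_PID 2>/dev/null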

Decode the ClientHello:

tshark -r /tmp/tls.pcap -O tls -Y "tls.handshake.type == 1" 2>/dev/null | head -100

Identify:

  • TLS version (look for Version and supported_versions extension)
  • Cipher suite list (look for Cipher Suites)
  • Extensions (look for Extension:)
  • ALPN values (look for ALPN Next Protocol)

For each, ask: would a different curl/library produce a different value? (Almost certainly yes — different libraries pick different defaults.)

Stretch: use a JA4 implementation, such as those published at github.com/FoxIO-LLC/ja4, to compute the JA4 of your ClientHello, and compare it against the published Chrome/Firefox JA4 references.

Analyst interpretation table.

Tools: notes. Runtime: 5 minutes.

For each scenario, write what an analyst should conclude:

Observation | Probable interpretation | Confidence
JA4 matches Chrome 130 | Likely a Chrome 130 client | Medium-high
JA4 matches Cobalt Strike default profile | Likely malicious (or pen-tester) | Medium-high
JA4 doesn't match any known fingerprint | Unknown client; could be new tool or new browser version | Low
JA4 matches Chrome but JA4H doesn't | Possibly impersonator (uTLS-based) or unusual deployment | Medium
Same source IP shows multiple unrelated JA4s | Multiple clients/users behind NAT, or tool rotation | N/A
Source IP matches abuse pattern AND JA4 unusual | Higher-confidence malicious | High

The point of the exercise: each row is a call that a Tier-1 analyst at a CDN or security operations center should be able to make. The fingerprint matters but doesn't decide alone; combination with other signals is what produces actionable confidence.

Common misconceptions and traps

"A JA4 match identifies the user." It identifies the TLS-library-and-version, not the user. Many users share the same browser; many applications share the same TLS library.

"GREASE makes fingerprinting impossible." It only breaks schemes that hash the raw values. JA4, like a JA3 implementation that follows the spec, filters GREASE values before hashing. What makes JA3 unreliable is extension shuffling, which JA4 handles by sorting.

"uTLS impersonation makes fingerprint detection useless." It makes single-fingerprint detection less reliable. Combined with TCP fingerprinting (which uTLS doesn't address), HTTP/2 fingerprinting (often subtly different), and behavioral analysis, impersonation can still be detected.

"My corporate MITM doesn't affect my JA4." It absolutely does. Any TLS-terminating intermediary (corporate proxy, TLS-inspecting gateway, inspecting CDN) replaces your ClientHello with theirs. The observer sees the intermediary's JA4, not yours. Plain NAT or CGNAT does not have this effect, because it does not terminate TLS.

"A new fingerprint is always suspicious." Browser updates produce new fingerprints regularly. Treating new fingerprints as immediately malicious produces enormous false-positive rates. The right pattern: validate (does this fingerprint correspond to a known browser version?) before acting.

Wrapping up

TLS fingerprinting in production is a probabilistic signal that operators combine with many other signals to make decisions. JA4 has largely replaced JA3 because JA3 is brittle under extension shuffling and, in implementations that hash raw values, under GREASE; the JA4 ecosystem (JA4S, JA4H, JA4X, JA4T) provides layered fingerprints across the TLS, HTTP, certificate, and TCP layers.

Production analysts don't treat a single fingerprint as identity proof; they treat it as evidence weighted alongside IP reputation, request patterns, account behavior, and other features. The right mental model is "this observation contributes evidence to a decision," not "this observation is a decision."

uTLS-style impersonation tools complicate the picture but don't defeat it. Fingerprint impersonation at the TLS layer plus consistent behavior at the TCP and HTTP/2 layers is hard to engineer comprehensively; mismatched profiles (TLS that says Safari but a TCP stack that says Linux) are themselves identifying.
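A cross-layer consistency check can be sketched as a comparison of the client family each layer's fingerprint maps to. Everything below is hypothetical: the lookup tables, their placeholder keys, and the function name `cross_layer_check` are illustrative, and real deployments would draw the mappings from a maintained fingerprint catalogue.

```python
from typing import Optional

# Hypothetical lookup tables: fingerprint -> client family.
# The keys are placeholders, not real JA4 or JA4H values.
JA4_FAMILY = {
    "ja4-placeholder-a": "chrome",
    "ja4-placeholder-b": "go-stdlib",
}
JA4H_FAMILY = {
    "ja4h-placeholder-a": "chrome",
    "ja4h-placeholder-b": "go-stdlib",
}

def cross_layer_check(ja4: str, ja4h: str) -> str:
    """Compare what the TLS layer and the HTTP layer each claim."""
    tls_family: Optional[str] = JA4_FAMILY.get(ja4)
    http_family: Optional[str] = JA4H_FAMILY.get(ja4h)
    if tls_family is None or http_family is None:
        return "unknown"      # not enough catalogue coverage to compare
    if tls_family == http_family:
        return "consistent"
    return "mismatch"         # e.g. browser-like TLS, library-like HTTP

print(cross_layer_check("ja4-placeholder-a", "ja4h-placeholder-a"))
print(cross_layer_check("ja4-placeholder-a", "ja4h-placeholder-b"))
print(cross_layer_check("ja4-placeholder-a", "not-in-catalogue"))
```

A mismatch is still only evidence: a proxy that rewrites one layer but not the other produces the same result as an impersonating tool.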

The next module (encrypted-traffic-classification-with-ml — coming soon) goes broader into ML-based traffic classification, where TLS fingerprints are one feature among many that classifiers learn from.

Further reading