sing-box and Xray architecture
How sing-box and Xray actually work: inbounds, outbounds, routing, DNS, transport modules, and why these systems are frameworks, not one protocol.
The most common question about sing-box and Xray is the wrong question. People ask "what protocol is sing-box?" or "is Xray better than sing-box?" or "should I use VLESS or Trojan?" and end up confused, because none of those questions has a clean answer. The protocol question is wrong because sing-box and Xray are not protocols — they're programmable transport-routing platforms that can speak many protocols. The "better" question is wrong because the comparison depends entirely on which transports, which routing rules, which DNS strategy, and which operational model your deployment needs. The VLESS-vs-Trojan question is wrong because both are modules inside the same framework, and the choice between them is a tactical decision inside a much larger architectural one.
This module is the architectural treatment. We're going to look at sing-box and Xray as what they actually are: single-process engines that compose inbound traffic acceptors, outbound transport modules, a routing dispatcher, and a DNS subsystem into a flexible programmable proxy. The protocol modules (VLESS, Trojan, Hysteria2, TUIC, Reality, Shadowsocks, WireGuard) are pluggable pieces inside that framework, not the framework itself. Once you see the framework structure clearly, the protocol choices become much easier to reason about.
This is the architectural piece. The practical setup guide already lives at sing-box-config-reference; the censorship-evasion comparison lives at xray-reality-vs-wireguard. Here we'll focus on what's actually inside the box: the dataflow, the subsystem boundaries, the routing semantics, and why these systems are simultaneously the most powerful and the most cognitively expensive options in the proxy ecosystem.
Prerequisites
- udp-the-simplest-transport — many of the modern transports (Hysteria2, TUIC, QUIC-based VLESS) are UDP/QUIC underneath.
- stream-ciphers-and-aead-construction — for understanding why ChaCha20-Poly1305 and AES-GCM dominate these protocols' data planes.
- noise-protocol-framework — Reality and several modern transports borrow Noise-style key-exchange patterns.
- wireguard-from-first-principles — both sing-box and Xray include WireGuard as an outbound transport module; the contrast with their other transports is illuminating.
Learning objectives
- Explain sing-box and Xray as programmable transport-routing platforms rather than as one single protocol.
- Distinguish inbound, outbound, dispatcher, routing, DNS, and transport-layer concerns inside these systems.
- Explain how protocol modules (VLESS, Trojan, Hysteria2, TUIC, Reality, WireGuard, Shadowsocks) fit into a larger framework rather than replacing it.
- Diagnose why these ecosystems are powerful but cognitively expensive to operate, and recognize the operator-tax failure modes.
These are transport toolkits, not one protocol
The first thing to internalize: there is no "sing-box protocol." There is no "Xray protocol." Both are processes that bundle a programmable runtime, a dispatcher, a routing engine, a DNS subsystem, and a library of transport modules. The transport modules are individual protocol implementations — VLESS over TCP, Trojan over TLS, Shadowsocks-2022, Hysteria2 over QUIC, TUIC, WireGuard, plain SOCKS5, plain HTTP CONNECT — and the framework's job is to glue them together.
When someone says "I'm running Xray with VLESS-Reality," what they actually mean is: "I'm running an Xray process. I've configured an inbound that accepts VLESS connections from clients. The connections arrive over a TLS-disguised transport called Reality. The Xray router decides what to do with the resulting decrypted traffic — typically forwarding it through a freedom outbound to the public internet, or maybe through a chain of further outbounds for multi-hop routing." The framework's flexibility is what makes "VLESS-Reality" a configuration rather than a separate piece of software.
This matters operationally because the same process can speak many protocols simultaneously. A single sing-box instance can:
- Accept VLESS-Reality from China-based clients on TCP/443 (camouflaged as HTTPS).
- Accept Hysteria2 from clients on lossy mobile networks on UDP/8443.
- Accept SOCKS5 from local browser clients on 127.0.0.1:1080.
- Accept WireGuard from infrastructure peers on UDP/51820.
- Forward most traffic through a direct outbound (no proxying) for domestic destinations.
- Forward overseas-destined traffic through a chained vless-reality outbound to a foreign sing-box instance.
- Resolve DNS through a doh outbound for sensitive lookups and a local resolver for everything else.
That's one process running six transport modules with a routing layer that decides which traffic goes where. Calling this collection "the sing-box protocol" misses what's actually happening.
The conceptual unit you need is the framework, with the protocol modules as plug-ins. When designing a deployment, decide on inbounds (what comes in), outbounds (what goes out), and routing rules (how each in-flight connection maps from inbound to outbound). The protocol-module choices follow from those decisions.
Core runtime architecture
Both sing-box and Xray have very similar core architectures, despite different config syntax and module naming. The dataflow inside the process:
┌──────────────────────────────────────┐
│ sing-box / Xray │
│ process │
│ │
client traffic ──►│ inbound ─► dispatcher ─► │──► outbound ──► remote
(TCP/UDP) │ (proto) (routing rules) │ (proto)
│ │ │
│ ▼ │
│ DNS module │
│ (resolves domains │
│ for routing decisions) │
│ │
└──────────────────────────────────────┘
The five core subsystems:
Inbound modules accept traffic into the process. Each inbound is one configured listener: bind to an interface and port, speak a particular protocol (SOCKS5, HTTP, TUN, VLESS, Trojan, Shadowsocks, etc.), and hand the resulting decoded traffic to the dispatcher. A single process can run many inbounds simultaneously. The TUN inbound is special — instead of opening a TCP listener, it creates a TUN device and reads raw IP packets from it, which lets the framework function as a system-wide proxy without per-application configuration.
Outbound modules send traffic out of the process. Each outbound is one configured sender: speak a particular protocol (the same protocol library as inbounds, since most protocols are bidirectional), connect to a remote endpoint or peer, and ship the bytes the dispatcher hands them. Special outbounds: direct sends traffic without proxying (the local network handles it normally), block drops traffic, dns-out sends DNS-shaped traffic to the DNS module specifically.
The dispatcher is the routing engine. For every connection that comes in through any inbound, the dispatcher evaluates the routing rules in order and picks an outbound to send it through. Rules can match on destination IP, destination port, destination domain (after DNS resolution if needed), source IP, inbound name, protocol type, GeoIP database lookups, and more. The dispatcher is where the policy lives.
The DNS module is a first-class subsystem, not just a helper library. It manages multiple DNS servers (often: a fast local one and a secure remote one), enforces which-server-resolves-which-domain rules, supports DoH/DoT/DoQ, and crucially feeds back into the dispatcher — many routing rules need to know what IP a domain resolves to before deciding which outbound to use, and the DNS module is what the dispatcher calls to find out.
The observability subsystem (logging, statistics, API endpoints) tracks connection counts, bytes per outbound, errors, latency, and rule evaluation. In sing-box this is exposed via the Clash-API-compatible interface (so sing-box can be controlled by Clash-ecosystem dashboards). In Xray it's exposed via a gRPC API. Both let operators monitor traffic patterns and switch outbounds dynamically.
The whole thing runs as a single OS process. There's no daemon-per-protocol; one process holds all of it. This is part of what makes the framework powerful (everything can interact in-process, no IPC overhead) and part of what makes it cognitively heavy (one config file controls a lot of moving parts).
Protocol modules versus transport modules
Inside the framework, there's a useful distinction between two kinds of modules that often get confused:
Application-facing protocols are how local applications connect to the process. These are the protocols your browser, your terminal, or your operating system speaks to the proxy:
- socks — SOCKS5 (occasionally SOCKS4); the most common application-facing protocol.
- http — HTTP CONNECT proxy.
- mixed — sing-box's combined SOCKS+HTTP listener (auto-detect on port).
- tun — virtual network interface for system-wide capture.
- redirect / tproxy — Linux iptables-based traffic capture.
Remote-transport protocols are how the process talks to other proxy nodes — the actual VPN-equivalent transports. These are what you think of as "the protocol":
- vless — VLESS, a stripped-down VMess descendant; commonly used over TLS or Reality.
- trojan — Trojan-style TLS-tunneled proxy; designed to look exactly like HTTPS.
- shadowsocks / shadowsocks-2022 — symmetric-encryption obfuscated proxy.
- hysteria2 — QUIC-based with custom congestion control optimized for lossy paths.
- tuic — QUIC-based with stream multiplexing and 0-RTT options.
- wireguard — WireGuard, exactly as discussed in wireguard-from-first-principles.
- naive — Chromium-based NaïveProxy, looking like Chrome HTTPS traffic.
- direct / freedom — no proxy, traffic egresses normally (used as an outbound).
- block — drop traffic (used as an outbound).
The distinction matters because a typical deployment uses one set of inbound protocols (SOCKS for browser, TUN for system-wide) and a different set of outbound protocols (VLESS-Reality for the foreign tunnel, direct for domestic). It's perfectly normal to have inbound SOCKS5 routed through outbound VLESS-Reality routed through a chained outbound WireGuard — three protocols stacked, each doing different work.
There's also a third category worth mentioning:
Transport-layer wrappers are how a protocol module sits on the wire. VLESS itself is a small framing protocol; what carries VLESS bytes can be plain TCP, TLS, WebSocket-over-TLS, gRPC, QUIC, or HTTP/2. The transport-layer choice affects what the connection looks like to network observers, what censorship middleboxes will tolerate, and what performance characteristics emerge. Reality is a particularly interesting transport-layer wrapper: it's TLS that uses someone else's TLS certificate (typically a real big-name HTTPS site's cert) to make the tunnel indistinguishable from an honest HTTPS connection, even under active probing. (See xray-reality-vs-wireguard for the deeper Reality writeup.)
The composition is inbound-protocol → routing → outbound-protocol → transport-wrapper, and the framework's flexibility comes from being able to mix and match all four layers per-connection.
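This mix-and-match layering can be pictured as plain function composition over bytes. A toy Python sketch, with made-up framing markers standing in for the real wire formats; the point is only that each layer stays ignorant of the layers around it:

```python
# Each layer wraps the payload produced by the layer above it; the protocol
# layer never needs to know which transport wrapper carries its bytes.
def vless_frame(payload: bytes) -> bytes:   # protocol framing (toy marker)
    return b"VLESS|" + payload

def ws_wrap(payload: bytes) -> bytes:       # optional extra wrapper (toy marker)
    return b"WS[" + payload + b"]"

def tls_wrap(payload: bytes) -> bytes:      # outermost transport wrapper (toy marker)
    return b"TLS(" + payload + b")"

# VLESS over WebSocket over TLS: compose inner-to-outer.
on_the_wire = tls_wrap(ws_wrap(vless_frame(b"GET /")))
print(on_the_wire)  # b'TLS(WS[VLESS|GET /])'
```

Swapping the transport wrapper (say, gRPC for WebSocket) changes only the outer composition, which is exactly why Reality can replace the TLS layer without the protocol module noticing.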
Routing as the control plane inside the process
The dispatcher is where most operational complexity lives. A modest deployment has maybe 5-15 routing rules; a sophisticated one has 100+, with rule-set-based pattern matching, GeoIP lookups, and conditional logic.
A typical sing-box routing config (simplified):
{
"route": {
"rules": [
{
"type": "logical",
"mode": "or",
"rules": [
{ "protocol": "dns", "outbound": "dns-out" },
{ "port": 53, "outbound": "dns-out" }
]
},
{ "ip_is_private": true, "outbound": "direct" },
{ "rule_set": "geoip-cn", "outbound": "direct" },
{ "rule_set": "geosite-cn", "outbound": "direct" },
{ "rule_set": "geosite-category-ads-all", "outbound": "block" },
{ "domain_suffix": [".gov.cn", ".edu.cn"], "outbound": "direct" },
{ "domain_keyword": ["google", "youtube", "telegram"], "outbound": "vless-reality-out" }
],
"final": "vless-reality-out",
"auto_detect_interface": true
}
}
Reading this top to bottom: any DNS-shaped traffic goes to the DNS-out outbound. Private-address traffic egresses directly. China-IP destinations go direct. China-domain destinations go direct. Ads get blocked. Government and education domains stay direct. Anything matching the keyword list (Google, YouTube, Telegram) goes through the VLESS-Reality outbound. Everything else (final) also goes through VLESS-Reality.
Rules evaluate top to bottom, and the first match wins. Each match selects an outbound. The default — when nothing matches — is the final outbound.
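The dispatch semantics can be captured in a few lines. A toy Python model of first-match-wins evaluation; the rule predicates and outbound tags are illustrative, not the real config schema:

```python
# Toy dispatcher: each rule is (predicate, outbound_tag); the first rule
# whose predicate matches wins, and unmatched connections fall to "final".
def pick_outbound(conn, rules, final):
    for predicate, outbound in rules:
        if predicate(conn):
            return outbound
    return final

rules = [
    (lambda c: c.get("port") == 53,                 "dns-out"),
    (lambda c: c.get("domain", "").endswith(".cn"), "direct"),
    (lambda c: "youtube" in c.get("domain", ""),    "vless-reality-out"),
]

print(pick_outbound({"domain": "www.gov.cn", "port": 443}, rules, "vless-reality-out"))  # direct
print(pick_outbound({"domain": "example.org", "port": 443}, rules, "vless-reality-out"))  # vless-reality-out
```

Note that rule order is load-bearing: moving the port-53 rule below the domain rules would silently change how DNS traffic from .cn resolvers is handled.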
What looks simple here gets complicated quickly:
- Rule sets. geoip-cn and geosite-cn are downloaded rule databases (geoip.db, geosite.db, or sing-box-format .srs files), updated periodically. They contain tens of thousands of patterns. Rule-set selection is a per-deployment decision: V2Ray's geosite is the historical canonical source, but community forks (Loyalsoldier's being the best known) curate more aggressively and with different goals.
- Selectors. A selector outbound presents a list of candidate outbounds and lets the operator (or an external API) choose which one is active. This is how you build "click to switch nodes" UIs.
- URL test. A urltest outbound automatically picks the lowest-latency candidate from a list, periodically re-testing. This is how you build auto-failover deployments: if your primary node goes down, traffic seamlessly moves to a backup.
- Logical rules. AND/OR/NOT compositions of simpler rules. The example above has a logical OR for "DNS protocol or port 53," which catches both standard and non-standard DNS traffic.
- Inbound-tagged routing. Rules can match on which inbound the traffic came in through, letting one process serve different clients with different policies. ("Traffic from the local SOCKS inbound goes through proxy X; traffic from the TUN inbound goes through proxy Y.")
- Process-name matching. On platforms that expose it (Linux, macOS, Windows), rules can match on the local process that originated the connection. ("Firefox traffic goes through proxy A; Chrome traffic goes through proxy B.") Implementation requires elevated privileges; details vary by platform.
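The urltest idea in the list above reduces to "probe every candidate, keep the fastest." A minimal sketch under that assumption, with a made-up probe function and node tags:

```python
# Minimal sketch of urltest-style selection: probe each candidate outbound
# and keep the one with the lowest measured latency. Probes that fail
# (return None) are treated as unusable.
def urltest(candidates, probe):
    best_tag, best_ms = None, float("inf")
    for tag in candidates:
        ms = probe(tag)   # e.g. time an HTTP request sent through this outbound
        if ms is not None and ms < best_ms:
            best_tag, best_ms = tag, ms
    return best_tag

fake_latencies = {"tokyo": 45.0, "singapore": 80.0, "frankfurt": None}  # frankfurt is down
print(urltest(["tokyo", "singapore", "frankfurt"], fake_latencies.get))  # tokyo
```

The real implementations re-run this periodically and emit a selection event, which is what makes failover "automatic" from the client's point of view.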
The complexity is the point. Anyone who's used a single-rule "everything-through-the-tunnel" VPN and then needed to whitelist their bank's website understands why fine-grained routing matters. The cost is that misconfiguring routing can produce subtle failures: traffic that's "supposed to" go through the tunnel actually goes direct because a rule set matched first; traffic that's supposed to be blocked accidentally egresses; routing loops if outbound A's destination happens to match a rule that re-routes through outbound A. Debugging routing requires careful log reading.
DNS as a first-class transport dependency
DNS is where many sing-box and Xray deployments mysteriously break. The architecture treats DNS as a first-class subsystem because routing decisions often depend on what a domain resolves to:
{
"dns": {
"servers": [
{ "tag": "remote-doh", "address": "https://1.1.1.1/dns-query", "detour": "vless-reality-out" },
{ "tag": "local-dns", "address": "223.5.5.5", "detour": "direct" },
{ "tag": "fakeip", "address": "fakeip" }
],
"rules": [
{ "rule_set": "geosite-cn", "server": "local-dns" },
{ "outbound": "vless-reality-out", "server": "remote-doh" }
],
"final": "remote-doh",
"strategy": "ipv4_only",
"fakeip": {
"enabled": true,
"inet4_range": "198.18.0.0/15"
}
}
}
Reading this: there are three DNS servers configured. remote-doh is Cloudflare's DoH endpoint, reached through the VLESS-Reality outbound (so the DoH query itself is tunneled and not visible to your local network). local-dns is China's AliDNS at 223.5.5.5, reached directly. fakeip is the fake-DNS subsystem (more on this below).
DNS rules: China-listed domains resolve via local-dns (fast, geographically appropriate). Anything that's going to use the VLESS-Reality outbound resolves via remote-doh (so the DNS query and the actual traffic both go through the tunnel — important for both privacy and consistency). Everything else (final) uses remote-doh.
The reasoning matters. If you resolve youtube.com via local-dns, you might get an IP that the local network knows isn't reachable (because YouTube is blocked locally), and your client will fail to connect even though the routing rule said to use the tunnel. By forcing youtube.com resolution to go through the tunnel's DoH, the IP returned is one the tunnel can actually reach.
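DNS rule evaluation follows the same first-match pattern as routing. A toy sketch, using a two-domain set as a stand-in for the real geosite-cn database:

```python
# Toy DNS-server selection: rules are (predicate, server_tag) pairs,
# evaluated in order; unmatched domains fall through to the final server.
CN_DOMAINS = {"baidu.com", "taobao.com"}   # stand-in for the geosite-cn rule set

def pick_dns_server(domain, rules, final):
    for predicate, server in rules:
        if predicate(domain):
            return server
    return final

dns_rules = [(lambda d: d in CN_DOMAINS, "local-dns")]
print(pick_dns_server("baidu.com", dns_rules, "remote-doh"))    # local-dns
print(pick_dns_server("youtube.com", dns_rules, "remote-doh"))  # remote-doh
```

The consequence described above falls out directly: youtube.com misses the China rule, lands on the tunneled DoH server, and therefore gets an answer the tunnel can actually reach.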
Fake-IP is the cleverest piece of the DNS architecture. Normally, when an application wants to connect to youtube.com, the resolver returns a real IP, the application opens a TCP connection to that IP, and the routing layer decides where to send it based on the destination IP. The problem: by the time the routing layer sees the connection, all it has is the IP — the original domain is gone. If your routing rules are domain-based, this breaks.
Fake-IP fixes this by lying. When the application asks for youtube.com, the resolver returns a fake IP from the 198.18.0.0/15 range (reserved for network benchmarking by RFC 2544, so it won't collide with real destinations). The application opens a TCP connection to that fake IP. The TUN inbound captures the connection, looks up the fake IP in its mapping table, recovers the original youtube.com domain, and uses the domain for routing decisions. Then the chosen outbound sends the actual traffic to the real youtube.com, which it resolves via its own configured DNS path.
The result: routing decisions can use domains even though the operating system thinks it's connecting to IPs. Without fake-IP, only the first DNS lookup sees the domain, and after that everything is IP-based; with fake-IP, the domain context is preserved across the whole connection.
The catch: fake-IP only works with the TUN inbound (which can intercept the artificial-IP connections). If you're using a SOCKS5 inbound, applications need to send the destination as a domain rather than a pre-resolved IP — SOCKS5 supports this natively, and curl's socks5h:// scheme exists for exactly this reason.
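Conceptually, the fake-IP subsystem is just a bidirectional table over the reserved range. A toy allocator to make the mapping concrete (real implementations add TTLs, persistence, and cache handling):

```python
import ipaddress

# Toy fake-IP allocator over 198.18.0.0/15: hand each new domain the next
# address in the range, and keep a reverse map so the TUN inbound can
# recover the original domain from a connection's destination IP.
class FakeIPPool:
    def __init__(self, cidr="198.18.0.0/15"):
        self.hosts = ipaddress.ip_network(cidr).hosts()
        self.by_domain, self.by_ip = {}, {}

    def resolve(self, domain):    # called on the DNS-query path
        if domain not in self.by_domain:
            ip = str(next(self.hosts))
            self.by_domain[domain] = ip
            self.by_ip[ip] = domain
        return self.by_domain[domain]

    def reverse(self, ip):        # called by the TUN inbound on connect
        return self.by_ip.get(ip)

pool = FakeIPPool()
fake = pool.resolve("youtube.com")
print(fake, pool.reverse(fake))  # 198.18.0.1 youtube.com
```

Repeated lookups for the same domain must return the same fake address for the connection's lifetime, which is why the table (not a fresh random pick) is the core of the design.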
Xray architecture specifics
Xray's working-mode documentation lays out the architecture in terms of a request/response flow inside the process:
- Inbound proxy (e.g., SOCKS, VLESS server) receives a connection and decodes it into a "request" (destination address + initial bytes).
- DNS module is consulted if domain-to-IP resolution is needed for routing.
- Router evaluates rules in order, picks an outbound.
- Outbound proxy (e.g., VLESS client, freedom, blackhole) sends the request to the chosen destination using the selected protocol and transport.
- Bidirectional bytes flow until the connection closes.
The configuration model is a JSON object with top-level fields inbounds, outbounds, routing, dns, policy, log, transport, stats, api. Each inbound and outbound has a tag for cross-referencing, a protocol (the module name), and protocol-specific settings. Transport configuration (TLS, Reality, WebSocket, gRPC) lives in a streamSettings block on each inbound/outbound.
The Reality transport gets special architectural treatment. Reality is implemented as a streamSettings.security: "reality" configuration — the protocol module (e.g., VLESS) doesn't change, but the transport layer beneath it is replaced with Reality's TLS-camouflage handshake. This is a clean architectural choice: the protocol layer doesn't need to know how the transport layer is camouflaging itself, and the transport layer doesn't need to know what protocol it's carrying.
Xray's routing.balancers (with per-balancer strategy settings) give the same behavioral capabilities as sing-box's selector and urltest — pick from a list of outbounds, or pick the lowest-latency one. The naming differs but the concept is identical.
What Xray does differently from sing-box: it leans more heavily on the legacy V2Ray/V2Fly config conventions (the project is a fork lineage from V2Ray), it has more explicit support for the Reality transport (which originated in the Xray project), and its config-language ergonomics tend toward "explicit and verbose" rather than "compact and composable."
sing-box architecture specifics
sing-box was built later and learned from V2Ray's and Xray's design history. The architecture is conceptually similar but the configuration model is more uniform:
- Every inbound and outbound has the same set of common fields (tag, type, listen, listen_port).
- Type-specific fields are nested, so a Hysteria2 outbound's up_mbps/down_mbps are clearly distinguished from a VLESS outbound's uuid/flow.
- Routing rules use a uniform predicate language; rule_set references can be inline or downloaded.
- selector and urltest outbounds are first-class, with a control API (Clash-API-compatible) for runtime switching.
- The DNS subsystem has more explicit detour-aware rules (each DNS server can specify which outbound to reach it through, preventing accidental loops).
- Built-in protocol modules cover all the major recent protocols (Hysteria2, TUIC, WireGuard, naive, plus all the older ones).
A simplified sing-box config skeleton:
{
"log": { "level": "info" },
"dns": {
"servers": [
{ "tag": "remote", "address": "https://1.1.1.1/dns-query", "detour": "proxy" },
{ "tag": "local", "address": "223.5.5.5" }
],
"rules": [{ "rule_set": "geosite-cn", "server": "local" }],
"final": "remote"
},
"inbounds": [
{ "type": "tun", "tag": "tun-in", "interface_name": "tun0", "auto_route": true, "stack": "system" },
{ "type": "mixed", "tag": "mixed-in", "listen": "127.0.0.1", "listen_port": 1080 }
],
"outbounds": [
{ "type": "vless", "tag": "proxy", "server": "remote.example", "server_port": 443,
"uuid": "<uuid-here>", "flow": "xtls-rprx-vision",
"tls": { "enabled": true, "server_name": "www.example.com",
"reality": { "enabled": true, "public_key": "<pubkey>", "short_id": "<shortid>" } } },
{ "type": "direct", "tag": "direct" },
{ "type": "block", "tag": "block" }
],
"route": {
"rules": [
{ "rule_set": "geoip-cn", "outbound": "direct" },
{ "rule_set": "geosite-cn", "outbound": "direct" },
{ "domain_keyword": ["google", "youtube"], "outbound": "proxy" }
],
"final": "proxy"
}
}
This is a recognizable working sing-box config skeleton. Two inbounds (system-wide TUN, plus a local SOCKS+HTTP listener on 1080). Three outbounds (the VLESS-Reality remote proxy, plus standard direct and block). Routing keeps China traffic local, sends Google/YouTube and everything-else through the proxy.
The same conceptual config in Xray would have a different JSON layout but the same dataflow. The architectural concepts translate; the specific syntax is what changes.
Operators tend to prefer sing-box for new deployments because the config is more compositional and the module set is more current; operators stay on Xray when they have legacy Xray configs that work, when they need a Reality-specific feature that hasn't yet landed in sing-box, or when they're already in the Xray ecosystem and switching cost outweighs the benefits. Both projects are actively maintained, both share a community lineage with V2Ray, and both implement essentially the same set of protocols.
Security surface and operator tax
The framework's flexibility creates failure modes that simpler tools don't have. The big classes:
Routing loops. An outbound's destination matches a rule that re-routes through that same outbound. The classic example: configure DNS to use a remote DoH server reached through the proxy outbound, but forget to except DNS queries from the routing rule that uses domain-based proxy decisions. Result: the proxy outbound wants to resolve its own server's hostname, the resolution goes through the proxy outbound, which tries to resolve its own server's hostname, infinitely. sing-box and Xray both detect simple loops, but pathological compositions can still wedge.
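The simplest loops, detour chains that revisit a tag, can be caught with a plain cycle check. A toy sketch; the detour field name follows sing-box's outbound-chaining convention, and real loops can also thread through DNS and routing rules that a static check like this can't see:

```python
# Detect a cycle in outbound detour chains: walk the chain starting from
# each outbound and flag any tag we revisit along the way.
def find_detour_loop(outbounds):
    detour = {o["tag"]: o.get("detour") for o in outbounds}
    for start in detour:
        seen, cur = set(), start
        while cur is not None:
            if cur in seen:
                return cur          # this tag is part of a loop
            seen.add(cur)
            cur = detour.get(cur)
    return None

ok  = [{"tag": "a", "detour": "b"}, {"tag": "b"}]
bad = [{"tag": "a", "detour": "b"}, {"tag": "b", "detour": "a"}]
print(find_detour_loop(ok))   # None
print(find_detour_loop(bad))  # a
```

The DNS-driven loop described above is the harder case precisely because the "edge" (proxy outbound resolves its own hostname through itself) exists only at runtime, not in the static detour graph.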
Split-horizon DNS errors. Resolve a domain to one IP from one DNS server, route based on that IP, then have the connection actually go through a different path that resolves the domain to a different IP. Without fake-IP, the operating system's resolver gets stale; with fake-IP misconfigured, the wrong outbound's DNS is consulted. Symptoms: works for some domains but not others, or works initially then breaks after DNS cache expiry.
Inconsistent transport identity. A common deployment composes "VLESS over TLS over Reality with SNI claiming to be Microsoft." If the Reality config and the upstream Reality-target server's TLS certificate don't actually match, the camouflage breaks — the SNI claims one identity but the certificate-handling reveals another. Censorship middleboxes designed to detect Reality look for exactly this kind of inconsistency. Setting up Reality correctly requires understanding what target site you're impersonating and ensuring the impersonation is consistent across config layers.
Policy sprawl. A deployment that starts with "5 simple routing rules" grows to "120 rules with 6 selectors and 3 URL tests" over a year of operational tweaking. The rule order determines behavior, and a new rule inserted at the wrong position can silently break existing flows. There's no built-in conflict detection; debugging is "read the logs and trace which rule matched first."
Multi-process state confusion. Some deployments run multiple sing-box or Xray instances chained together (one per node, with each node forwarding to the next). Statistics and routing decisions are per-process; debugging a problem that crosses process boundaries means correlating logs from all the processes.
Auto-update brittleness. Rule-set databases (geoip, geosite) are auto-updated. A bad update can break routing for everyone simultaneously. Production deployments either pin to specific rule-set versions or have rapid rollback procedures.
TLS fingerprint mismatches. Reality, naive, and several other transports try to match real-browser TLS fingerprints (uTLS-based). If the configured fingerprint doesn't match what the upstream Reality target expects, traffic gets rejected. Updates to either the framework's uTLS library or the upstream target's TLS configuration can break previously-working setups.
The operator tax is real. Running sing-box or Xray well requires understanding all these failure modes, monitoring for them, and reading logs when things break. For an individual operator running a personal deployment, the cost is bearable. For an organization deploying to many users with varied skill levels, the cost is substantial — which is part of why corporate deployments tend to standardize on simpler tools (WireGuard or commercial VPN gateways) and keep sing-box/Xray for the technical users who can debug them.
What these systems are unusually good at
Despite the cost, sing-box and Xray have capabilities that nothing simpler matches:
Multi-exit policy routing. Different traffic classes go through different exits, dynamically. "Banking sites direct, social media through proxy A, video streaming through proxy B with the lowest-latency route." Configurable in a way that no single-tunnel VPN approaches.
Heterogeneous transport composition. One process can handle a dozen different remote-transport protocols simultaneously. Add a new protocol, plug it in, route specific traffic through it, leave the rest of the deployment unchanged.
Censorship-evasion adaptability. When a transport gets fingerprinted and blocked (Trojan-GFW lasted years; specific Reality patterns get caught eventually), the operator switches modules without rewriting the deployment. The framework abstracts the transport choice from the routing and inbound decisions.
Rapid prototyping. Want to test "what if we routed Twitter traffic through Tokyo, but Telegram through Singapore"? Add two outbounds, add two routing rules, restart. No code, no daemons-per-protocol, no compilation.
Mesh-style multi-hop without full mesh complexity. Configure outbound chaining (outbound A's detour field points to outbound B, which forwards to outbound C). Get three-hop routing without running three separate VPN clients or building a custom mesh.
Programmable from outside the process. The Clash-compatible API in sing-box lets dashboards, mobile apps, and shell scripts query state and switch active selections. You can build a "switch nodes from this dropdown" UI without modifying the framework.
For operators whose problem is "I need flexibility to compose transports and route traffic by policy" — and that's a non-trivial fraction of advanced personal-VPN, censorship-evasion, and multi-tenant proxy deployments — these frameworks are the right tool. For operators whose problem is "I need a tunnel between two known endpoints," WireGuard is simpler and faster.
Hands-on exercise
Trace a request through a placeholder config.
Tools: text editor. Runtime: 10 minutes.
Take this skeleton config:
{
"inbounds": [
{ "type": "mixed", "tag": "mixed-in", "listen": "127.0.0.1", "listen_port": 1080 }
],
"outbounds": [
{ "type": "vless", "tag": "proxy", "server": "vps.example.com", "server_port": 443, "uuid": "...",
"tls": { "enabled": true, "server_name": "www.microsoft.com",
"reality": { "enabled": true, "public_key": "..." } } },
{ "type": "direct", "tag": "direct" }
],
"route": {
"rules": [
{ "domain_suffix": [".cn"], "outbound": "direct" }
],
"final": "proxy"
}
}
Now narrate what happens when:
- A browser configured to use SOCKS5 proxy 127.0.0.1:1080 requests https://www.example.com.
  - The mixed inbound accepts the SOCKS5 connection.
  - The destination www.example.com doesn't end in .cn, so the matching rule doesn't fire.
  - The final rule applies — outbound proxy is selected.
  - The VLESS outbound opens a TLS connection to vps.example.com:443, with SNI www.microsoft.com (Reality camouflage).
  - VLESS protocol bytes flow to the remote Xray/sing-box server, which decodes them and forwards to www.example.com:443 from the server side.
- A browser requests https://baidu.com.
  - Same inbound, same SOCKS5 acceptance.
  - Destination baidu.com ends in .com, not .cn, so the rule doesn't fire.
  - Notice the trap: baidu.com goes through the proxy too, which is almost certainly not what the operator wanted.
  - The fix: add a domain_keyword: ["baidu"] rule, or use a geosite-cn rule set that catches Chinese sites by their typical domains.
- A request for xinhua.cn.
  - Inbound accepts.
  - The .cn suffix matches. Direct outbound. Bypasses the proxy.
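The three traces can be checked mechanically with the same first-match model, which also makes the baidu.com trap explicit (toy code mirroring the skeleton's single rule and final outbound):

```python
# Reproduce the skeleton's routing: one domain_suffix rule, then "final".
def route(domain):
    if domain.endswith(".cn"):
        return "direct"
    return "proxy"   # the "final" outbound (VLESS-Reality in the skeleton)

assert route("www.example.com") == "proxy"   # tunneled, as narrated
assert route("baidu.com") == "proxy"         # the surprise: .com, not .cn
assert route("xinhua.cn") == "direct"        # suffix rule fires
print("all three traces match the narration")
```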
Stretch: identify where a route loop could happen if you changed the proxy outbound to use a domain-based DNS strategy without excepting the proxy server's own hostname from domain-based routing.
Compare an Xray-style and a sing-box-style config skeleton side by side.
┌───────────────────────┬────────────────────────────────┬──────────────────────────────┐
│ Concept │ Xray (V2Ray-lineage) │ sing-box │
├───────────────────────┼────────────────────────────────┼──────────────────────────────┤
│ Top-level inbound │ inbounds[] │ inbounds[] │
│ Top-level outbound │ outbounds[] │ outbounds[] │
│ Routing │ routing.rules[] │ route.rules[] │
│ DNS │ dns.servers[] / dns.hosts │ dns.servers[] / dns.rules[] │
│ Tag for reference │ "tag": "..." │ "tag": "..." │
│ Protocol module │ "protocol": "vless" │ "type": "vless" │
│ Module-specific cfg │ "settings": { ... } │ <fields at module level> │
│ Transport wrap │ "streamSettings": { ... } │ "transport": { ... } /tls{} │
│ Outbound chaining │ "proxySettings.tag": "next" │ "detour": "next" │
│ Selector │ routing.balancers │ outbound type "selector" │
│ Auto-test │ via balancers + observatory │ outbound type "urltest" │
│ Geo data │ geoip.dat / geosite.dat │ geoip.db / rule-set .srs │
│ Fake-IP │ FakeDNS in dns module │ dns.fakeip │
│ External control │ gRPC API │ Clash-API HTTP │
└───────────────────────┴────────────────────────────────┴──────────────────────────────┘
The naming differs; the concepts map cleanly. Once you internalize the architecture, switching ecosystems is a matter of looking up the syntax for the same idea.
Common misconceptions and traps
"sing-box is itself a single tunnel protocol." It isn't. It's a runtime that hosts many protocol modules. The same is true for Xray. Calling either one "the protocol" is like calling a web browser "the HTTP."
"More routing rules means more power." It often means more failure modes, harder debugging, and more chances for subtle order-dependent behavior. The right number of rules is "as few as cleanly express your policy." Production deployments commonly have rule sets that are too complex for any single operator to fully reason about — which is its own kind of risk.
"If the transport is strong, DNS configuration hardly matters." DNS is what decides which transport carries the traffic. A misconfigured DNS subsystem can route everything through direct even though all the "right" outbound proxies are configured, simply because rule evaluation depended on a domain that resolved unexpectedly. DNS is often the single biggest source of "this should work but doesn't" complaints.
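A typical split-DNS sketch in sing-box style shows why — the dns section itself references outbounds via detour, so DNS and routing are coupled and can silently disagree. Field names follow the documented schema; server addresses and tags are placeholders:

```json
{
  "dns": {
    "servers": [
      { "tag": "dns-remote", "address": "https://1.1.1.1/dns-query",
        "detour": "proxy" },
      { "tag": "dns-local", "address": "223.5.5.5",
        "detour": "direct" }
    ],
    "rules": [
      { "domain_suffix": [".cn"], "server": "dns-local" }
    ],
    "final": "dns-remote"
  }
}
```

If a domain you expected to match a rule resolves through the wrong server here, the answer it gets can in turn flip which route rule fires — the classic "this should work but doesn't" chain.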
"A selector or URL test is just convenience UI." No — it's part of the control plane. The selector decides which outbound is currently active; if your monitoring shows traffic on outbound A but the selector is pointing at B, you have a real configuration bug.
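In sing-box these are literal outbound objects, which is exactly why they belong to the control plane: other rules reference them by tag, and whatever they currently point at is where traffic goes. A sketch, with proxy-a and proxy-b assumed to be defined elsewhere:

```json
{
  "outbounds": [
    { "type": "selector", "tag": "select-out",
      "outbounds": ["proxy-a", "proxy-b"],
      "default": "proxy-a" },
    { "type": "urltest", "tag": "auto-out",
      "outbounds": ["proxy-a", "proxy-b"],
      "url": "https://www.gstatic.com/generate_204",
      "interval": "5m" }
  ]
}
```

A route rule targeting "select-out" delegates the final choice to whatever the selector's current state is — so monitoring must observe the selector's state, not just the config file.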
"If one module works, copying it into every path is fine." Composition mistakes are common. If you chain three VLESS outbounds expecting "three-hop routing" but they all use the same UUID, you've created a self-loop — the second hop's outbound matches the first hop's expected client. Identity tags must be different per-hop.
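A two-hop sing-box chain via detour makes the point visible — each hop is a separate outbound with its own identity. Servers and UUIDs below are placeholders; the crucial detail is that they differ per hop:

```json
{
  "outbounds": [
    { "type": "vless", "tag": "hop-entry",
      "server": "198.51.100.5", "server_port": 443,
      "uuid": "11111111-1111-1111-1111-111111111111" },
    { "type": "vless", "tag": "hop-exit",
      "server": "203.0.113.20", "server_port": 443,
      "uuid": "22222222-2222-2222-2222-222222222222",
      "detour": "hop-entry" }
  ]
}
```

Traffic routed to hop-exit is first carried through hop-entry; each server must be configured to accept its own hop's UUID, not a shared one.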
"sing-box vs Xray is a real comparison question." It is — but only after you've specified workload, protocol mix, deployment platform, operator skill level, and ecosystem preferences. As a generic question with no context, the answer is "use whichever your community has more support for and whichever you're more comfortable debugging." Both work; both have rough edges; both are actively maintained.
"Reality is a magic bullet against censorship." Reality is excellent against current-generation TLS-fingerprinting censorship, but it's not magic. It depends on consistent identity (SNI matches certificate matches uTLS fingerprint matches HTTP/2 settings), on the upstream target site behaving normally, and on the censor not having developed Reality-specific detection (which is an ongoing arms race). For a deeper Reality-vs-everything-else comparison, see xray-reality-vs-wireguard.
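The "consistent identity" requirement is visible directly in a Reality client outbound sketch (sing-box style; the decoy site, keys, and UUID are placeholders): the server_name, the uTLS browser fingerprint, and the Reality key material all have to agree with what the server side presents, or the camouflage breaks.

```json
{
  "type": "vless", "tag": "reality-out",
  "server": "203.0.113.10", "server_port": 443,
  "uuid": "REPLACE-WITH-UUID",
  "tls": {
    "enabled": true,
    "server_name": "www.decoy-site.example",
    "utls": { "enabled": true, "fingerprint": "chrome" },
    "reality": {
      "enabled": true,
      "public_key": "REPLACE-WITH-SERVER-PUBLIC-KEY",
      "short_id": "0123abcd"
    }
  }
}
```

Change any one of these pieces without the others — say, a server_name the decoy site doesn't actually serve — and the identity stops being consistent.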
"You can't combine these with WireGuard." You absolutely can. WireGuard is just another outbound module in both frameworks. A common deployment uses WireGuard as one of several outbounds (typically for trusted infrastructure-to-infrastructure tunnels) while using VLESS-Reality or Hysteria2 for client-facing access. Mixing protocols within a single sing-box or Xray process is the framework's whole point.
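Mixing both in one process is just two entries in outbounds[] — route rules then decide which traffic rides which transport. A sketch in sing-box style; keys, addresses, and tags are placeholders:

```json
{
  "outbounds": [
    { "type": "wireguard", "tag": "wg-infra",
      "server": "192.0.2.7", "server_port": 51820,
      "local_address": ["10.9.0.2/32"],
      "private_key": "REPLACE-WITH-CLIENT-PRIVATE-KEY",
      "peer_public_key": "REPLACE-WITH-PEER-PUBLIC-KEY" },
    { "type": "vless", "tag": "client-facing",
      "server": "203.0.113.10", "server_port": 443,
      "uuid": "REPLACE-WITH-UUID" }
  ]
}
```

A route rule matching your infrastructure's IP ranges can point at "wg-infra" while everything else detours through "client-facing" — one process, two transports, one policy.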
Wrapping up
sing-box and Xray are programmable transport-routing frameworks that compose inbounds, outbounds, a routing dispatcher, and a DNS subsystem into flexible policy-driven proxies. The protocol modules — VLESS, Trojan, Hysteria2, TUIC, Reality, WireGuard, Shadowsocks — are pluggable pieces inside the framework, not the framework itself. Once you see the architecture clearly, the protocol choices become tactical decisions inside a much larger strategic design.
The flexibility is unmatched in the proxy ecosystem; the operational cost is correspondingly high. For deployments where the architectural flexibility matches the actual problem (multi-tenant routing, censorship-evasion adaptability, heterogeneous transport composition), nothing simpler delivers what sing-box and Xray do. For deployments where a single tunnel between two endpoints is enough, simpler tools are better.
The next module (tailscale-and-wireguard-mesh — coming soon) goes back to a constrained-but-elegant design: how Tailscale and similar control planes turn raw WireGuard from "two-peer point-to-point tunnel" into "mesh network with NAT traversal, ACLs, and identity federation" without sacrificing the protocol's minimalism.
Further reading
- sing-box configuration documentation — the primary reference for sing-box's object model and config philosophy.
- Xray configuration documentation — canonical source for Xray's runtime modules and config model.
- Xray Working Modes — clear high-level explanation of Xray's internal request/response flow.
- Xray Transport (uTLS, REALITY) configuration — primary source for how TLS and Reality fit into the larger framework.
- sing-box-config-reference — the practical config article on RouteHarden, with deployment-level concrete examples.
// related reading
OpenVPN, the friendly compromise
Why OpenVPN lasted so long: TLS in user space, TUN vs TAP, UDP vs TCP, and the flexibility costs that newer tunnels tried to remove.
WireGuard from first principles
Why WireGuard looks the way it does: Noise_IK, cryptokey routing, cookies, timers, and the design tradeoffs behind the modern minimalist VPN.
Self-hosting behind Cloudflare Tunnel without a public port
How to use Cloudflare Tunnel for published apps and private-network routes, when to use Access, and where Tunnel stops being the right tool.