I'm finalizing the egress control design for a local-first agent cluster and need to settle on the main L7 proxy. The goal is to enforce mTLS for all outbound traffic from my services, with explicit allow-listing of external domains. It's a small-scale, self-hosted setup, so complexity is a real concern.
I've narrowed it to Envoy or NGINX. Envoy's service mesh heritage is appealing for its granular security controls and deep protocol understanding. NGINX is the veteran, and its configuration syntax feels more immediate to me.
For my Flask-based agents, the primary requirement is a proxy that can:
* Validate client certificates against a private CA for all egress.
* Route based on SNI/ALPN, blocking anything not explicitly in the allow list.
* Provide clear logs for any rejected connections or failed handshakes.
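The allow-list half of those requirements is the same logic whichever proxy ends up enforcing it. Here is a minimal Python sketch of that decision. The domain names are hypothetical, and the `*.` suffix convention is an assumption made here, not Envoy or NGINX syntax. It normalizes the SNI, accepts exact or wildcard matches, and logs every rejection.

```python
import logging
from typing import Iterable, Optional

log = logging.getLogger("egress")

def normalize(name: str) -> str:
    # SNI comparison is case-insensitive; drop a trailing root dot if present.
    return name.strip().lower().rstrip(".")

def is_allowed(sni: Optional[str], allow_list: Iterable[str]) -> bool:
    """True if the SNI equals an entry or falls under a '*.suffix' entry."""
    if not sni:
        log.warning("rejected: no SNI presented")
        return False
    host = normalize(sni)
    for entry in map(normalize, allow_list):
        if entry.startswith("*."):
            # '*.example.org' allows any subdomain, but not example.org itself.
            if host.endswith(entry[1:]):
                return True
        elif host == entry:
            return True
    log.warning("rejected: SNI %r not in allow list", host)
    return False
```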
Envoy's configuration feels more programmatic, which is powerful but also a bit heavy. Here's a snippet of what the mTLS validation looks like:
```yaml
transport_socket:
  name: envoy.transport_sockets.tls
  typed_config:
    "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
    require_client_certificate: true
    common_tls_context:
      validation_context:
        trusted_ca: { filename: "/etc/proxy/ca.crt" }
```
NGINX can do this too, of course, with directives like `ssl_verify_client` and `ssl_client_certificate`. The operational simplicity is tempting.
Has anyone run both in production for this specific use case? I'm particularly interested in how they handle edge cases like TLS 1.3, session resumption, and the overhead of maintaining the config. Did you find Envoy's additional features worth the learning curve for a controlled egress point?
~Sophie
You've zeroed in on the core tension: Envoy's programmatic configurability versus NGINX's immediate syntax. That snippet you started is telling - the typed_config and `@type` URL are pure protobuf, which feels heavy for a small setup.
For your stated goal of mTLS egress with allow-listing, NGINX with Lua might actually get you there with less cognitive overhead. Its stream module can route TLS by SNI via `ssl_preread` (passthrough). On whichever listener actually terminates the agents' TLS, you can use `ssl_verify_client` with `ssl_client_certificate` for client cert validation. A pure passthrough hop never sees the client certificate, so it can't perform that check itself. The logging is more straightforward too: put `$ssl_client_verify` in your `log_format` and the failure reason lands directly in the access log.
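Here is a minimal Python sketch of that log workflow. It assumes a hypothetical `log_format` that writes `$ssl_client_verify` as the last, double-quoted field on each line. That variable's values are `SUCCESS`, `FAILED:reason`, or `NONE`. The sketch tallies failures by reason, which is the same information a grep would surface.

```python
import re
from collections import Counter
from typing import Iterable

# Assumes the verify result is the last double-quoted field, e.g. a format like
#   log_format egress '$remote_addr [$time_local] "$ssl_client_verify"';
VERIFY_FIELD = re.compile(r'"(SUCCESS|NONE|FAILED:[^"]*)"\s*$')

def failed_verifications(lines: Iterable[str]) -> Counter:
    """Count FAILED:<reason> results by reason; SUCCESS and NONE are skipped."""
    reasons: Counter = Counter()
    for line in lines:
        match = VERIFY_FIELD.search(line)
        if match and match.group(1).startswith("FAILED:"):
            reasons[match.group(1)[len("FAILED:"):]] += 1
    return reasons
```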
But here's the caveat: Envoy's strength here is in the deeper protocol inspection and its connection to a more holistic control plane, even if you don't use it initially. If you foresee needing to differentiate routing based on HTTP/2 SETTINGS or more complex ALPN negotiation later, starting with Envoy could save a rewrite. For a static allow list today, that's probably overkill.
Abstraction without security is just complexity.
Yeah, that NGINX logging point is a huge win. Being able to just grep the access log for `FAILED:` (the value `$ssl_client_verify` logs when verification fails) is so much simpler than digging through Envoy's JSON-structured logs.
I'd stick with NGINX for this scale, but I'd add one warning: the Lua route for complex SNI/ALPN matching can get brittle. If your allow list stays under, say, 50 entries, the native `map` directive is rock solid. Push past that and the config starts to feel like a house of cards.
Envoy's config might feel like overkill now, but if you ever need to swap that private CA on the fly without a proxy restart, you'll be glad you have the dynamic endpoints.
--Al
That snippet of typed_config highlights the verbosity, but I'd push back on calling it "heavy" for your scale. The cognitive load isn't in the YAML itself, it's in the artifact provenance for the Envoy binary and its extensions.
If you're signing the client certs from a private CA, you should be equally concerned with the integrity of the proxy runtime. Envoy's configuration is declarative, which pairs well with generating a minimal SBOM for your deployment. You can pin the exact `envoy.extensions.transport_sockets.tls.v3` schema version and verify the artifact. NGINX's binary distribution, especially with dynamic modules like Lua, is harder to lock down for a full chain of custody.
That said, for a fixed allow-list under 50 domains, NGINX's `stream` context with `ssl_preread` and a simple `map` is undeniably more legible. Just commit to generating the SBOM for the NGINX build you use, including the OpenSSL version. The logging simplicity user112 mentioned is a legitimate operational advantage.
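"Verify the hash" can be as small as this Python sketch. It streams the binary (or any config artifact) through SHA-256 and compares the result against a digest you pinned when you audited the build. The path and the pinned value are placeholders you would supply. This checks file integrity only; it is not a substitute for the SBOM.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in chunks so large binaries are fine."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches_pin(path: Path, pinned_hex: str) -> bool:
    # Normalize the pinned value so upper/lowercase hex both work.
    return sha256_of(path) == pinned_hex.strip().lower()
```

Typical use is a deploy gate such as `matches_pin(Path("/usr/sbin/nginx"), PINNED_DIGEST)`, where `PINNED_DIGEST` is whatever you recorded at audit time.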
trust but verify the hash
That snippet you ended on is a perfect example of the "heavy" feeling. You cut it off at `common_tls_con...`, but even that hint shows you're about to descend into a nest of `validation_context_sds_secret_config` or a massive inline cert block.
For your scale, I'd honestly pick NGINX. The complexity win isn't just the config syntax, it's the *feedback loop*. When you're testing your mTLS, getting a real-time, plain-text error in the NGINX error log (like "SSL_do_handshake() failed (SSL: error...)") is way faster for debugging than tracing a JSON property through Envoy's admin interface.
But here's a middle-ground idea: use NGINX for the core mTLS egress, but if you need deeper protocol inspection later, you can run a tiny Envoy sidecar just for those specific flows. That keeps the main config simple while letting you tap into Envoy's strengths where needed.
Yuki
That logging advantage is real, but I think you're selling Envoy's JSON logging short a bit. You can pipe it directly into `jq` with a filter to get the same clarity, like `jq '.response.code_details'`. It's different muscle memory, but just as quick once you've got the one-liner saved.
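For comparison, here is a Python version of the same filter that also makes the schema dependency explicit. It assumes an access log configured with `json_format`, so each line is one JSON object. Key names and nesting are whatever your `json_format` produces, typically fed from the `%RESPONSE_CODE_DETAILS%` command operator. The dotted path `response.code_details` mirrors the one-liner above and is an assumption. Lines missing the key are counted separately, so a renamed field shows up as a number instead of as silent empty output.

```python
import json
from typing import Any, Iterable, List, Optional, Tuple

def dig(record: Any, dotted: str) -> Optional[Any]:
    """Follow a dotted path like 'response.code_details'; None if any hop is missing."""
    node = record
    for key in dotted.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node

def extract(lines: Iterable[str], dotted: str) -> Tuple[List[Any], int]:
    """Return (values found, count of lines where the path was missing)."""
    values: List[Any] = []
    missing = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        value = dig(json.loads(line), dotted)
        if value is None:
            missing += 1
        else:
            values.append(value)
    return values, missing
```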
Your point about the allow list size hitting a config complexity wall is key. The `map` directive gets unwieldy, and Lua is its own world. The middle path I've seen work is generating the NGINX config from a simple list of domains. A tiny script turns a text file into the `server_name` directives, keeping the core config static and readable. That avoids both the Lua brittleness and the full Envoy paradigm.
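Here is a minimal sketch of such a generator in Python. It assumes a plain-text file with one domain per line, with `#` comments allowed. It emits `map` entries keyed on `$ssl_preread_server_name` rather than `server_name` lines, to match the `ssl_preread` approach discussed in this thread. The `$egress_upstream` variable and the `egress_blocked` default are hypothetical names you would define and wire into `proxy_pass` yourself. Entries are validated so the text file can't inject NGINX syntax. Sorting keeps the output byte-stable, so the generated file can still be checksummed.

```python
import re
from typing import Iterable, List

# Plain hostnames only: wildcards and characters like ; { } are rejected.
DOMAIN = re.compile(r"^[a-z0-9-]+(\.[a-z0-9-]+)+$")

def parse_domains(lines: Iterable[str]) -> List[str]:
    """One domain per line; blank lines and '#' comments are ignored."""
    domains = set()
    for raw in lines:
        line = raw.split("#", 1)[0].strip().lower().rstrip(".")
        if not line:
            continue
        if not DOMAIN.match(line):
            raise ValueError(f"refusing suspicious allow-list entry: {line!r}")
        domains.add(line)
    return sorted(domains)

def render_map(domains: Iterable[str], port: int = 443,
               blocked: str = "egress_blocked") -> str:
    """Render a stream-context `map` from SNI to a host:port upstream."""
    out = ["map $ssl_preread_server_name $egress_upstream {",
           f"    default {blocked};"]
    for domain in domains:
        out.append(f"    {domain} {domain}:{port};")
    out.append("}")
    return "\n".join(out) + "\n"
```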
You're right about the `jq` one-liner muscle memory, but you're glossing over the runtime context. The clarity you get from `jq` depends entirely on Envoy's JSON schema stability across versions, and that's not a given when you're pulling from a rolling tag like `envoy:latest`. Pin the image or you'll break your filters.
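A pre-deploy check can enforce that pin mechanically. Here is a Python sketch that rejects any image reference without an explicit `@sha256:` digest. It validates the reference's shape only, so comparing the digest against the one you actually audited is still on you.

```python
import re

# name[:tag]@sha256:<64 lowercase hex chars>
PINNED = re.compile(r"^[\w./:-]+@sha256:[0-9a-f]{64}$")

def is_digest_pinned(image_ref: str) -> bool:
    """True only for references that carry an explicit sha256 digest."""
    return bool(PINNED.match(image_ref.strip()))
```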
The config-generation script is a solid middle path, but it introduces a build step. That moves the complexity from the config file to your CI/CD pipeline and its own provenance. You now have to secure and sign that script, and its runtime environment, with the same rigor as the proxy binary. That's the real trade-off: a static NGINX config is a single artifact to verify, while a generator adds another moving part in the chain of custody.
r
Your snippet perfectly illustrates the conceptual load. That `typed_config` with its protobuf type URL isn't just verbose syntax, it's a commitment to Envoy's specific model of a dynamic configuration plane. You're encoding the expectation that these settings could, in theory, be updated via xDS.
For a static allow-list under fifty domains, that's architectural overkill. The true weight is the implicit shift from managing a configuration file to managing a configuration *system*. With NGINX, the `ssl_verify_client` directive and a `map $ssl_preread_server_name` block give you a complete, static artifact. You can checksum it and be done.
However, the counterpoint from the thread about artifact provenance is valid. If your threat model includes the integrity of the proxy binary itself, Envoy's container image with a pinned, minimal set of extensions is easier to audit than a monolithic NGINX build with dynamic Lua modules. Your choice here is between complexity in the config language and complexity in the supply chain.
threat model first
Your snippet cuts off at the worst part, because that's where the real Envoy verbosity kicks in. You're about to define the `validation_context` inline with your CA certificate, or worse, configure SDS. For a static CA, you'll embed the entire PEM block, indented, as a YAML multiline string. That's where the "programmatic" feel turns into a maintenance headache.
I agree the cognitive load is real, but I'd frame it differently. The complexity isn't just in writing the config; it's in the *validation* loop. With NGINX, you run `nginx -t` and get a yes/no from the binary. With Envoy, you're dependent on the schema of that specific `@type` URL. A typo in the protobuf type name might not fail until runtime, which is a subtle but significant shift for a security control plane.
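For what it's worth, Envoy does have an offline counterpart to `nginx -t`. Running `envoy --mode validate -c <config>` loads and checks the config without serving traffic, which should surface a bad `@type` before deployment. Here is a Python sketch of a pre-deploy gate that runs whichever check applies. It assumes the binaries are on `PATH`, and the config paths are placeholders.

```python
import subprocess
from typing import List

def validation_command(proxy: str, config_path: str) -> List[str]:
    """Build the offline config check for each proxy."""
    if proxy == "nginx":
        return ["nginx", "-t", "-c", config_path]
    if proxy == "envoy":
        return ["envoy", "--mode", "validate", "-c", config_path]
    raise ValueError(f"unknown proxy: {proxy}")

def config_is_valid(proxy: str, config_path: str) -> bool:
    """Run the check; non-zero exit means rejected. Raises if the binary is missing."""
    result = subprocess.run(validation_command(proxy, config_path),
                            capture_output=True, text=True)
    if result.returncode != 0:
        print(result.stderr.strip())
    return result.returncode == 0
```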
Given your small-scale, self-hosted description, I'd lean heavily towards NGINX. The `stream` context with `ssl_preread` and a `map $ssl_preread_server_name $upstream` for your allow-list is a single, auditable file. You can achieve the same mTLS enforcement without buying into the xDS mental model, which you simply don't need for a fixed list of domains. If you later need dynamic updates, that's the point to consider a sidecar or a full migration, not the starting point.
~ jay
The snippet you posted cuts off right at the inflection point. That `common_tls_context` is where the YAML gets dense, because you'll be embedding your CA cert as a multiline string and defining the validation context. For a static setup, that's pure overhead.
You're right about the programmatic feel, but I'd frame the risk differently. That programmability implicitly encourages you to build a config system instead of a config file. For a static allow list under fifty domains, you're managing deployment complexity you don't need. NGINX's `ssl_verify_client` and a simple `map` block give you one file to checksum and audit.
If you ever need dynamic CA rotation, you'll regret the NGINX choice. But if you don't, you're avoiding a whole layer of tooling for zero functional gain.
Stay sharp, stay civil.
That distinction between a config file and a config system is exactly the core of the decision. You can verify a file. A system needs a threat model for its control plane.
But your point about supply chain cuts both ways. If you're that concerned about binary provenance, you can build NGINX from source with a hardened config, which is a one-time audit. Envoy's extension ecosystem means your SBOM is only as good as your dependency pinning for every single `@type`. That's often more moving parts, not fewer.
Keep it technical.
You're absolutely right about that programmatic feel in the snippet, but I think that's exactly where the trap lies for a small setup. Seeing `@type` and `typed_config` primes you to think in dynamic updates, when you probably just need a static file.
Since you're comfortable with Python from your Flask work, I'd suggest a third path: write your mTLS proxy logic in a simple Python script using a library like `proxy.py`. You'd get the granular logging you want directly in your console, and you can keep the allow list as a Python dict. It's one more service to run, but you cut out the entire intermediate config language.
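If you try that route, the mTLS and allow-list parts don't depend on `proxy.py` at all, because Python's standard `ssl` module covers them. Here is a minimal sketch with the forwarding loop left out. `CERT_REQUIRED` plus `load_verify_locations` plays the role of `ssl_verify_client on` / `ssl_client_certificate`. An `sni_callback` aborts the handshake for names not in the dict. The file paths and dict contents are placeholders, and the TLS 1.3 floor is a choice rather than a requirement. Terminating the agents' TLS this way also means they must trust the proxy's own server certificate.

```python
import ssl
from typing import Optional

# Hypothetical allow list kept as a plain dict: SNI name -> (host, port) to forward to.
ALLOWED_DESTINATIONS = {
    "api.example.com": ("api.example.com", 443),
}

def reject_unlisted_sni(sock, server_name: Optional[str], ctx) -> Optional[int]:
    """sni_callback: return None to continue the handshake, or an alert to abort it."""
    if server_name is not None and server_name.lower() in ALLOWED_DESTINATIONS:
        return None
    return ssl.ALERT_DESCRIPTION_UNRECOGNIZED_NAME

def build_mtls_context(ca_file: Optional[str] = None,
                       cert_file: Optional[str] = None,
                       key_file: Optional[str] = None) -> ssl.SSLContext:
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_3
    ctx.verify_mode = ssl.CERT_REQUIRED  # analogous to ssl_verify_client on
    if ca_file:
        ctx.load_verify_locations(cafile=ca_file)  # private CA for client certs
    if cert_file and key_file:
        ctx.load_cert_chain(cert_file, key_file)  # the proxy's own server cert
    ctx.sni_callback = reject_unlisted_sni
    return ctx
```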
That said, if you're set on one of the two, your instinct about NGINX's syntax being more immediate is the deciding factor. The faster you can read and trust your config, the more likely you are to keep it secure. Envoy's power is wasted if you're just setting it and forgetting it with a static list.
Secure your home lab like your job depends on it.
That jq trick is neat, I'll have to try that. But doesn't that also mean you're now reliant on the JSON structure staying the same across updates? Like, if a new Envoy version renames a field, your filter breaks.
The config generation script idea sounds promising for keeping things tidy. But doesn't that just move the problem? Now you have a script to maintain and secure too, right?
You've got the heart of it with the **config system vs. config file** distinction. That's the real mental shift.
I'd add one more nuance from experience: that `common_tls_context` YAML block isn't just overhead, it's a *liability* for static setups. You embed the CA cert, and then six months later someone needs to rotate it. You're not just editing a file anymore, you're mentally re-parsing that entire indented structure, hoping your spaces are right. With NGINX, it's `ssl_client_certificate /path/to/ca.pem;` - the indirection is a feature, not a bug.
Your final line is the perfect rule of thumb. If you don't have a concrete requirement for dynamic rotation, you're just buying tools you won't use.
hardened by default
You're right about the liability of embedding the CA, but the path indirection you praise in NGINX has a subtle, related risk: it creates a filesystem dependency that's often overlooked in threat models. The `ssl_client_certificate /path/to/ca.pem` directive means your proxy's security now implicitly trusts the permissions and integrity of that specific filesystem path. A misconfigured volume mount or a symlink attack on that path can bypass the CA check entirely, whereas an embedded certificate, while painful to rotate, is a single, sealed artifact.
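That filesystem dependency can at least be checked at startup or in CI. Here is a Python sketch that flags a CA path that is a symlink, is not a regular file, is writable by group or others, or is not owned by an expected UID. The default of root is an assumption about your setup. The sketch does not inspect parent directories or the mount itself, which matter just as much.

```python
import os
import stat
from pathlib import Path
from typing import List, Optional

def ca_path_problems(path: Path, expected_uid: Optional[int] = 0) -> List[str]:
    """Return reasons the CA file looks unsafe; an empty list means no findings."""
    st = os.lstat(path)  # lstat inspects the link itself instead of following it
    if stat.S_ISLNK(st.st_mode):
        return ["is a symlink"]
    problems = []
    if not stat.S_ISREG(st.st_mode):
        problems.append("is not a regular file")
    if st.st_mode & (stat.S_IWGRP | stat.S_IWOTH):
        problems.append("is writable by group or others")
    if expected_uid is not None and st.st_uid != expected_uid:
        problems.append(f"owned by uid {st.st_uid}, expected {expected_uid}")
    return problems
```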
The real decision factor should be your certificate rotation mechanism's own security. If you have a secure, automated process to place a new PEM file at a known path, the NGINX model is superior. If your rotation is a manual, ad-hoc edit, embedding the cert might actually be safer, because it forces a review of the entire config file, not just a standalone PEM.
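For the "secure, automated process" branch, the key property is that the proxy never reads a half-written PEM. Here is a Python sketch that assumes the new CA bundle arrives as bytes from your own pipeline. It runs a basic PEM marker check, without parsing or validating the certificate itself. It then writes to a temp file in the same directory, fsyncs it, and swaps it in with `os.replace`, which is atomic on POSIX when source and target are on the same filesystem. Reloading NGINX afterwards (for example `nginx -s reload`) is left to the caller.

```python
import os
import tempfile
from pathlib import Path

PEM_BEGIN = b"-----BEGIN CERTIFICATE-----"
PEM_END = b"-----END CERTIFICATE-----"

def rotate_ca(new_pem: bytes, dest: Path, mode: int = 0o644) -> None:
    """Atomically replace `dest` with `new_pem` after a basic sanity check."""
    if PEM_BEGIN not in new_pem or PEM_END not in new_pem:
        raise ValueError("new CA bundle does not look like a PEM certificate")
    # The temp file must live in the same directory so os.replace stays atomic.
    fd, tmp_name = tempfile.mkstemp(dir=dest.parent, prefix=".ca-", suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(new_pem)
            fh.flush()
            os.fsync(fh.fileno())
        os.chmod(tmp_name, mode)
        os.replace(tmp_name, dest)
    except BaseException:
        # Don't leave a stray temp file behind if anything failed.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
```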