Switched from self-hosted VMs to a vendor-hosted platform for one reason: staffing.

Self-Hosted vs. Vendor-Hosted Risk Tradeoffs

Elle Morrison

(@kernel_guard_elle)

July 1, 2026 1:01 am [#1215]

Our internal security engineering team has maintained a fleet of self-hosted agent runtimes on hardened VMs for the last five years. The architecture was built around a custom LSM module (derived from Yama) and an extensive eBPF filter suite for network and filesystem control, together enforcing a strict deny-by-default policy for outbound agent traffic plus namespace isolation. The control we had was absolute: we could audit every syscall, pin every capability, and our data never left the perimeter.
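
The deny-by-default egress rule reduces to one decision per connection: drop it unless the destination matches an explicit allow-list entry. The Python below is a user-space model of that decision only, not the eBPF programs described above. The `cidr:port` rule format and the names `EgressRule`, `parse_rules`, and `is_allowed` are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


@dataclass(frozen=True)
class EgressRule:
    """One allow-list entry: a destination network plus a destination port (hypothetical schema)."""
    network: Network
    port: int


def parse_rules(entries: List[str]) -> List[EgressRule]:
    """Build rules from 'cidr:port' strings such as '10.20.0.0/16:443' or '2001:db8::/32:443'."""
    rules = []
    for entry in entries:
        cidr, port = entry.rsplit(":", 1)  # split on the last colon so IPv6 CIDRs survive
        rules.append(EgressRule(ipaddress.ip_network(cidr), int(port)))
    return rules


def is_allowed(rules: List[EgressRule], dst_ip: str, dst_port: int) -> bool:
    """Deny by default: permit only if some rule matches both address and port."""
    addr = ipaddress.ip_address(dst_ip)
    # An IPv4 address is never "in" an IPv6 network, so mixed families simply fail to match.
    return any(addr in rule.network and dst_port == rule.port for rule in rules)
```

With an empty rule list, `is_allowed` returns `False` for every destination, which is exactly what deny-by-default means.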

However, we are transitioning the entire workload to a vendor-hosted platform this quarter. The decision was not driven by a technical evaluation of the security models—ours was arguably more robust at the enforcement layer. It was purely a resource allocation problem. The operational burden of maintaining that level of hardening is immense:

* **LSM Policy Drift:** Every kernel update, even minor ones, required a full regression test of our custom hooks. A subtle change in the securityfs API or in the internal kernel structures could break our module's initialization, leaving agents in a permissive fallback state until detected and patched.
* **eBPF Toolchain Churn:** Keeping the eBPF verifier happy across kernel versions, while maintaining complex tail calls and map structures for our allow-lists, consumed roughly 30% of one senior engineer's time.
* **Incident Response Ownership:** When an agent exhibited anomalous behavior, *we* owned the full stack. Tracing a network call from a userland process through the eBPF filter and the LSM credential check to the netfilter layer was a deep and time-consuming investigation.
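
The first bullet's failure mode is a silent fall-back to a permissive state. One common mitigation, sketched here as an assumption rather than as a description of the setup above, is a fail-closed gate. It refuses to start agents unless the running kernel is on a list of releases that passed the hook regression suite and every required LSM appears in `/sys/kernel/security/lsm` (present on Linux 4.15 and later). The helper names, the tested-release set, and the required-LSM name are all hypothetical placeholders.

```python
import os
from typing import List, Set


def read_active_lsms(path: str = "/sys/kernel/security/lsm") -> Set[str]:
    """Return the comma-separated active LSM list as a set; empty if securityfs is unreadable."""
    try:
        with open(path) as f:
            return {name for name in f.read().strip().split(",") if name}
    except OSError:
        return set()


def gate_problems(kernel_release: str, active_lsms: Set[str],
                  tested_kernels: Set[str], required_lsms: Set[str]) -> List[str]:
    """Return reasons to refuse agent startup; an empty list means the gate passes."""
    problems = []
    if kernel_release not in tested_kernels:
        problems.append(f"kernel {kernel_release} has not passed the hook regression suite")
    for lsm in sorted(required_lsms - active_lsms):
        problems.append(f"required LSM '{lsm}' is not active")
    return problems


def enforce_gate(tested_kernels: Set[str], required_lsms: Set[str]) -> None:
    """Fail closed: exit non-zero instead of starting agents on an unverified kernel."""
    problems = gate_problems(os.uname().release, read_active_lsms(),
                             tested_kernels, required_lsms)
    if problems:
        raise SystemExit("refusing to start agents: " + "; ".join(problems))
```

A launcher would call something like `enforce_gate({"6.8.0-45-generic"}, {"yama"})` before spawning any agent; both arguments are placeholders. Because an unreadable securityfs yields an empty set, the gate also fails closed when it cannot see the LSM list at all.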

The tradeoff is clear: we are exchanging granular technical control for reduced operational overhead and a transfer of baseline infrastructure security liability. My concern is that we are now abstracted away from the enforcement points. I can no longer directly audit the `security_bprm_check` or `security_file_open` hooks being applied. We must trust the vendor's implementation of isolation, which likely uses namespaces and cgroups with a standard LSM such as AppArmor, not the deeply customized regime we had.
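
One starting point for seeing what the vendor actually applies is to record what the kernel reports about a process from inside the hosted runtime. The sketch below reads the `CapEff`, `NoNewPrivs`, and `Seccomp` fields of `/proc/self/status`, plus `/proc/self/attr/current` (the LSM label, such as an AppArmor profile name, where one is exposed). It shows only which confinement is attached to the process, not the rules inside a profile, and the `RISKY_CAPS` selection is an illustrative assumption.

```python
from typing import Dict, Optional

# Capabilities worth flagging if an agent holds them; bit numbers are from linux/capability.h.
RISKY_CAPS = {1: "CAP_DAC_OVERRIDE", 12: "CAP_NET_ADMIN", 13: "CAP_NET_RAW",
              16: "CAP_SYS_MODULE", 17: "CAP_SYS_RAWIO", 19: "CAP_SYS_PTRACE",
              21: "CAP_SYS_ADMIN", 33: "CAP_MAC_ADMIN", 38: "CAP_PERFMON", 39: "CAP_BPF"}
SECCOMP_MODES = {"0": "disabled", "1": "strict", "2": "filter"}


def parse_status(text: str) -> Dict[str, object]:
    """Extract confinement-relevant fields from /proc/<pid>/status text."""
    fields = {}
    for line in text.splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    cap_eff = int(fields.get("CapEff", "0"), 16)  # effective capability set, hex bitmask
    return {
        "risky_caps": sorted(name for bit, name in RISKY_CAPS.items() if (cap_eff >> bit) & 1),
        "no_new_privs": fields.get("NoNewPrivs") == "1",
        "seccomp": SECCOMP_MODES.get(fields.get("Seccomp", ""), "unknown"),
    }


def read_lsm_label(pid: str = "self") -> Optional[str]:
    """Return the process's LSM label (e.g. an AppArmor profile name), or None if not exposed."""
    try:
        with open(f"/proc/{pid}/attr/current") as f:
            return f.read().strip("\x00\n ") or None
    except OSError:
        return None


def probe(pid: str = "self") -> Dict[str, object]:
    """Snapshot what the kernel reports about one process's confinement."""
    with open(f"/proc/{pid}/status") as f:
        report = parse_status(f.read())
    report["lsm_label"] = read_lsm_label(pid)
    return report
```

Running `probe()` at agent startup and diffing the result against a recorded baseline would at least flag when the profile, capability set, or seccomp mode changes underneath you.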

The question for this forum is: how do we effectively map our previous, explicit security model onto a vendor's shared-responsibility framework? Specifically:

* What methodologies exist for black-box testing the effective LSM policy applied to a hosted agent runtime? Can we read a snapshot of securityfs, or probe from privileged containers, to infer the rules?
* In a breach scenario involving a compromised agent, does the burden of proof for isolation failure now lie with us, the customer, or with the vendor? Our legal team is unclear on how to structure the SLA.
* Has anyone built a secondary containment layer (e.g., a Landlock policy or a minimal, static eBPF program) *inside* a vendor-hosted runtime to reintroduce a verified enforcement layer under your own control?
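
On the last question, here is a minimal sketch of a self-applied Landlock layer. It assumes the vendor kernel has Landlock enabled (it was merged in Linux 5.13) and that the runtime's seccomp policy lets the three `landlock_*` syscalls through. Python's standard library has no Landlock binding, so this goes through `ctypes`, using syscall numbers 444 to 446, which x86_64 and arm64 share. It handles only a few ABI v1 write-type rights and allows them beneath a single directory. The helper names are hypothetical. Once applied, the restriction cannot be lifted, and threads and children created afterwards inherit it.

```python
import ctypes
import os

# Landlock syscall numbers from the unified table shared by x86_64 and arm64 (Linux 5.13+).
SYS_LANDLOCK_CREATE_RULESET = 444
SYS_LANDLOCK_ADD_RULE = 445
SYS_LANDLOCK_RESTRICT_SELF = 446
LANDLOCK_CREATE_RULESET_VERSION = 1 << 0
LANDLOCK_RULE_PATH_BENEATH = 1
PR_SET_NO_NEW_PRIVS = 38

# ABI v1 filesystem rights that modify the filesystem (values from linux/landlock.h).
WRITE_ACCESS = ((1 << 1)    # LANDLOCK_ACCESS_FS_WRITE_FILE
                | (1 << 4)  # LANDLOCK_ACCESS_FS_REMOVE_DIR
                | (1 << 5)  # LANDLOCK_ACCESS_FS_REMOVE_FILE
                | (1 << 7)  # LANDLOCK_ACCESS_FS_MAKE_DIR
                | (1 << 8))  # LANDLOCK_ACCESS_FS_MAKE_REG


class RulesetAttr(ctypes.Structure):
    # Only the ABI v1 field; newer kernels accept this shorter struct.
    _fields_ = [("handled_access_fs", ctypes.c_uint64)]


class PathBeneathAttr(ctypes.Structure):
    _pack_ = 1  # the kernel declares this struct __attribute__((packed))
    _fields_ = [("allowed_access", ctypes.c_uint64), ("parent_fd", ctypes.c_int32)]


_libc = ctypes.CDLL(None, use_errno=True)
_libc.syscall.restype = ctypes.c_long


def _check(ret: int) -> int:
    """Turn a negative libc return value into an OSError carrying errno."""
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return ret


def landlock_abi_version():
    """Return the kernel's Landlock ABI version, or None if Landlock is unavailable."""
    try:
        return _check(_libc.syscall(ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET), None,
                                    ctypes.c_size_t(0),
                                    ctypes.c_uint32(LANDLOCK_CREATE_RULESET_VERSION)))
    except OSError:
        # ENOSYS (old kernel), EOPNOTSUPP (disabled at boot) or EPERM (blocked by seccomp).
        return None


def restrict_writes_to(allowed_dir: str) -> None:
    """Irreversibly confine the calling thread: write-type access only beneath allowed_dir."""
    attr = RulesetAttr(handled_access_fs=WRITE_ACCESS)
    ruleset_fd = _check(_libc.syscall(ctypes.c_long(SYS_LANDLOCK_CREATE_RULESET),
                                      ctypes.byref(attr), ctypes.c_size_t(ctypes.sizeof(attr)),
                                      ctypes.c_uint32(0)))
    try:
        dir_fd = os.open(allowed_dir, os.O_PATH | os.O_CLOEXEC)
        try:
            rule = PathBeneathAttr(allowed_access=WRITE_ACCESS, parent_fd=dir_fd)
            _check(_libc.syscall(ctypes.c_long(SYS_LANDLOCK_ADD_RULE), ctypes.c_long(ruleset_fd),
                                 ctypes.c_long(LANDLOCK_RULE_PATH_BENEATH), ctypes.byref(rule),
                                 ctypes.c_uint32(0)))
        finally:
            os.close(dir_fd)
        # Unprivileged callers must set no_new_privs before landlock_restrict_self().
        _check(_libc.prctl(ctypes.c_int(PR_SET_NO_NEW_PRIVS), ctypes.c_ulong(1),
                           ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0)))
        _check(_libc.syscall(ctypes.c_long(SYS_LANDLOCK_RESTRICT_SELF),
                             ctypes.c_long(ruleset_fd), ctypes.c_uint32(0)))
    finally:
        os.close(ruleset_fd)
```

Because `landlock_abi_version()` returns `None` where Landlock is missing or filtered, an agent can record whether this layer was actually in force instead of assuming it.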

The core tension is that our staffing model could not sustain the self-hosted world, but my threat model has not changed. I am seeking strategies to regain deterministic security guarantees in a non-deterministic, managed environment.

- EM

The kernel is the root of trust.
