Switched from self-hosted VMs to vendor for one reason: staffing.

(@kernel_guard_elle)
Active Member
Joined: 1 week ago
Posts: 9
Topic starter
  [#1215]

For the last five years, our internal security engineering team has maintained a fleet of self-hosted agent runtimes on hardened VMs. The architecture was built around a custom LSM (derived from Yama) and a substantial eBPF filter suite for network and filesystem control, which together enforced namespace isolation and a strict deny-by-default policy for outbound agent traffic. Our control was absolute: we could audit every syscall and pin every capability, and our data never left the perimeter.

However, we are transitioning the entire workload to a vendor-hosted platform this quarter. The decision was not driven by a technical evaluation of the security models; ours was arguably more robust at the enforcement layer. It was purely a resource allocation problem. The operational burden of maintaining that level of hardening is immense:

* **LSM Policy Drift:** Every kernel update, even a minor one, required a full regression test of our custom hooks. A subtle change in the securityfs API or in internal kernel structures could break our module's initialization, leaving agents in a permissive fallback state until the failure was detected and patched.
* **eBPF Toolchain Churn:** Keeping the eBPF verifier happy across kernel versions, while maintaining complex tail calls and map structures for our allow-lists, consumed roughly 30% of one senior engineer's time.
* **Incident Response Ownership:** When an agent exhibited anomalous behavior, *we* owned the full stack. Tracing a network call from a userland process, through the eBPF filter, to the LSM credential check, and finally to the netfilter layer, is a deep and time-consuming investigation.
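
The permissive-fallback failure mode in the first bullet is one that a small startup check can at least surface. The following is a minimal sketch, not the tooling we actually ran: it assumes the kernel exposes the active LSM list at `/sys/kernel/security/lsm` (securityfs mounted, which is the usual case), and the module name `agentguard` used in the note below is hypothetical.

```python
from pathlib import Path

# securityfs file listing the active LSMs as a comma-separated string.
LSM_LIST_PATH = Path("/sys/kernel/security/lsm")


def parse_active_lsms(raw: str) -> list[str]:
    """Split the comma-separated LSM list into clean names."""
    return [name.strip() for name in raw.split(",") if name.strip()]


def missing_lsms(expected: set[str], active: list[str]) -> set[str]:
    """Return every expected LSM that is absent from the active list."""
    return expected - set(active)


def check_host(expected: set[str], path: Path = LSM_LIST_PATH) -> set[str]:
    """Read the live list; an unreadable file counts as 'everything missing'.

    Failing closed here is a design assumption: if the list cannot be read,
    the caller should not conclude that enforcement is in place.
    """
    try:
        raw = path.read_text()
    except OSError:
        return set(expected)
    return missing_lsms(expected, parse_active_lsms(raw))
```

An agent launcher could call `check_host({"yama", "agentguard"})` before starting any workload and refuse to proceed if the returned set is non-empty. This only confirms that a module registered; it says nothing about whether the loaded policy is the intended one.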

The tradeoff is clear: we are exchanging granular technical control for a reduction in operational overhead and a transfer of baseline infrastructure security liability. My concern is that we are now abstracted away from the enforcement points. I can no longer directly audit what is enforced at hooks such as `bprm_check_security` or `file_open`. We must trust the vendor's implementation of their isolation, which likely uses namespaces and cgroups with a standard LSM like AppArmor, not the deeply customized regime we had.
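
Even without access to the hooks, a process can still read some of its own confinement state from procfs. The sketch below assumes the vendor runtime leaves `/proc/self/status` and `/proc/self/attr/current` readable, which I have not verified for any particular platform. It collects the capability sets, the `NoNewPrivs` flag, the seccomp mode, and the LSM label so they can be diffed between runs or against what the vendor documents.

```python
from pathlib import Path

# Fields in /proc/<pid>/status that describe how a process is confined.
FIELDS = ("CapEff", "CapBnd", "NoNewPrivs", "Seccomp")
SECCOMP_MODES = {"0": "disabled", "1": "strict", "2": "filter"}


def parse_status(text: str) -> dict[str, str]:
    """Extract the confinement-related fields from a status dump."""
    found = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key in FIELDS:
            found[key] = value.strip()
    return found


def summarize(status: dict[str, str]) -> dict[str, object]:
    """Turn raw field values into something easier to diff between runs."""
    return {
        "effective_caps": int(status.get("CapEff", "0"), 16),
        "bounding_caps": int(status.get("CapBnd", "0"), 16),
        "no_new_privs": status.get("NoNewPrivs") == "1",
        "seccomp": SECCOMP_MODES.get(status.get("Seccomp", ""), "unknown"),
    }


def snapshot(pid: str = "self") -> dict[str, object]:
    """Collect the live view for one process, including its LSM label."""
    proc = Path("/proc") / pid
    result = summarize(parse_status((proc / "status").read_text()))
    try:
        # attr/current holds the LSM label (AppArmor profile or SELinux context).
        result["lsm_label"] = (proc / "attr" / "current").read_text().strip("\x00\n ")
    except OSError:
        # Unreadable or unsupported: record the absence rather than guessing.
        result["lsm_label"] = None
    return result
```

Calling `snapshot()` from inside the hosted agent gives the process's own view only. It shows which profile name is attached, not the rules inside that profile, so it narrows the question without answering it.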

The question for this forum is: how do we effectively map our previous, explicit security model onto a vendor's shared-responsibility framework? Specifically:

* What methodologies exist for black-box testing the effective LSM policy applied to a hosted agent runtime? Can we obtain a securityfs snapshot, or probe from privileged containers, to infer the rules?
* In a breach scenario involving a compromised agent, does the burden of proof for isolation failure now lie with us, the customer, or with the vendor? Our legal team is unclear on how to structure the SLA.
* Has anyone built a secondary containment layer (e.g., a Landlock policy or a minimal, static eBPF program) *inside* a vendor-hosted runtime to reintroduce a verified enforcement layer under your own control?
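
To make the first question concrete, this is roughly the kind of black-box probe I have in mind: attempt operations the old policy denied, record how each one fails, and compare against the expected outcome. It is a minimal sketch using only the Python standard library; the probe names and expected outcomes in the note below are illustrative assumptions, not a real policy. An important limitation is that an `EACCES` or `EPERM` result does not say which layer refused the call (file permissions, the LSM, seccomp, or something else), so this detects gaps rather than reconstructing rules.

```python
import errno
import os
import socket


def probe_open(path: str, flags: int = os.O_RDONLY) -> str:
    """Try to open a path; return 'allowed' or the errno name of the refusal."""
    try:
        fd = os.open(path, flags)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    os.close(fd)
    return "allowed"


def probe_connect(host: str, port: int, timeout: float = 2.0) -> str:
    """Try an outbound TCP connection; return 'allowed', 'timeout', or errno name."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "allowed"
    except socket.timeout:
        # Silent drops (e.g. a filter that discards packets) show up here.
        return "timeout"
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))


def policy_gaps(
    expected: dict[str, str], observed: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Return probes whose observed outcome differs from the expected one.

    Each value is an (expected, observed) pair; probes that never ran are
    reported as 'not-run' so a skipped check is not mistaken for a pass.
    """
    return {
        name: (want, observed.get(name, "not-run"))
        for name, want in expected.items()
        if observed.get(name, "not-run") != want
    }
```

A run might build `observed = {"read_shadow": probe_open("/etc/shadow")}` and compare it with `expected = {"read_shadow": "EACCES"}`; any entry returned by `policy_gaps` is a behavior that differs from the model we used to enforce ourselves.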

The core tension is that our staffing model could not sustain the self-hosted world, but my threat model has not changed. I am seeking strategies to regain deterministic security guarantees in a non-deterministic, managed environment.

- EM


The kernel is the root of trust.

