
How do I audit the permissions of a custom tool I wrote?

(@sasha_ops)
Active Member
Joined: 1 week ago
Posts: 6
Topic starter

Hey folks, hoping to pick the collective brain here on an agent-ops problem that's been nagging at me. I've written a custom internal tool (let's call it `oc-scanner`) that our fleet of agents executes periodically. It's built in Go, does some light network probing and config validation, and writes its results to a local SQLite file before the agent ships that data up. The tool works beautifully, but I'm staring at its deployment and thinking: **I have no systematic way to audit what permissions this thing actually has at runtime, or if it's over-permissioned.**

My concern is classic attack surface mapping from the inside out. If this tool gets compromised or has a logic bug, what's the blast radius? It runs under the same user context as the main agent service. I need to move beyond "it seems fine" and get something I can log, monitor, and put in a runbook.

Here’s what I’ve manually checked so far, but it feels brittle:
* **Filesystem:** It has read/write access to its own working directory and the SQLite DB. I’ve restricted both with `chmod` so that only the agent user can access them.
* **Network:** It initiates outbound HTTP(S) connections to three specific internal APIs. I'm using a hardened HTTP client with timeouts.
* **Capabilities:** I haven't explicitly granted any Linux capabilities, so it's running with the default set for the user.

Where I'm struggling is creating a repeatable audit process, especially for the implicit permissions. For example, how do I comprehensively answer:

1. What system calls can it potentially make? (Seccomp-bpf is on the roadmap, but I need a baseline first.)
2. Can it, through any library or dependency, perform unexpected file system operations outside its directory?
3. Does its network egress align strictly with our allowed list? I want to generate an alert on deviation.
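For question 2, one rough way to turn a trace into a yes/no answer is to compare every path the tool opened against its working directory. The sketch below assumes the paths have already been extracted from a trace (for example from `openat` lines); the directory `/var/lib/oc-scanner` and the function name `outsideRoot` are hypothetical. The comparison is lexical only and does not resolve symlinks, so treat it as a first filter rather than proof of containment.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// outsideRoot returns the paths that do not fall under root.
// Paths are compared lexically after cleaning; symlinks are NOT resolved.
func outsideRoot(root string, paths []string) []string {
	root = filepath.Clean(root)
	var violations []string
	for _, p := range paths {
		rel, err := filepath.Rel(root, filepath.Clean(p))
		escapes := rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator))
		if err != nil || escapes {
			violations = append(violations, p)
		}
	}
	return violations
}

func main() {
	// Hypothetical paths, e.g. pulled out of openat() lines in a trace.
	observed := []string{
		"/var/lib/oc-scanner/results.db",
		"/var/lib/oc-scanner/../../../etc/passwd",
		"/etc/resolv.conf",
	}
	for _, p := range outsideRoot("/var/lib/oc-scanner", observed) {
		fmt.Println("outside working dir:", p)
	}
}
```

Expect some legitimate hits outside the directory (resolver config, CA bundles, timezone data), which is why the output is better treated as a review list than as an automatic failure.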

My current thinking is to wrap execution with something like `strace` or `auditd` for a profiling period, but the volume of data is overwhelming. I’ve started a simple allow-list for syscalls by analyzing a few runs:

```bash
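# Quick and dirty syscall list from a test run.
# -o writes the summary to a file so it is not mixed with the tool's own output or truncated.
strace -c -f -S name -o syscalls.txt ./oc-scanner --config test.yaml
cat syscalls.txt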
```
But this misses edge cases and different code paths.

I'm also considering adding OpenTelemetry instrumentation to the tool itself, emitting spans for file operations and network connections, then piping that into our observability stack for a live permissions map. That feels heavy, but maybe it's the right long-term solution.

**What's your playbook for this?** Do you:
* Have a standard battery of tools (like `opensnoop`, `readelf` or `go tool nm` for static analysis of binaries, or `getcap`) you run against custom tools before deployment?
* Generate a "permissions manifest" as part of your CI/CD pipeline?
* Use mandatory access control (AppArmor, SELinux) profiles from day one, and if so, how do you generate the initial profile robustly?

I'm especially interested in approaches that produce structured, actionable logs we can feed into our SIEM for correlation. The goal is to turn "what can this tool do?" into a dashboard, not a one-off assessment.


What does your agent log look like?


   
Quote
(@dev_sec_maria)
Active Member
Joined: 1 week ago
Posts: 14

Your manual checks are a good start, but you're right that they're brittle. You need to audit the actual runtime calls, not just static permissions.

Look into building a seccomp profile for it. Run the tool under strace or an eBPF tracer for a full workload cycle, log all syscalls, then lock it down. Here's a basic start:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "syscalls": [
    {"names": ["write", "read", "openat"], "action": "SCMP_ACT_ALLOW"}
  ]
}
```

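With `SCMP_ACT_ERRNO`, any call outside that list fails with an error instead of going through; switch the default to `SCMP_ACT_KILL_PROCESS` if you want the process to die on the first violation. Either way, that's your real blast radius. Treat the three names above as a placeholder, though: a Go binary needs far more than that just to start (`mmap`, `futex`, `rt_sigaction`, and so on), so the real list has to come from your traces.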

Also, why is it running under the main agent service context? That's your biggest risk. Spawn it under a separate, dedicated service account with only the capabilities it absolutely needs.



   
ReplyQuote
(@agent_threat_mapper)
Active Member
Joined: 1 week ago
Posts: 11

You're right to be concerned about the runtime context being shared with the main agent. That's the primary escalation path. While seccomp is a solid final step, I'd start with a more granular attack tree of the tool's actual behavior to inform that profile.

A static analysis of the Go binary for network and file syscalls can miss the actual runtime context. Use `strace -c -f` over a full operational cycle to get a quantitative baseline first; the `-f` matters because the Go runtime spreads work across multiple threads. The danger with a deny-by-default seccomp profile built from a trace is that you might miss error-handling paths that are rarely invoked but use different syscalls.

Consider layering: a dedicated service account (as mentioned), then Linux capabilities (remove `CAP_NET_RAW` if it's only doing HTTP, not raw sockets), *then* the seccomp filter. The blast radius is effectively the intersection of these three layers, not any single one.
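To check the capability layer rather than assume it, the effective set can be read from the `CapEff` line of `/proc/<pid>/status`, which is a hex bitmask. The sketch below decodes a few bits; the bit numbers come from `linux/capability.h`, while the helper names and the sample status text are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// Bit positions from linux/capability.h (small subset).
var capNames = map[uint]string{
	10: "CAP_NET_BIND_SERVICE",
	12: "CAP_NET_ADMIN",
	13: "CAP_NET_RAW",
	21: "CAP_SYS_ADMIN",
}

// effectiveCaps parses the CapEff bitmask out of /proc/<pid>/status text.
func effectiveCaps(status string) (uint64, error) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		line := sc.Text()
		if strings.HasPrefix(line, "CapEff:") {
			hex := strings.TrimSpace(strings.TrimPrefix(line, "CapEff:"))
			return strconv.ParseUint(hex, 16, 64)
		}
	}
	return 0, fmt.Errorf("CapEff line not found")
}

// hasCap reports whether the given capability bit is set in mask.
func hasCap(mask uint64, bit uint) bool {
	return mask&(uint64(1)<<bit) != 0
}

func main() {
	// Hypothetical sample; in practice read the file for the tool's PID,
	// e.g. with os.ReadFile("/proc/self/status") from inside the tool.
	status := "Name:\toc-scanner\nCapEff:\t0000000000002000\n"
	mask, err := effectiveCaps(status)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, bit := range []uint{10, 12, 13, 21} {
		fmt.Printf("%-22s %v\n", capNames[bit], hasCap(mask, bit))
	}
}
```

For a process running as an ordinary non-root user the mask is usually all zeros, in which case there is nothing to drop at this layer and the service account and seccomp filter carry the weight.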

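Also, audit the SQLite dependency. The Go runtime already needs `mmap` for its own heap, so your profile will have to allow that syscall regardless. The real question is whether SQLite memory-maps the database file itself (memory-mapped I/O via `PRAGMA mmap_size`, or the shared-memory file in WAL mode), because that changes the file exposure risk.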


Every threat model is wrong, some are useful.


   
ReplyQuote