Skip to content

Forum

AI Assistant
Notifications
Clear all

Anyone else having issues with tool execution timing out and leaving processes hanging?

2 Posts
2 Users
0 Reactions
4 Views
(@newbie_cautious)
Eminent Member
Joined: 1 week ago
Posts: 16
Topic starter
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian
  [#996]

Hey everyone, I've been trying to get my first agent set up locally using Docker, following the basic examples from the docs. I'm running into a weird issue that's probably me doing something wrong, but I can't figure it out.

My agent uses a simple custom tool to run a shell command (just listing a directory). It works sometimes, but other times the tool execution seems to just... hang. The request to the Claude API times out, but the local process that the tool spawned doesn't get killed. I end up with `ls` processes just sitting there. I'm worried this could become a real problem if the tool was doing something more intensive or needed cleanup.

My setup is pretty basic. I'm using the standard `execute_command` tool pattern, wrapped with some safety checks. I'm not sure if this is a problem with how I'm handling subprocesses, or if it's something about how the SDK manages tool lifecycles. Does the SDK have a way to enforce timeouts or send cancellation signals to tools if the overall agent call takes too long?

I'm also a bit nervous about the security aspect here. If a tool hangs, what kind of access does it retain? Does the SDK or the Anthropic side have any visibility into these orphaned local processes, or is that completely outside their view? Any guidance would be really appreciated. I'm still learning all this.



   
Quote
(@threat_weaver)
Active Member
Joined: 1 week ago
Posts: 10
Translate
English
Spanish
French
German
Italian
Portuguese
Russian
Chinese
Japanese
Korean
Arabic
Hindi
Dutch
Polish
Turkish
Vietnamese
Thai
Swedish
Danish
Finnish
Norwegian
Czech
Hungarian
Romanian
Greek
Hebrew
Indonesian
Malay
Ukrainian
Bulgarian
Croatian
Slovak
Slovenian
Serbian
Lithuanian
Latvian
Estonian
 

Yes, this is a known and serious pattern. The issue isn't unique to you, it's a fundamental risk in wrapping shell execution for autonomous agents. When the API call times out, the orchestration layer loses its handle to the subprocess, but the process itself, now orphaned from its original parent, continues under init. It retains all the permissions and file descriptors it had when spawned.

The SDK's built-in timeouts are for the HTTP request to Claude, not for the tool's subprocess execution. You must implement your own subprocess management with explicit signal handling and resource limits.

Regarding your security concern, the hung process retains the exact access it was launched with. Neither the SDK nor Anthropic has visibility or control over it post-timeout. This is a classic case of expanding the trusted computing base without a revocation mechanism. You need to design the tool wrapper to use operating system primitives like process groups so you can guarantee cleanup, even on a timeout. Consider using a dedicated, ephemeral user or container for each tool invocation to scope the potential damage.



   
ReplyQuote