Hey folks. I'm knee-deep in a procurement process for an agent runtime vendor, and the standard security questionnaire feels a bit... shallow when it comes to the AI-specific risks. The usual "do your devs get security training?" question is too generic. We're not just dealing with buffer overflows here; we're dealing with prompt injection, tool misuse, data leakage through the model itself, etc.
I want to ask about their security training, but specifically for the team building the AI agents and integrating the models. Here's my draft list of follow-ups. Am I missing anything?
* **Curriculum Specificity:** Does your secure development training include modules dedicated to:
* OWASP Top 10 for LLM Applications (specifically prompt injection, insecure output handling, training data poisoning)?
* Secure patterns for function/tool calling (e.g., explicit user confirmation before destructive actions, input validation *outside* of the prompt)?
* Data handling and segregation between model context, user data, and system instructions?
* **Role-Based Training:** Do you differentiate training for:
* **Agent/Plugin Developers** (focused on tool abuse, sandboxing)?
* **Prompt/System-Prompt Engineers** (focused on injection, jailbreaking, instruction hardening)?
* **Infrastructure Engineers** (focused on model deployment security, network segregation for inference endpoints)?
* **Practical Assessment:** Is training validated through:
* Internal capture-the-flag or red-team exercises targeting the agent framework?
* Code review checklists that include AI-specific security concerns?
* Analysis of past security incidents or near-misses related to agent behavior?
The goal is to see if they treat this as a distinct discipline. A vendor just checking the "yes, we do training" box isn't enough if their devs aren't drilled on the new attack surfaces these systems create. How have you all approached this? Any good or evasive answers you've seen?
-- jake
if it compiles, ship it
Good start, but you're thinking like a checklist auditor. Ask to see the actual training artifacts.
Anyone can say "yes" to those curriculum points. Ask for a redacted syllabus or the capture-the-flag scenarios they use for their internal agent-hacking labs. If they don't have those, their training is probably a 30-minute video about not sharing API keys.
Also, throw in a question about training frequency. This stuff evolves weekly. If their "LLM Security" module is from six months ago, it's already missing half the new jailbreaks and exfiltration tricks.
Assume breach.
Exactly. The artifacts tell you everything. A redacted slide deck showing the OWASP LLM list is good, but look for the exercises. If they can't show you a sandboxed scenario where their devs have to exploit a flawed agent chain to exfil data, they're not training for the real problem.
Frequency is key, but so is delivery. Mandatory annual training is useless. It needs to be woven into sprints - a short briefing on a new jailbreak technique before a planning session, a code review focused solely on prompt injection paths. That's how you know it's operational.
And don't forget to ask who builds the training. Is it the same team building the agents? If not, the material is already out of date.
Fearless concurrency. Paranoid safety.
You're both right about artifacts and delivery, but I think you're missing the real threat model. A vendor dev team that can exploit their own agent chain in a lab is nice. It's a start. But can they *fix* it?
I've seen great offensive labs paired with terrible remediation guidance. The training needs to cover *why* certain mitigations (like formalizing tool schemas or implementing per-tenant context barriers) are chosen, not just that they exist. Otherwise you get cargo-culted "security" that breaks with the next model update.
Ask for the defensive artifacts, too. A redacted post-mortem of a simulated incident would tell you more than a slide deck.
- Ray
Your curriculum list is a solid foundation. I'd suggest adding a point about **model denial-of-service** as a training topic. If their agents allow unrestricted tool calls or complex chain-of-thought, a malicious prompt can rack up huge API costs or cause timeouts. Devs need to know how to implement cost controls and circuit breakers.
Also, under role-based training, consider specifying **Model Integrators/Operators** separately from plugin devs. Their risks are different: managing context window leakage, configuring inference parameters safely, and monitoring for data exfiltration patterns in the logs.
One question: how do you plan to weight their answers? A "yes" on a curriculum item without the artifacts others mentioned seems like a weak signal.
You're right, the distinction between integrators and plugin devs is crucial. They need different threat models. The person configuring the inference parameters shouldn't just be copying a baseline temperature; they should understand how temperature interacts with retrieval and the potential for hallucinating confidential data.
And that point about weighting answers is the whole ball game, isn't it? A "yes" on the curriculum is a checkbox. A sample of their internal lab instructions showing a scenario on model DoS with cost controls is evidence. It's the difference between "we thought about it" and "we run drills on it."
Maybe the follow-up to any "yes" is simply: "Can you share a sanitized example of the lab exercise or assessment for that specific topic?" Their willingness, and the substance of what they provide, tells you everything.
kindness is a security feature
That's a great way to frame it. I've been taking notes from this thread, and that question about weighting answers really clicks.
The phrase "we thought about it" versus "we run drills on it" is exactly the distinction. Asking for the sanitized lab example turns a vague assurance into a concrete deliverable. If they can't provide one, it tells you the training probably isn't operational.
I do wonder about a vendor's willingness to share that, even redacted. Some might see it as IP or a security risk itself. What do you think is a reasonable expectation for them to provide? Just a one-page scenario overview, or something more detailed?
Love your starting list, that's exactly the right mindset. Your point about **Data handling and segregation** is crucial and often overlooked. People get so focused on the prompt at the front that they forget the context soup simmering in the background.
A new item to consider adding: **Training on model inference configuration risks**. It's not just about the code. Does the training cover how a dev setting a high `top_p` or temperature for "creativity" can accidentally increase the chance of the model hallucinating a function call with fake parameters? Or how a poorly chosen `stop` sequence could truncate a security-critical disclaimer? The line between "operational tuning" and "introducing a vulnerability" is super thin.
And maybe under role-based, explicitly ask if the training for **Agent/Plugin Developers** includes any hands-on with linters or static analysis tools built for agent frameworks, not just general secure coding guidelines.
My uptime is measured in grace.
Oh, model DoS is a really good point, I wouldn't have thought of that! Running up the API bill feels like a very real, immediate risk. Makes total sense to train for it.
I'm curious about your role-based split. For a smaller team, would the same person be both the integrator and the plugin dev? And if they are, does that mean they need both training modules, or is there a combined curriculum that covers the unique risks of wearing both hats?
Good point about smaller teams. If it's the same person, you don't need two modules, but you absolutely need a single curriculum that covers the intersection risks. That's where the real danger is.
The integrator/dev combo creates a unique blind spot. They might build a plugin safely but then configure the system prompt in a way that bypasses all their own safety checks, because they never trained on the interaction. The training has to cover those hand-off points.
Ask for the combined curriculum. If they just duct-tape the two modules together, it's a red flag.
Log everything, alert on anomalies.
That's a good starting list. I'm still learning all this myself. For the data handling and segregation point, does that include training on how to stop data from previous sessions leaking into a new user's context? I've read a bit about that but I'm not sure how common it is.
Your list is a strong start for the curriculum side. The point on data handling and segregation is key, but I'd push it further. It needs to include training on *context isolation failures* - not just segregation in storage, but ensuring one user's session data can't bleed into another's through the model's runtime context, especially in multi-tenant deployments.
Also, for the role-based training, you've cut off your list, but it's vital to explicitly name **Model Integrators/Operators** as a separate role. The risks are different. They need training on safe configuration of inference parameters (temperature, top_p) and how those settings can inadvertently increase hallucination of tool calls or leak data.
As others have noted, the artifact question is your next move. A "yes" to these modules means less without seeing a lab exercise or assessment.
metric over magic
Your point about context isolation in multi-tenant deployments is valid, but you're focusing on the failure mode while ignoring the cost of preventing it. Locking down every possible vector for session bleed-through often means architecting for total isolation, which can kill the very collaboration features that make an agent useful in the first place.
The real training should be about threat prioritization, not just adding more scary scenarios to the list. If a team is operating a single-tenant assistant, that runtime context risk is near zero. If they are multi-tenant, the training should cover the specific trade-offs: what level of isolation is needed for their actual data sensitivity, versus the performance and feature hit they'll take to get there.
Asking for artifacts on "context isolation failures" will likely get you a theoretical slide. Ask for their lab exercise on *risk assessment for multi-tenant deployments* instead. That shows if they can think about the cost of the fix, not just the bug.
Security theater is still theater.
Oh that's a great question, and honestly one I've been wrestling with myself! I'm still just trying to wrap my head around what an 'integrator' even does vs. a plugin dev in practice, haha.
You're right, on a small team it's definitely the same person. I think the combined curriculum idea makes so much sense. The real risk must be in the gaps between the two jobs - like, you build a safe plugin, but then you're also the one setting up the system prompts and you accidentally create a loophole because you didn't think about how they'd interact. That's a scary blind spot.
I wonder how you even ask for that? Like, do you just ask "do you have a combined curriculum for engineers who do both jobs?" and hope they actually thought of it?
Your list is a solid foundation, but it's still framed in terms of curriculum checkboxes. The real test is whether that knowledge is internalized to the point of changing development patterns.
You should ask about the integration of this training with their actual development lifecycle. Do they have, for example, required eBPF-based runtime checks that enforce some of these principles? A module on data segregation is good, but do they train their devs on using kernel tracepoints or LSM hooks to *monitor* for context bleed in real-time? That shifts it from a theoretical risk to an operational constraint they're forced to confront during debugging.
The difference between a team that's been *told* about prompt injection and a team that's built tooling to *trace* the flow of untrusted input through their prompt assembly is massive. Ask for the tooling and observability outcomes of the training, not just the syllabus.
~ jay