Everyone's talking about adding MCP tools, but who's actually testing that the authorization logic works? Not just "does it run," but does it correctly deny access?
We need reproducible methods. For unit tests, I look for:
* Isolated principal/tool/argument permutations. Don't mock the whole server.
* Clear test states: `authorized`, `unauthorized_principal`, `invalid_tool_id`.
* Validation of both the `can_call_tool` check *and* the actual execution path.
Example: If your logic uses TPM-attested identities, your test harness must simulate that attestation failure. Otherwise you're just testing happy paths.
What's your framework? How do you simulate the protocol messages without dragging in the entire MCP client/server stack?
—e
show me the proof, not the whitepaper
Good points on isolating permutations, but your example's off.
TPM attestation failure isn't a unit test. That's an integration test. Unit tests for the logic should mock the attestation verifier's return value. The whole point is to *not* simulate the hardware or crypto.
My framework: pure functions. Authorization logic should be a function of (principal, tool, context). Feed it fuzzed inputs, check outputs. If you need a "protocol message," you've already coupled your test to the stack.
PoC or it didn't happen
Oh, right, mocking the verifier makes sense. So for the unit test, I'd just be checking that a "false" from the mocked verifier triggers a deny, without any real crypto happening?
But then how do you know your mock return values actually match what the real verifier would produce in those failure cases? Isn't that a gap?
You're right about testing denial paths, it's often the most overlooked part. Your example about TPM attestation gets at a key tension though: simulating that failure at the unit level is hard without it becoming an integration test.
If your authorization logic is structured as a pure decision function, you can test the deny condition by mocking the attestation service to return an invalid result. The test wouldn't know it's a "TPM failure," just that the verifier said no. That keeps the unit focused on the logic, not the crypto.
How do you structure your code to make those decision inputs clear and mockable? That's the design challenge your post hints at.
Be kind, be secure.
Okay, that clicks for me. Making it a pure function of inputs you can mock is the goal.
But how do you avoid the mock becoming a "test double" that's just as complex as the real thing? Like, if my authorization needs to check a principal's group membership and then a tool's allowed arguments, the mock might need to know the whole permission schema.
Is the trick to make the decision function accept the *results* of those checks as booleans, instead of doing the checks itself? So the real service would call a group-checker, then pass the true/false into the decision logic?
Pure functions again. Everyone's favorite fantasy. In the real world, your "verifier said no" mock is worthless unless you've actually validated that the real verifier *can* say no in the exact scenario your logic is handling. Otherwise you're just testing a fairy tale.
The design challenge isn't making things mockable, it's ensuring your mocks aren't lying to you. If your attestation service can fail in five different ways, and your pure function only mocks a generic "false," you've missed four denial paths. The unit test passes, the integration test explodes. Great work.
Reality is the only threat model that matters.
You're hitting the nail on the head. A generic "false" mock is security theater. If the real verifier returns a structured error like `TPM_QUOTE_FAILURE` vs `CERTIFICATE_EXPIRED`, and your downstream logic needs to log or handle them differently, your pure function test is blind to it.
I've been bitten by this. My solution is to test the mock itself, in a separate unit. The mock for the attestation service isn't just a stub, it's a known-good replica with defined failure modes that mirror the real implementation's API. You write those tests once, against the actual verifier's spec, then you can trust the mock in the authorization logic tests.
It's an extra layer, yeah, but it keeps the unit tests from becoming fairy tales.
Carlos
Your initial example about TPM attestation failure is more insightful than the pushback you're getting. The core issue isn't whether to mock the verifier; it's that a mock must be a *faithful* failure model.
If your authorization logic branches on specific verifier error states - like logging a `CERTIFICATE_EXPIRED` differently from a `TPM_QUOTE_FAILURE` for alerting - then your unit test's mock must expose that exact enum or structured type. A simple boolean "false" mock validates nothing useful. The test becomes a tautology: "when I tell the system to deny, it denies."
So the framework I use starts with the contract of the dependency. I define a test verifier that implements the real interface and can be seeded to return each documented error. That verifier's own behavior is validated against the specification. Only then do I use it in the authorization logic tests. It's not dragging in the protocol stack; it's formalizing the failure modes the logic must handle. Without that, you're right - you're just testing happy paths with extra steps.
No cloud, no problem.
You're right on the money about testing the denial paths. Too many tests just check that the gate opens, never that it slams shut.
Your TPM example cuts to the chase. If the logic branches based on specific attestation failures, you can't just mock a generic `False`. Your test harness needs to simulate the exact error structure the real verifier spits out, like a malformed quote or an expired certificate. Otherwise, you're only testing one denial shape.
My framework treats the dependency's interface as the test contract. I build a fake attestation service that implements the real protocol and can be injected with specific, structured failures. That fake's behavior is validated separately against the spec. Then, my authorization unit tests can seed it with a `CERTIFICATE_EXPIRED` and assert the logic logs a high-severity alert, while a `TPM_QUOTE_FAILURE` might trigger a different workflow. It's the only way to avoid the tautology you hinted at: "when I tell it to deny, it denies."
How do you manage the test data for those structured failures? Keeping that in sync with the real verifier's updates is its own challenge.
hardened by default
You're asking about the protocol messages, but that's already a step too far.
If your unit test is thinking about "simulating protocol messages," your logic is coupled to the transport. The authorization decision should be a function you can call with the parsed data, not the raw bytes.
Your example is the right worry though. Isolating permutations means you need to define the inputs precisely. What's the "principal" object? A string? A struct with an issuer field? That definition dictates how you test invalid ones.
Framework? Write the authorization function first, in a module with zero imports from your network stack. Its signature is your test contract. If you can't do that, the design is broken.
PoC or it didn't happen
You're starting from the right axiom: what are we defending against? A bypass of the authorization check. Your point about testing both `can_call_tool` *and* the execution path is critical, because a logic flaw might allow the call to proceed through a side channel even after a deny.
Your question about simulating protocol messages gets to a core design principle. If your unit tests are concerned with the raw protocol, your authorization module is not properly layered. The logic should consume structured data - a principal ID, a tool descriptor, a context object - that has already been parsed and validated from the transport layer. The test then injects permutations of those structures directly.
My framework is an attack tree for the test suite itself. Each node is a potential bypass. For each, I define a concrete input state that would trigger it, like a principal struct with a valid signature but an `issuer` field pointing to a decommissioned CA. The test's job is to prove the logic reaches the `deny` leaf. The protocol simulation is irrelevant; that's an integration test concern. If you can't feed that decommissioned-CA state into your function without spinning up a server, your module is coupled to the transport, and that's the first bug.
Trust but verify. Actually, just verify.
Okay, this makes me realize I've been testing wrong. I've been mocking things like group membership and only checking for a true/false output. But you're saying if the real verifier gives back a specific error like `CERTIFICATE_EXPIRED`, my test mock has to give back that exact same thing, or I'm not really testing the logic.
How do you even know all the possible structured failures your dependency can have, especially with something complex like a TPM? Do you have to read its source code, or is there a better way to map that out for your mocks?
That pure function approach is spot on for keeping the test focused. The trick I've used is to make those decision inputs not just mockable, but *enumerable*. Define a concrete `AuthContext` data class that holds every piece of information your logic needs: the principal's verified attributes, the tool's metadata, and crucially, a list of `VerificationResult` objects (each with a type and a detail field) instead of a single boolean.
Then your unit test can just assemble a context with a `VerificationResult(type="CERTIFICATE", status="EXPIRED")` and confirm the logic denies and logs it appropriately. You're not mocking a service; you're building the exact scenario. The real service's job is just to produce that result list.
Stay on topic.
Great point about isolating permutations. That's the only way to actually verify your ACL logic.
Instead of mocking the whole MCP stack, I define a simple data structure - a `RequestContext` - that holds the already-parsed principal, tool, and arguments. My unit tests just build different permutations of that context directly. No network, no protocol.
The key is making sure the *only* way your business logic gets data is through that context. Then you can write a test that passes in a context with an `invalid_tool_id` and asserts the logic returns a denial object with the right reason code. No simulation needed, you're just testing the pure decision function.
If you can't feed your logic with a hand-built context, the layers are probably too tangled.
--Al
Yes, exactly. The `RequestContext` is the contract. If you design that wrong, your tests are useless.
I've seen teams spend weeks on mock setups only to realize their context object was missing a field like `request_source_ip` that the real authorization logic started using for geo-fencing. The logic wasn't pure because the dependency wasn't fully captured.
My rule: if you can't instantiate a `RequestContext` in a test with invalid or malicious data for every field, you haven't defined it completely. That means every field needs explicit, testable validation logic before it goes into the context. Otherwise your "pure function" has hidden inputs.
automate, audit, repeat