Switched from output classifiers to input classifiers. My throughput halved. Worth it?

(@newbie_with_questions)
Eminent Member
Joined: 1 week ago
Posts: 19
Topic starter
  [#1116]

Hi everyone. Long-time lurker, first-time poster here. I’ve learned so much from this subforum over the last few months, so first of all, thank you for all the shared knowledge. 😊

I’ve been running a small internal tool for my team that uses an LLM to help summarize support ticket escalations. For the first few months, I followed the common pattern of using an **output classifier** to check the LLM's final response for signs of prompt injection or data exfiltration attempts. It was simple, ran after the generation, and seemed fine.

Recently, after reading some threads here, I decided to be more proactive and switched to an **input classifier** model. The idea was to vet the user's initial prompt *before* it ever reaches the LLM, rejecting anything suspicious upfront. I implemented a distilled model that runs in my FastAPI middleware, checking each request.
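
To make the flow concrete, here is a minimal sketch of an input gate of this shape. It is not the actual middleware: `score_prompt`, `BLOCK_THRESHOLD`, and `handle_request` are hypothetical names, the classifier is replaced by a keyword stub, and the LLM call is a placeholder string. In a FastAPI app the same logic would sit in middleware or a dependency.

```python
import asyncio
from dataclasses import dataclass

# Hypothetical threshold; the real value depends on the classifier and the data.
BLOCK_THRESHOLD = 0.8


@dataclass
class GateResult:
    allowed: bool
    score: float


def score_prompt(prompt: str) -> float:
    """Stand-in for the real classifier; returns a made-up score in [0, 1]."""
    suspicious = ("ignore previous instructions", "reveal the system prompt")
    return 0.95 if any(s in prompt.lower() for s in suspicious) else 0.05


async def gate(prompt: str) -> GateResult:
    # Run the blocking classifier in a worker thread so the event loop stays free.
    score = await asyncio.to_thread(score_prompt, prompt)
    return GateResult(allowed=score < BLOCK_THRESHOLD, score=score)


async def handle_request(prompt: str) -> str:
    result = await gate(prompt)
    if not result.allowed:
        return "rejected"
    # Placeholder for the main LLM call, which only starts after the gate passes.
    return f"summary of: {prompt}"
```

The `asyncio.to_thread` call (Python 3.9+) matters if the classifier is synchronous: calling it directly inside an async handler would block every other request while it runs.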

However, I’ve run into a pretty significant operational issue: my overall request **throughput has dropped by roughly half**. Per-request latency has gone up too, because now I’m:
* Serializing the prompt for the classifier
* Running the (admittedly smaller) model inference
* Waiting for its verdict before the main LLM call can even begin
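
As a sanity check on the steps above, here is a back-of-the-envelope model of a strictly serial pipeline. The timings are made-up assumptions (a 2.0 s LLM call and a 0.2 s classifier), and `serial_throughput` is a hypothetical helper, not a measurement of this setup.

```python
def serial_throughput(classifier_s: float, llm_s: float, workers: int = 1) -> float:
    """Requests per second when each worker runs the classifier, then the LLM."""
    return workers / (classifier_s + llm_s)


# Assumed timings, for illustration only.
baseline = serial_throughput(0.0, 2.0)  # no input check
gated = serial_throughput(0.2, 2.0)     # classifier adds 0.2 s per request
drop = 1 - gated / baseline             # fraction of throughput lost

# A halving under this model needs the classifier to cost as much as the LLM.
halved = serial_throughput(2.0, 2.0) / baseline

print(f"drop with 0.2 s classifier: {drop:.1%}")
print(f"ratio when classifier == LLM time: {halved:.2f}")
```

Under this simple model, a small classifier in front of a slow LLM call should cost far less than half the throughput. A drop of about 50% would suggest the added step is about as expensive as the LLM call itself, or that something else is going on, such as both containers competing for the same CPU/GPU or the classifier blocking the event loop. Whether either applies here is not established.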

It feels like I’ve moved from a "fire-and-forget-then-check" model to a "wait-at-the-door-with-a-checklist" model. My setup is a homelab-style deployment, so my resources aren't endless:
* The app runs in Docker containers on a single host.
* The main LLM and the new input classifier are separate containers (different models).
* I'm using a Python backend with `transformers` for the classifier.

My core question for the community is: **Is this trade-off inherently worth it?** I know blocking a malicious prompt *before* it consumes expensive LLM tokens and context window feels logically better. But the performance hit is so tangible. I'm wondering:

* Is a 50% throughput drop typical for this kind of shift?
* Are there patterns to mitigate this without sacrificing too much safety?
* Do you find the *cost* of the input classifier (in performance and complexity) justified by the *benefit* of pre-emptive blocking, compared to a post-hoc output check?
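
One candidate pattern for the mitigation question is to start the classifier and the LLM call at the same time and discard the generation if the classifier flags the prompt. The sketch below uses stubs with simulated delays; `classify`, `call_llm`, and `guarded_call` are hypothetical names. Note the trade-off: the prompt still reaches the LLM and tokens are still spent, so this only withholds the response rather than blocking upfront.

```python
import asyncio
from typing import Optional


async def classify(prompt: str) -> bool:
    """Stub classifier: True means the prompt looks malicious."""
    await asyncio.sleep(0.01)  # simulated inference time
    return "ignore previous instructions" in prompt.lower()


async def call_llm(prompt: str) -> str:
    """Stub LLM call."""
    await asyncio.sleep(0.05)  # simulated generation time
    return f"summary of: {prompt}"


async def guarded_call(prompt: str) -> Optional[str]:
    # Start the LLM call immediately; it does not wait for the verdict.
    llm_task = asyncio.create_task(call_llm(prompt))
    flagged = await classify(prompt)
    if flagged:
        llm_task.cancel()  # discard the in-flight generation
        try:
            await llm_task
        except asyncio.CancelledError:
            pass
        return None
    return await llm_task
```

This would not be appropriate if the LLM call can trigger side effects (tool calls, writes) before the verdict arrives.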

I’m especially curious about the false-positive angle. I’ve already had to tune the classifier threshold because it was flagging some urgent but messily written tickets. An output classifier seemed more forgiving of strange-but-benign inputs.
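
The threshold trade-off can be made visible with a small sweep over labeled examples. The scores below are invented for illustration, not taken from real tickets, and `flag_rate` is a hypothetical helper.

```python
def flag_rate(scores: list[float], threshold: float) -> float:
    """Fraction of prompts whose score is at or above the threshold."""
    return sum(s >= threshold for s in scores) / len(scores)


# Invented classifier scores, for illustration only.
benign_scores = [0.05, 0.10, 0.35, 0.55, 0.72, 0.15, 0.61, 0.08]  # messy but harmless
malicious_scores = [0.95, 0.88, 0.64, 0.99]                        # real attacks

for threshold in (0.5, 0.7, 0.9):
    fp = flag_rate(benign_scores, threshold)     # false-positive rate
    tp = flag_rate(malicious_scores, threshold)  # detection rate
    print(f"threshold={threshold}: false positives={fp:.3f}, detected={tp:.2f}")
```

Raising the threshold lowers false positives on the benign set but also lets more of the malicious set through, so picking a value means choosing which error is cheaper for the team.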

Any insights from your experiences would be immensely helpful. I want to do this right, but I also need the tool to remain usable for the team.

- Liam


   