The 4 Principles That Make MCP Servers Actually Work

Some MCP Servers feel easy to use. Others feel unpredictable. Same model. Same prompts.
Completely different behavior.
The difference isn’t the model. It’s the design.


Why this matters

After looking at enough MCP Servers, a pattern emerges. The ones that work consistently follow a small set of principles. The ones that don’t work violate them — often in subtle ways.

This isn’t theoretical. Anthropic’s internal testing found that Claude Opus 4, when given a large toolset loaded upfront, achieved only ~49% tool selection accuracy — essentially a coin flip. The issue wasn’t model capability. It was ambiguity. Tools that overlapped in purpose, behaved differently in subtle ways, or didn’t clearly signal what they did made it impossible for the model to consistently choose correctly.

Independent research from OpaqueToolsBench (Hallinan et al., arXiv 2602.15197, 2026) reached the same conclusion: LLMs struggle most not with complex tools, but with opaque ones — tools with implicit constraints, undocumented behavior, and unclear failure modes.

The problem isn’t complexity. It’s unpredictability.


The four principles

  • Clarity over completeness
  • Cohesion over convenience
  • Explicitness over inference
  • Determinism over cleverness

These aren’t preferences. They’re what make reliable LLM behavior possible.


Principle 1: Clarity over completeness

Most teams try to cover every possible scenario. They add tools “just in case.” They create slight variations to handle edge cases. It feels safer. It makes things worse.

When multiple tools look valid, the LLM has to choose between similar options. And when it can’t clearly distinguish between them, selection becomes probabilistic. Small phrasing changes produce different choices. Behavior becomes inconsistent. More tools don’t add capability — they add confusion.

Bad: search_records, find_records, query_records — all do roughly the same thing. The model guesses.

Good: search_records (by criteria), get_record (by ID) — clear distinction, predictable selection.

If multiple tools look valid, the model will guess.
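The contrast above can be sketched as tool definitions in the JSON-schema style that MCP servers expose. The tool names come from the examples; the descriptions and schemas are illustrative, not from any real server.

```python
# Ambiguous: three tools whose descriptions overlap almost entirely.
# The model has no basis for choosing one over another.
ambiguous_tools = [
    {"name": "search_records", "description": "Search for records."},
    {"name": "find_records", "description": "Find matching records."},
    {"name": "query_records", "description": "Query the records database."},
]

# Clear: one tool per distinct intent, each description stating
# exactly when it applies.
clear_tools = [
    {
        "name": "search_records",
        "description": "Search records matching criteria. "
                       "Use when the record ID is not known.",
        "inputSchema": {
            "type": "object",
            "properties": {"criteria": {"type": "string"}},
            "required": ["criteria"],
        },
    },
    {
        "name": "get_record",
        "description": "Fetch a single record by its ID. "
                       "Use when the ID is already known.",
        "inputSchema": {
            "type": "object",
            "properties": {"record_id": {"type": "string"}},
            "required": ["record_id"],
        },
    },
]
```

Note that the clear pair differs on both axes the model can see: the name and the description. Either one alone is often not enough.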


Principle 2: Cohesion over convenience

It’s tempting to group everything into one server. After all, it’s all “customer data,” right?

When you mix sales workflows, support workflows, and admin operations in one place, the same request can map to multiple interpretations. “Why was this customer charged?” could mean investigate a billing issue, pull a financial report, or check a transaction. The LLM has no clear context — so it guesses.

Bad: One server with search_opportunities, create_support_ticket, manage_users.

Good: Separate servers — Sales, Support, Billing — each with a clear purpose.

If a request can mean multiple things, your scope is too broad.
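One way to make this concrete: lay out each server as a named group of tools and check that no tool leaks across boundaries. The server and tool names below are hypothetical, extending the examples above; the helper is a sketch of a scope check, not part of any MCP SDK.

```python
def find_scope_overlaps(servers: dict) -> list:
    """Return (tool, server_a, server_b) triples for any tool exposed by
    more than one server — a sign the workflow boundary has blurred."""
    seen = {}
    overlaps = []
    for server, tools in servers.items():
        for tool in tools:
            if tool in seen:
                overlaps.append((tool, seen[tool], server))
            else:
                seen[tool] = server
    return overlaps

# Cohesive layout: one server per workflow, no shared tools.
servers = {
    "sales":   ["search_opportunities", "update_opportunity_stage"],
    "support": ["create_support_ticket", "get_ticket_status"],
    "billing": ["get_transaction", "refund_charge"],
}
```

With this layout, “Why was this customer charged?” has exactly one plausible home — the billing server — instead of three.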


Principle 3: Explicitness over inference

The model knows only what you describe.

Designers often rely on what feels obvious — but the LLM builds its understanding entirely from what’s described. Hidden side effects, undocumented limits, missing prerequisites — if they’re not stated, the model has to guess. And when it guesses incorrectly, it takes the wrong action or misinterprets results.

Bad: update_order(order_id, status) — also sends an email, updates inventory, and triggers payment. None of this is documented.

Good: update_order_status, send_order_notification, process_payment — each action explicit and controlled.

What feels obvious to you is invisible to the model.
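The decomposed version might look like the tool definitions below. The names come from the example; the descriptions are illustrative. The key move is that each description states everything the tool does — and, just as importantly, what it does not do.

```python
# Each tool does exactly one thing, and its description says so.
explicit_tools = [
    {
        "name": "update_order_status",
        "description": "Set the order's status. Does NOT notify the "
                       "customer, adjust inventory, or charge payment.",
    },
    {
        "name": "send_order_notification",
        "description": "Email the customer the current state of their order.",
    },
    {
        "name": "process_payment",
        "description": "Charge the payment method on file for the order total.",
    },
]
```

If two of these must always happen together, that constraint belongs in the descriptions too — stated, not assumed.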


Principle 4: Determinism over cleverness

“Smart” behavior feels powerful. But it introduces hidden variability.

LLMs rely on patterns. If the same inputs produce different outputs — fields appear or disappear, behavior varies silently based on context — the model can’t form stable expectations. It can’t reliably interpret results. Multi-step reasoning breaks.

Determinism means: given the same inputs, behavior should be predictable. When tool behavior can’t be predicted from its inputs, model accuracy drops sharply — in Anthropic’s own testing, to roughly ~49%.

The real issue isn’t that behavior varies. It’s that variation is hidden. If behavior must vary, it should vary based on explicit inputs — not hidden context. This lets the model choose behavior intentionally, predict the outcome, and reason consistently.

Bad: get_customer_details(customer_id) — returns different fields depending on role, account state, or internal conditions.

Good: get_customer_details(customer_id, detail_level) where detail_level = summary | standard | full. Same input, same parameters, same outcome.

Determinism isn’t about removing variation. It’s about making variation predictable.
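A minimal sketch of the deterministic version, assuming a stand-in customer store and illustrative field sets (none of these names come from a real API). The field set is chosen entirely by the explicit detail_level parameter, never by role or hidden account state:

```python
# Hypothetical in-memory store standing in for a real backend.
_CUSTOMERS = {
    "c1": {"id": "c1", "name": "Ada", "email": "ada@example.com",
           "plan": "pro", "billing_history": [], "support_tickets": []},
}

# Fixed field sets — the only thing detail_level controls.
SUMMARY_FIELDS = ("id", "name")
STANDARD_FIELDS = SUMMARY_FIELDS + ("email", "plan")
FULL_FIELDS = STANDARD_FIELDS + ("billing_history", "support_tickets")

def get_customer_details(customer_id: str,
                         detail_level: str = "standard") -> dict:
    """Same inputs, same field set, every time.

    Variation exists (three detail levels), but it is selected by an
    explicit input the model can see and reason about.
    """
    fields = {"summary": SUMMARY_FIELDS,
              "standard": STANDARD_FIELDS,
              "full": FULL_FIELDS}[detail_level]
    record = _CUSTOMERS[customer_id]
    return {f: record[f] for f in fields}
```

Because the mapping from detail_level to fields is fixed, the model can pick the behavior it wants, predict the shape of the result, and chain the call into multi-step reasoning.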


Bringing it together

All four principles do the same thing: they reduce ambiguity. And when ambiguity goes down, reliability goes up.

These principles define what good MCP design looks like — but they show up most clearly in one place: how you design individual tools.


The bottom line

If your MCP Server feels inconsistent, you don’t need a better model. You need less ambiguity.

And these four principles are how you get there.
