Designing MCP Tools? Make Them Easy for LLMs to Use Correctly
You can have the right scope. You can follow the right principles. And still end up with an MCP server that doesn’t work.
Tools are where it breaks — or holds together.
Tools are the real interface
For an LLM, tools are the system. It doesn’t see your APIs. It doesn’t see your architecture. It only sees tool names, descriptions, parameters, and outputs.
If those are unclear, the system is unclear. Tool design isn’t just about what actions are possible. It determines whether the model can reason correctly at all.
What breaks MCP tools in practice
Most failures don’t come from missing functionality. They come from tools that are technically correct — but hard to reason about.
1. When tools are hard to choose
When multiple tools look equally valid, the model guesses. And small phrasing differences lead to different choices.
search_customers and find_customers both sound right. Neither is clearly better. That’s the problem.
Bad: overlapping names and responsibilities.
Good: search_customers (by criteria), get_customer (by ID) — different intent, clear choice.
Tool names should make the right choice obvious.
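The fix can be sketched as tool definitions whose names and descriptions encode distinct intent. The definitions below are illustrative assumptions, not tied to any specific MCP SDK:

```python
# Hypothetical tool definitions: each name answers "when do I use this?"
TOOLS = [
    {
        "name": "search_customers",
        "description": "Search customers by criteria (name, email, region). "
                       "Use when the customer ID is NOT known.",
        "parameters": {"name": "str", "email": "str", "region": "str"},
    },
    {
        "name": "get_customer",
        "description": "Fetch a single customer by unique ID. "
                       "Use when the ID IS already known.",
        "parameters": {"customer_id": "str"},
    },
]

def tool_for(caller_knows_id: bool) -> str:
    """The right tool follows directly from what the caller knows."""
    return "get_customer" if caller_knows_id else "search_customers"
```

The point of the helper is the test it encodes: if choosing the tool requires anything beyond what the caller already knows, the names overlap too much.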
2. When parameters don’t match how users think
Generic, overloaded parameters force the model to interpret before it can act.
search_orders(query, filters, options) isn’t a capability problem — it’s a mapping problem. The model now has to decide what goes in query, what belongs in filters, and what options even means.
Good: search_orders(customer_name, date_range, status) — each parameter maps directly to how users express intent.
Parameters should reflect how people ask for things.
3. When tools return too much data
This is one of the most common — and least obvious — failure modes. Tools return full objects, large lists, no limits. It feels helpful. It isn’t.
The model assumes responses are complete. Important signals get buried. Context gets wasted. In one internal setup, tool definitions alone consumed tens of thousands of tokens before a single task began — context spent describing capabilities, not doing work.
Bad: hundreds of records with no indication of truncation.
Good: bounded results, has_more: true, pagination support.
Don’t just return data — return bounded, usable data.
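A sketch of what "bounded with truncation signals" can look like in a tool response, using a simple cursor scheme (the field names `has_more` and `next_cursor` are illustrative):

```python
def search_orders_page(records: list[dict],
                       limit: int = 20,
                       cursor: int = 0) -> dict:
    """Return a bounded page and tell the model whether more data exists."""
    page = records[cursor:cursor + limit]
    next_cursor = cursor + len(page)
    truncated = next_cursor < len(records)
    return {
        "results": page,
        "count": len(page),
        "has_more": truncated,                      # explicit truncation signal
        "next_cursor": next_cursor if truncated else None,
    }
```

Because `has_more` is explicit, the model never mistakes a page for the full result set, and `next_cursor` gives it a direct way to ask for the rest.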
4. When outputs don’t explain what happened
status: "success", data: [] is technically correct. But what does it mean? No results? Wrong input? No access? The model can’t tell — and different outcomes require different decisions.
- nothing_found → broaden the search or inform the user.
- invalid_reference → ask for correction.
- permission_denied → stop or escalate.
If those aren’t distinguished, the model picks the wrong path.
Outputs should communicate meaning, not just status.
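The distinction can be sketched as a result builder that maps each outcome to an explicit status plus a next-step hint. The status codes mirror the three cases above; the function shape and `hint` field are assumptions for illustration:

```python
def lookup_result(found: list,
                  reference_is_valid: bool,
                  authorized: bool) -> dict:
    """Map meaningfully different outcomes to distinct, actionable statuses."""
    if not authorized:
        return {"status": "permission_denied",
                "hint": "Stop or escalate; retrying will not help."}
    if not reference_is_valid:
        return {"status": "invalid_reference",
                "hint": "Ask the user to correct the reference."}
    if not found:
        return {"status": "nothing_found",
                "hint": "Broaden the search or report no matches."}
    return {"status": "ok", "data": found}
```

Compare this with `status: "success", data: []`: the empty list collapsed three different situations into one, and the model had to guess which it was in.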
5. When outputs are correct — but not usable
Even explicit outputs can quietly break multi-step workflows. Raw IDs, loosely structured data, fields with unclear meaning — the model has to reinterpret the output before continuing. That introduces errors.
Good: structured data, clearly labeled fields, values directly reusable in follow-up calls.
Tool outputs should enable the model to continue without guessing.
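A sketch of a reusable output in a two-step workflow: the first tool labels its fields so the second tool can consume one directly. Both functions are hypothetical stubs, not a real backend:

```python
def get_customer(customer_id: str) -> dict:
    # Illustrative stub; a real tool would query a backend.
    return {
        "customer_id": customer_id,                 # reusable as-is downstream
        "display_name": "Acme Corp",
        "primary_contact_email": "ops@acme.example",
    }

def list_orders_for(customer: dict) -> dict:
    # The follow-up call takes the labeled field without reinterpretation.
    return {"query": {"customer_id": customer["customer_id"]}}
```

The chain works because `customer_id` means the same thing in both tools. If the first tool returned a bare `"id"` or an opaque composite string, the model would have to reinterpret the output before it could continue.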
The checklist
Before shipping any tool, ask:
- Does the name make the right choice obvious?
- Do parameters map to how users actually express intent?
- Are results bounded with clear truncation signals?
- Do outputs distinguish between meaningfully different states?
- Can the model use the output directly in the next step?
Bringing it together
All five failures share the same root cause: the tool makes execution possible — but reasoning difficult.
Good tool design does the opposite. It makes the right choice obvious, the inputs clear, the outputs bounded and usable, the results meaningful, and the next step predictable.
The bottom line
MCP tools aren’t just functions. They are interfaces for reasoning.
If the model has to guess at any step — selection, input, output, or interpretation — reliability breaks.
That’s the series. Four parts, one through-line: design for the model doing the reasoning, not the developer building the system.