
Solving MCP Tool Overload: How Apigene's Dynamic Tool Loading Reduces Token Costs by 98%

Apigene Team
12 min read

As AI agents scale to connect with hundreds or thousands of tools across dozens of MCP servers, a critical problem emerges: tool definitions are consuming massive amounts of context window space, dramatically increasing costs and latency. In a recent blog post, Anthropic highlighted this exact challenge, noting that agents connected to thousands of tools can require processing hundreds of thousands of tokens before even reading a user request.

At Apigene, we've solved this problem with dynamic tool loading—a progressive disclosure approach that loads tools on-demand rather than all at once. This article explores the problem, our solution, and how it reduces token consumption by up to 98% while maintaining full MCP compatibility.

The Problem: Tool Definition Overload

Context Window Bloat

Traditional MCP implementations load all tool definitions upfront into the model's context window. Each tool definition includes:

  • Tool name and description
  • Complete parameter schemas
  • Return type definitions
  • Example usage patterns

For an agent connected to 1,000 tools across 20 MCP servers, this can easily consume 150,000+ tokens just for tool definitions—before processing any user request or executing any action.
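To see why the numbers add up, consider what a single tool definition carries. The sketch below roughly follows the MCP tool definition shape (name, description, and a JSON Schema for parameters); the contents are illustrative rather than taken from any specific server. At 100-200 tokens per definition, a thousand of these reach the six-figure totals above.

{
  "name": "chat_postMessage",
  "description": "Sends a message to a Slack channel",
  "inputSchema": {
    "type": "object",
    "properties": {
      "channel": { "type": "string", "description": "Channel ID or name" },
      "text": { "type": "string", "description": "Message text" }
    },
    "required": ["channel", "text"]
  }
}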

The Cost Impact

Consider an agent that needs to send a simple Slack message:

Traditional Approach:

  1. Load all 1,000 tool definitions: 150,000 tokens
  2. Process user request: 50 tokens
  3. Execute Slack action: 200 tokens
  4. Total: 150,250 tokens

With Dynamic Loading:

  1. Search for "send message" actions: 500 tokens (summary only)
  2. Get Slack action details: 2,000 tokens (full schema for one action)
  3. Process user request: 50 tokens
  4. Execute Slack action: 200 tokens
  5. Total: 2,750 tokens

Savings: 98.2% reduction in token usage.

Apigene's Solution: Dynamic Tool Loading with Progressive Disclosure

Apigene's MCP Gateway implements a progressive disclosure pattern that allows agents to discover and load tools on-demand, dramatically reducing context window consumption.

How It Works

Instead of loading all tool definitions upfront, Apigene provides discovery tools that agents can use to find and load only the tools they need:

  1. Discovery Phase: Search or list actions with lightweight summaries
  2. Detail Phase: Load full schemas only for selected actions
  3. Execution Phase: Execute actions with complete parameter information

Key MCP Gateway Tools

Apigene's MCP Gateway exposes several tools designed for efficient tool discovery:

1. list_actions - On-Demand Tool Discovery

The list_actions tool supports two modes:

Discovery Mode (Summary):

{
  "tool": "list_actions",
  "arguments": {
    "requests": [
      {
        "app_name": "Slack",
        "detail_level": "summary"
      }
    ]
  }
}

Response (Lightweight):

[
  {
    "operationId": "chat.postMessage",
    "description": "Sends a message to a channel"
  },
  {
    "operationId": "conversations.list",
    "description": "Lists all channels in a Slack team"
  }
]

This summary mode uses 70-90% fewer tokens than loading full schemas.

Details Mode (Full Schema):

{
  "tool": "list_actions",
  "arguments": {
    "requests": [
      {
        "app_name": "Slack",
        "operationIds": ["chat.postMessage"]
      }
    ]
  }
}

Only after identifying the needed action does the agent load the complete parameter schema.
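For illustration, a details-mode response might look roughly like the following. The exact shape depends on the connected application's API, so treat this as a sketch rather than the literal response format:

[
  {
    "operationId": "chat.postMessage",
    "description": "Sends a message to a channel",
    "parameters": {
      "channel": { "type": "string", "required": true, "description": "Channel ID or name to post to" },
      "text": { "type": "string", "description": "Plain-text message body" },
      "thread_ts": { "type": "string", "description": "Timestamp of a parent message when replying in a thread" }
    }
  }
]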

2. search_actions - Semantic Tool Discovery

Agents can search across all available actions using natural language:

{
  "tool": "search_actions",
  "arguments": {
    "query": "send message",
    "detail_level": "summary",
    "max_results": 10
  }
}

This returns matching actions from multiple apps, allowing the agent to discover tools across its entire ecosystem without loading everything.
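A response might look roughly like this; the operation IDs and descriptions are illustrative and the actual results depend on which applications are connected:

[
  {
    "app_name": "Slack",
    "operationId": "chat.postMessage",
    "description": "Sends a message to a channel"
  },
  {
    "app_name": "Jira",
    "operationId": "addComment",
    "description": "Adds a comment to an issue"
  }
]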

3. list_available_apps - Application Discovery

In Agent mode, agents can discover available applications:

{
  "tool": "list_available_apps",
  "arguments": {
    "include_action_summaries": true,
    "max_action_summary": 10
  }
}

This provides a high-level view of available applications and their capabilities without loading full tool definitions.
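The response is a lightweight catalog rather than a set of full tool definitions. As a sketch, with illustrative field names and contents:

[
  {
    "app_name": "Slack",
    "description": "Team messaging",
    "action_summaries": [
      { "operationId": "chat.postMessage", "description": "Sends a message to a channel" },
      { "operationId": "conversations.list", "description": "Lists all channels in a Slack team" }
    ]
  },
  {
    "app_name": "Jira",
    "description": "Issue tracking",
    "action_summaries": [
      { "operationId": "createIssue", "description": "Creates a new issue" }
    ]
  }
]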

Real-World Example: From 150K to Under 3K Tokens

Let's walk through a practical example: an agent needs to send a Slack message and create a Jira issue.

Traditional Approach (All Tools Loaded)

Initial Context:
- 1,000 tool definitions loaded: 150,000 tokens
- User request: "Send a message to #engineering and create a Jira issue"
- Processing: 200 tokens
- Tool execution: 500 tokens
Total: 150,700 tokens

Apigene Dynamic Loading Approach

Step 1: Search for relevant actions

{
  "tool": "search_actions",
  "arguments": {
    "query": "send message slack",
    "detail_level": "summary"
  }
}

Tokens: 300 (summary results)

Step 2: Search for Jira actions

{
  "tool": "search_actions",
  "arguments": {
    "query": "create issue jira",
    "detail_level": "summary"
  }
}

Tokens: 300 (summary results)

Step 3: Get full schemas for selected actions

{
  "tool": "list_actions",
  "arguments": {
    "requests": [
      {
        "app_name": "Slack",
        "operationIds": ["chat.postMessage"]
      },
      {
        "app_name": "Jira",
        "operationIds": ["createIssue"]
      }
    ]
  }
}

Tokens: 1,500 (full schemas for 2 actions)

Step 4: Execute actions

{
  "tool": "run_multi_actions",
  "arguments": {
    "actions": [
      {
        "app_name": "Slack",
        "user_input": "Send message to engineering channel",
        "context": {
          "operationId": "chat.postMessage",
          "channel": "#engineering",
          "text": "New issue created"
        }
      },
      {
        "app_name": "Jira",
        "user_input": "Create a new issue",
        "context": {
          "operationId": "createIssue",
          "title": "Bug: Login issue",
          "description": "Users cannot log in"
        }
      }
    ]
  }
}

Tokens: 600 (execution)

Total: 2,700 tokens (98.2% reduction)

Benefits Beyond Token Savings

Dynamic tool loading provides additional benefits:

1. Faster Response Times

By reducing initial context size, agents can start processing requests faster. Instead of processing 150,000 tokens of tool definitions, agents begin with a lightweight discovery phase.

2. Better Tool Discovery

With search_actions, agents can semantically discover tools across all connected applications, even if they don't know the exact tool name or which application provides it.

3. Scalability

As you add more MCP servers and tools, the initial context size remains constant. Only the discovery phase scales, not the upfront cost.

4. Context Efficiency

Agents can filter and transform data before returning results using response_projection, further reducing token consumption:

{
  "tool": "run_action",
  "arguments": {
    "app_name": "Salesforce",
    "user_input": "Get high-value opportunities",
    "context": {
      "operationId": "listOpportunities"
    },
    "response_projection": "opportunities[?amount > 10000].{name: name, amount: amount}"
  }
}

This returns only filtered, relevant data instead of processing thousands of records.
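With that projection applied, the agent receives only the fields it asked for. An illustrative (not literal) result:

[
  { "name": "Acme Corp - Enterprise Renewal", "amount": 45000 },
  { "name": "Globex - Platform Upgrade", "amount": 12500 }
]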

MCP Gateway Actions Reference

Apigene's MCP Gateway provides comprehensive tools for dynamic tool management:

Discovery Tools

  • list_actions: Retrieve actions from one or multiple applications with summary or full detail levels
  • search_actions: Search across all actions using natural language queries
  • list_available_apps: Discover available applications (Agent mode)

Execution Tools

  • run_action: Execute a single action
  • run_action_batch: Execute the same action multiple times in parallel
  • run_multi_actions: Execute different actions simultaneously across multiple apps

Agent Mode Tools

  • get_instructions: Access agent instructions and capabilities
  • list_contexts: List available contexts
  • search_contexts: Search for specific contexts
  • get_context: Get detailed context information
  • add_context: Create new contexts

Best Practices for Dynamic Tool Loading

1. Always Start with Summary Mode

When discovering tools, use detail_level: "summary" to minimize token usage:

{
  "tool": "list_actions",
  "arguments": {
    "requests": [
      {
        "app_name": "Slack",
        "detail_level": "summary"
      }
    ]
  }
}

2. Load Full Schemas Only When Needed

After identifying relevant actions, load full parameter schemas:

{
  "tool": "list_actions",
  "arguments": {
    "requests": [
      {
        "app_name": "Slack",
        "operationIds": ["chat.postMessage"]
      }
    ]
  }
}

3. Use Search for Cross-App Discovery

When you're not sure which application provides a capability:

{
  "tool": "search_actions",
  "arguments": {
    "query": "send notification",
    "detail_level": "summary"
  }
}

4. Leverage Response Projection

For large datasets, use response_projection to reduce response size:

{
  "tool": "run_action",
  "arguments": {
    "app_name": "Jira",
    "user_input": "List open issues",
    "context": {
      "operationId": "listIssues"
    },
    "response_projection": "issues[*].{id: id, key: key, summary: fields.summary}"
  }
}

Comparison: Traditional vs. Dynamic Loading

Metric | Traditional Loading | Apigene Dynamic Loading | Improvement
Initial Context Size | 150,000 tokens | 0 tokens | 100% reduction
Discovery Phase | N/A | 300-500 tokens | -
Detail Loading | All tools upfront | Selected tools only | 98% reduction
Total for Simple Task | 150,700 tokens | 2,700 tokens | 98.2% reduction
Scalability | Linear with tool count | Constant | Unbounded

Getting Started with Apigene's Dynamic Tool Loading

1. Connect to MCP Gateway

Use either MCP Mode or Agent Mode:

  • MCP Mode: https://app.apigene.ai/mcp/{genai_app}/mcp
  • Agent Mode: https://app.apigene.ai/agent/{genai_app}/mcp

2. Start with Discovery

Begin by discovering available applications and actions:

{
  "tool": "list_available_apps",
  "arguments": {
    "include_action_summaries": true,
    "max_action_summary": 10
  }
}

3. Search for Relevant Tools

Use semantic search to find tools:

{
  "tool": "search_actions",
  "arguments": {
    "query": "your use case",
    "detail_level": "summary"
  }
}

4. Load and Execute

Get full details and execute:

{
  "tool": "list_actions",
  "arguments": {
    "requests": [
      {
        "app_name": "YourApp",
        "operationIds": ["specificAction"]
      }
    ]
  }
}

Then execute with run_action, run_action_batch, or run_multi_actions.
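For example, a run_action call following the discovery step above might look like this; the app and operation names are the placeholders from the previous example, and user_input and context follow the same shape as the run_action calls shown earlier:

{
  "tool": "run_action",
  "arguments": {
    "app_name": "YourApp",
    "user_input": "Describe what you want the action to do",
    "context": {
      "operationId": "specificAction"
    }
  }
}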

Conclusion

As AI agents scale to connect with hundreds or thousands of tools, traditional approaches that load all tool definitions upfront become prohibitively expensive. Apigene's dynamic tool loading solves this challenge through progressive disclosure, allowing agents to discover and load tools on-demand.

By reducing token consumption by up to 98%, dynamic tool loading makes it economically feasible to build agents with extensive tool ecosystems while maintaining fast response times and full MCP compatibility.

The key insight is simple: don't load what you don't need. Start with lightweight discovery, load details only for selected tools, and execute efficiently. The approach scales with your tool ecosystem: whether you have 10 tools or 10,000, the initial cost remains constant.

Ready to optimize your MCP agent's token usage? Get started with Apigene's MCP Gateway and experience the power of dynamic tool loading.



#mcp #model-context-protocol #dynamic-tool-loading #token-optimization #ai-agents #mcp-gateway #context-window #llm-cost-reduction #progressive-disclosure #tool-discovery