Solving MCP Tool Overload: How Apigene's Dynamic Tool Loading Reduces Token Costs by 98%

As AI agents scale to connect with hundreds or thousands of tools across dozens of MCP servers, a critical problem emerges: tool definitions are consuming massive amounts of context window space, dramatically increasing costs and latency. In a recent blog post, Anthropic highlighted this exact challenge, noting that agents connected to thousands of tools can require processing hundreds of thousands of tokens before even reading a user request.
At Apigene, we've solved this problem with dynamic tool loading—a progressive disclosure approach that loads tools on-demand rather than all at once. This article explores the problem, our solution, and how it reduces token consumption by up to 98% while maintaining full MCP compatibility.
The Problem: Tool Definition Overload
Context Window Bloat
Traditional MCP implementations load all tool definitions upfront into the model's context window. Each tool definition includes:
- Tool name and description
- Complete parameter schemas
- Return type definitions
- Example usage patterns
For an agent connected to 1,000 tools across 20 MCP servers, this can easily consume 150,000+ tokens just for tool definitions—before processing any user request or executing any action.
The Cost Impact
Consider an agent that needs to send a simple Slack message:
Traditional Approach:
- Load all 1,000 tool definitions: 150,000 tokens
- Process user request: 50 tokens
- Execute Slack action: 200 tokens
- Total: 150,250 tokens
With Dynamic Loading:
- Search for "send message" actions: 500 tokens (summary only)
- Get Slack action details: 2,000 tokens (full schema for one action)
- Process user request: 50 tokens
- Execute Slack action: 200 tokens
- Total: 2,750 tokens
Savings: 98.2% reduction in token usage.
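The arithmetic behind the two totals can be checked with a short script. This is a minimal sketch that uses only the token counts listed above; the dictionary keys and helper names are illustrative.

```python
# Token counts copied from the two scenarios above.
traditional = {"tool_definitions": 150_000, "user_request": 50, "execution": 200}
dynamic = {"search_summary": 500, "action_schema": 2_000, "user_request": 50, "execution": 200}


def total_tokens(steps: dict[str, int]) -> int:
    """Sum the token cost of every step in a scenario."""
    return sum(steps.values())


def reduction_percent(before: int, after: int) -> float:
    """Percentage of tokens saved when going from `before` to `after`."""
    return (before - after) / before * 100


before = total_tokens(traditional)  # 150,250
after = total_tokens(dynamic)       # 2,750
print(f"{before:,} -> {after:,} tokens ({reduction_percent(before, after):.1f}% reduction)")
```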
Apigene's Solution: Dynamic Tool Loading with Progressive Disclosure
Apigene's MCP Gateway implements a progressive disclosure pattern that allows agents to discover and load tools on-demand, dramatically reducing context window consumption.
How It Works
Instead of loading all tool definitions upfront, Apigene provides discovery tools that agents can use to find and load only the tools they need:
- Discovery Phase: Search or list actions with lightweight summaries
- Detail Phase: Load full schemas only for selected actions
- Execution Phase: Execute actions with complete parameter information
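The three phases can be sketched with a small in-memory catalog. This is an illustration of the pattern only, not Apigene's implementation: the catalog contents, the parameter schemas, and the function names are hypothetical, and plain keyword matching stands in for the gateway's search.

```python
# Hypothetical catalog standing in for the gateway; schemas are invented for illustration.
CATALOG = {
    "chat.postMessage": {
        "app_name": "Slack",
        "description": "Sends a message to a channel",
        "parameters": {"channel": "string", "text": "string"},
    },
    "conversations.list": {
        "app_name": "Slack",
        "description": "Lists all channels in a Slack team",
        "parameters": {"limit": "integer"},
    },
}


def discover(query: str) -> list[dict]:
    """Discovery phase: return summaries whose description contains every query word."""
    words = query.lower().split()
    return [
        {"operationId": op_id, "description": entry["description"]}
        for op_id, entry in CATALOG.items()
        if all(word in entry["description"].lower() for word in words)
    ]


def load_details(operation_ids: list[str]) -> dict[str, dict]:
    """Detail phase: return full entries only for the selected actions."""
    return {op_id: CATALOG[op_id] for op_id in operation_ids}


def execute(operation_id: str, schema: dict, arguments: dict) -> dict:
    """Execution phase: check arguments against the loaded schema, then simulate the call."""
    missing = [name for name in schema["parameters"] if name not in arguments]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    return {"operationId": operation_id, "status": "ok", "arguments": arguments}


summaries = discover("message channel")   # summaries only, no parameter schemas
selected = summaries[0]["operationId"]
schemas = load_details([selected])        # full schema for one action
result = execute(selected, schemas[selected], {"channel": "#engineering", "text": "Hello"})
print(result["status"])
```

Only the summaries and the one selected schema would ever reach the model's context; the rest of the catalog stays on the server side.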
Key MCP Gateway Tools
Apigene's MCP Gateway exposes several tools designed for efficient tool discovery:
1. list_actions - On-Demand Tool Discovery
The list_actions tool supports two modes:
Discovery Mode (Summary):
{
  "tool": "list_actions",
  "arguments": {
    "requests": [
      {
        "app_name": "Slack",
        "detail_level": "summary"
      }
    ]
  }
}
Response (Lightweight):
[
  {
    "operationId": "chat.postMessage",
    "description": "Sends a message to a channel"
  },
  {
    "operationId": "conversations.list",
    "description": "Lists all channels in a Slack team"
  }
]
This summary mode uses 70-90% fewer tokens than loading full schemas.
Details Mode (Full Schema):
{
  "tool": "list_actions",
  "arguments": {
    "requests": [
      {
        "app_name": "Slack",
        "operationIds": ["chat.postMessage"]
      }
    ]
  }
}
Only after identifying the needed action does the agent load the complete parameter schema.
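The two request shapes differ only in whether they carry detail_level or operationIds, so a client could build both from one helper. A minimal sketch: the function name is hypothetical, and it uses only the field names shown in the requests above.

```python
import json
from typing import Optional


def list_actions_request(app_name: str, operation_ids: Optional[list[str]] = None) -> dict:
    """Build a list_actions call: summary mode without IDs, details mode with them."""
    request: dict = {"app_name": app_name}
    if operation_ids:
        # Details mode: ask for full schemas of specific actions.
        request["operationIds"] = list(operation_ids)
    else:
        # Discovery mode: ask for lightweight summaries only.
        request["detail_level"] = "summary"
    return {"tool": "list_actions", "arguments": {"requests": [request]}}


print(json.dumps(list_actions_request("Slack"), indent=2))
print(json.dumps(list_actions_request("Slack", ["chat.postMessage"]), indent=2))
```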
2. search_actions - Semantic Tool Discovery
Agents can search across all available actions using natural language:
{
  "tool": "search_actions",
  "arguments": {
    "query": "send message",
    "detail_level": "summary",
    "max_results": 10
  }
}
This returns matching actions from multiple apps, allowing the agent to discover tools across its entire ecosystem without loading everything.
3. list_available_apps - Application Discovery
In Agent mode, agents can discover available applications:
{
  "tool": "list_available_apps",
  "arguments": {
    "include_action_summaries": true,
    "max_action_summary": 10
  }
}
This provides a high-level view of available applications and their capabilities without loading full tool definitions.
Real-World Example: From 150K to Under 3K Tokens
Let's walk through a practical example: an agent needs to send a Slack message and create a Jira issue.
Traditional Approach (All Tools Loaded)
Initial Context:
- 1,000 tool definitions loaded: 150,000 tokens
- User request: "Send a message to #engineering and create a Jira issue"
- Processing: 200 tokens
- Tool execution: 500 tokens
Total: 150,700 tokens
Apigene Dynamic Loading Approach
Step 1: Search for relevant actions
{
  "tool": "search_actions",
  "arguments": {
    "query": "send message slack",
    "detail_level": "summary"
  }
}
Tokens: 300 (summary results)
Step 2: Search for Jira actions
{
  "tool": "search_actions",
  "arguments": {
    "query": "create issue jira",
    "detail_level": "summary"
  }
}
Tokens: 300 (summary results)
Step 3: Get full schemas for selected actions
{
  "tool": "list_actions",
  "arguments": {
    "requests": [
      {
        "app_name": "Slack",
        "operationIds": ["chat.postMessage"]
      },
      {
        "app_name": "Jira",
        "operationIds": ["createIssue"]
      }
    ]
  }
}
Tokens: 1,500 (full schemas for 2 actions)
Step 4: Execute actions
{
  "tool": "run_multi_actions",
  "arguments": {
    "actions": [
      {
        "app_name": "Slack",
        "user_input": "Send message to engineering channel",
        "context": {
          "operationId": "chat.postMessage",
          "channel": "#engineering",
          "text": "New issue created"
        }
      },
      {
        "app_name": "Jira",
        "user_input": "Create a new issue",
        "context": {
          "operationId": "createIssue",
          "title": "Bug: Login issue",
          "description": "Users cannot log in"
        }
      }
    ]
  }
}
Tokens: 600 (execution)
Total: 2,700 tokens (98.2% reduction)
Benefits Beyond Token Savings
Dynamic tool loading provides additional benefits:
1. Faster Response Times
By reducing initial context size, agents can start processing requests faster. Instead of processing 150,000 tokens of tool definitions, agents begin with a lightweight discovery phase.
2. Better Tool Discovery
With search_actions, agents can semantically discover tools across all connected applications, even if they don't know the exact tool name or which application provides it.
3. Scalability
As you add more MCP servers and tools, the initial context size remains constant. Only the discovery phase scales, not the upfront cost.
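To make the scaling claim concrete, the sketch below compares upfront cost as the catalog grows. It assumes the average of 150 tokens per tool implied by the 1,000-tool example, and it assumes the discovery and detail overhead from the Slack example (500 + 2,000 tokens) does not grow with the catalog, for instance because summary results are capped. Both are modeling assumptions, not measurements.

```python
TOKENS_PER_TOOL = 150_000 // 1_000   # average implied by the 1,000-tool example
DISCOVERY_AND_DETAIL = 500 + 2_000   # summary search plus one full schema (Slack example)


def upfront_traditional(tool_count: int) -> int:
    """All definitions are loaded before the first request, so cost grows linearly."""
    return tool_count * TOKENS_PER_TOOL


def upfront_dynamic(tool_count: int) -> int:
    """Assumed constant: only discovery results and the selected schema are loaded."""
    return DISCOVERY_AND_DETAIL


for n in (10, 100, 1_000, 10_000):
    print(f"{n:>6} tools: traditional {upfront_traditional(n):>9,} | dynamic {upfront_dynamic(n):>5,}")
```

Under these assumed numbers, loading everything upfront is actually cheaper for very small catalogs (below roughly 17 tools); the advantage of on-demand loading appears once the catalog grows past that point.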
4. Context Efficiency
Agents can filter and transform data before returning results using response_projection, further reducing token consumption:
{
  "tool": "run_action",
  "arguments": {
    "app_name": "Salesforce",
    "user_input": "Get high-value opportunities",
    "context": {
      "operationId": "listOpportunities"
    },
    "response_projection": "opportunities[?amount > 10000].{name: name, amount: amount}"
  }
}
This returns only filtered, relevant data instead of processing thousands of records.
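To show what such a projection does to a payload, here is a plain-Python equivalent of the filter-and-select step. The sample records and the function name are hypothetical, and the sketch makes no claim about which expression language the gateway actually evaluates; it only mirrors the intent of the expression above.

```python
# Hypothetical response payload; field names follow the projection expression above.
response = {
    "opportunities": [
        {"name": "Acme renewal", "amount": 25_000, "stage": "Negotiation", "owner": "dana"},
        {"name": "Small add-on", "amount": 4_000, "stage": "Prospecting", "owner": "lee"},
        {"name": "Globex expansion", "amount": 60_000, "stage": "Proposal", "owner": "sam"},
    ]
}


def project_high_value(payload: dict, threshold: int = 10_000) -> list[dict]:
    """Keep opportunities above the threshold and only their name and amount fields."""
    return [
        {"name": opp["name"], "amount": opp["amount"]}
        for opp in payload["opportunities"]
        if opp["amount"] > threshold
    ]


projected = project_high_value(response)
print(projected)
```

The model then sees two small records instead of every field of every opportunity.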
MCP Gateway Actions Reference
Apigene's MCP Gateway provides comprehensive tools for dynamic tool management:
Discovery Tools
- list_actions: Retrieve actions from one or multiple applications with summary or full detail levels
- search_actions: Search across all actions using natural language queries
- list_available_apps: Discover available applications (Agent mode)
Execution Tools
- run_action: Execute a single action
- run_action_batch: Execute the same action multiple times in parallel
- run_multi_actions: Execute different actions simultaneously across multiple apps
Agent Mode Tools
- get_instructions: Access agent instructions and capabilities
- list_contexts: List available contexts
- search_contexts: Search for specific contexts
- get_context: Get detailed context information
- add_context: Create new contexts
Best Practices for Dynamic Tool Loading
1. Always Start with Summary Mode
When discovering tools, use detail_level: "summary" to minimize token usage:
{
  "tool": "list_actions",
  "arguments": {
    "requests": [
      {
        "app_name": "Slack",
        "detail_level": "summary"
      }
    ]
  }
}
2. Load Full Schemas Only When Needed
After identifying relevant actions, load full parameter schemas:
{
  "tool": "list_actions",
  "arguments": {
    "requests": [
      {
        "app_name": "Slack",
        "operationIds": ["chat.postMessage"]
      }
    ]
  }
}
3. Use Search for Cross-App Discovery
When you're not sure which application provides a capability:
{
  "tool": "search_actions",
  "arguments": {
    "query": "send notification",
    "detail_level": "summary"
  }
}
4. Leverage Response Projection
For large datasets, use response_projection to reduce response size:
{
  "tool": "run_action",
  "arguments": {
    "app_name": "Jira",
    "user_input": "List open issues",
    "context": {
      "operationId": "listIssues"
    },
    "response_projection": "issues[*].{id: id, key: key, summary: fields.summary}"
  }
}
Comparison: Traditional vs. Dynamic Loading
| Metric | Traditional Loading | Apigene Dynamic Loading | Improvement |
|---|---|---|---|
| Initial Context Size | 150,000 tokens | Gateway discovery tool definitions only | Near 100% reduction |
| Discovery Phase | N/A | 300-500 tokens | - |
| Detail Loading | All tools upfront | Selected tools only | 98% reduction |
| Total for Simple Task | 150,700 tokens | 2,700 tokens | 98.2% reduction |
| Scalability | Linear with tool count | Constant upfront cost | Independent of tool count |
Getting Started with Apigene's Dynamic Tool Loading
1. Connect to MCP Gateway
Use either MCP Mode or Agent Mode:
- MCP Mode: https://app.apigene.ai/mcp/{genai_app}/mcp
- Agent Mode: https://app.apigene.ai/agent/{genai_app}/mcp
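MCP clients invoke tools with a JSON-RPC 2.0 request whose method is tools/call. Assuming the {"tool", "arguments"} objects shown in this article are shorthand for such calls (the article does not show the wire format), a client-side wrapper might look like the sketch below. The helper name is hypothetical, and transport, authentication, and the MCP initialization handshake are omitted.

```python
import itertools
import json

# Monotonic request IDs, as JSON-RPC expects each request to carry its own ID.
_request_ids = itertools.count(1)


def to_tools_call(call: dict) -> dict:
    """Wrap the article's {"tool", "arguments"} shorthand in an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": call["tool"], "arguments": call["arguments"]},
    }


message = to_tools_call({
    "tool": "search_actions",
    "arguments": {"query": "send message", "detail_level": "summary", "max_results": 10},
})
print(json.dumps(message, indent=2))
```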
2. Start with Discovery
Begin by discovering available applications and actions:
{
  "tool": "list_available_apps",
  "arguments": {
    "include_action_summaries": true,
    "max_action_summary": 10
  }
}
3. Search for Relevant Tools
Use semantic search to find tools:
{
  "tool": "search_actions",
  "arguments": {
    "query": "your use case",
    "detail_level": "summary"
  }
}
4. Load and Execute
Get full details and execute:
{
  "tool": "list_actions",
  "arguments": {
    "requests": [
      {
        "app_name": "YourApp",
        "operationIds": ["specificAction"]
      }
    ]
  }
}
Then execute with run_action, run_action_batch, or run_multi_actions.
Conclusion
As AI agents scale to connect with hundreds or thousands of tools, traditional approaches that load all tool definitions upfront become prohibitively expensive. Apigene's dynamic tool loading solves this challenge through progressive disclosure, allowing agents to discover and load tools on-demand.
By reducing token consumption by up to 98%, dynamic tool loading makes it economically feasible to build agents with extensive tool ecosystems while maintaining fast response times and full MCP compatibility.
The key insight is simple: don't load what you don't need. Start with lightweight discovery, load details only for selected tools, and execute efficiently. Because the upfront cost does not depend on the number of connected tools, the initial cost stays constant whether you have 10 tools or 10,000.
Ready to optimize your MCP agent's token usage? Get started with Apigene's MCP Gateway and experience the power of dynamic tool loading.