
From Sequential to Parallel: How Apigene's Parallel Tool Execution Accelerates AI Agents by 10x

Apigene Team
10 min read

AI agents are transforming how we interact with software, but there's a hidden bottleneck slowing them down: sequential tool execution. When agents execute tools one after another, they create artificial delays that compound with each operation, turning simple workflows into slow, frustrating experiences.

At Apigene, we've solved this challenge with parallel tool execution—enabling agents to run multiple actions simultaneously across different applications. This article explores the sequential execution problem, our parallel execution solution, and how it accelerates agent workflows by up to 10x.

The Problem: Sequential Execution Creates Bottlenecks

The Sequential Execution Trap

Most AI agents execute tools sequentially, waiting for each operation to complete before starting the next one. Consider a common workflow:

User Request: "Get my latest email from Gmail, check my open Jira issues, and send a summary to Slack"

Sequential Execution:

1. Execute Gmail action → Wait 500ms → Complete
2. Execute Jira action → Wait 800ms → Complete  
3. Execute Slack action → Wait 300ms → Complete
Total Time: 1,600ms (1.6 seconds)

Each operation blocks the next one, creating a waterfall effect where total execution time equals the sum of all individual operations.
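This arithmetic is worth making explicit: ignoring dispatch overhead, sequential latency is the sum of the per-call latencies, while parallel latency is bounded by the slowest call. A two-function sketch (helper names are illustrative):

```javascript
// Sequential latency: every call blocks the next, so times add up
function sequentialTime(latenciesMs) {
  return latenciesMs.reduce((total, t) => total + t, 0)
}

// Parallel latency: all calls overlap, so the slowest call dominates
function parallelTime(latenciesMs) {
  return Math.max(...latenciesMs)
}

// The Gmail/Jira/Slack workflow above:
const latencies = [500, 800, 300]
console.log(sequentialTime(latencies)) // 1600
console.log(parallelTime(latencies))   // 800
```

This is why parallel speedups grow with batch size: the sum grows linearly with the number of calls, while the max stays roughly constant.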

Real-World Impact

The sequential execution problem becomes more severe as workflows grow in complexity:

Example: Reading 10 Emails Sequentially

Email 1: 500ms
Email 2: 500ms
Email 3: 500ms
...
Email 10: 500ms
Total: 5,000ms (5 seconds)

Example: Multi-App Dashboard Update

Fetch Gmail unread count: 400ms
Fetch Jira open issues: 600ms
Fetch Slack unread messages: 300ms
Fetch Salesforce opportunities: 800ms
Fetch GitHub pull requests: 500ms
Total: 2,600ms (2.6 seconds)

In both cases, the agent spends most of its time waiting rather than executing, creating poor user experiences and inefficient resource utilization.

The Hidden Costs

Sequential execution creates several hidden problems:

  1. Increased Latency: Total time equals sum of all operations
  2. Poor User Experience: Users wait unnecessarily long
  3. Higher Costs: More LLM reasoning cycles between operations
  4. Reduced Throughput: Agents can't handle multiple requests efficiently
  5. Timeout Risks: Long sequential chains may exceed timeout limits

The Solution: Parallel Tool Execution

Apigene's MCP Gateway provides three powerful tools for parallel execution, each optimized for different use cases:

1. run_action - Single Action Execution

The foundation tool for executing individual actions. While not parallel itself, it's the building block for parallel operations.

Use Case: Execute a single action when you only need one operation.

Example:

{
  "tool": "run_action",
  "arguments": {
    "app_name": "Slack",
    "user_input": "Send a message to #general",
    "context": {
      "operationId": "chat.postMessage",
      "channel": "#general",
      "text": "Hello, world!"
    }
  }
}

Execution Time: ~300ms

2. run_action_batch - Parallel Batch Execution

Execute the same action multiple times in parallel with different parameters. Perfect for processing multiple items of the same type.

Use Case: When you need to execute the same action multiple times with different inputs (e.g., reading multiple emails, updating multiple records, fetching multiple users).

How It Works:

  • Takes a base_context with shared parameters
  • Takes a batch_context array with varying parameters
  • Executes all batch items in parallel using Promise.all()
  • Returns results in the same order as input
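The merge semantics matter in practice: each batch item is overlaid on base_context, so a batch key with the same name overrides the shared value. A sketch of that overlay (field names taken from the examples in this article, not Apigene's internals):

```javascript
// Overlay one batch item on the shared base context;
// keys in the batch item win on conflict (JS spread order)
function mergeContext(baseContext, batchItem) {
  return { ...baseContext, ...batchItem }
}

const base = { operationId: 'readEmail', format: 'full' }

console.log(mergeContext(base, { email_id: 'msg_001' }))
// { operationId: 'readEmail', format: 'full', email_id: 'msg_001' }

console.log(mergeContext(base, { email_id: 'msg_002', format: 'metadata' }).format)
// 'metadata' -- the batch item overrides the base value
```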

Example: Reading 10 Emails in Parallel

{
  "tool": "run_action_batch",
  "arguments": {
    "app_name": "Gmail",
    "user_input": "Read multiple emails",
    "base_context": {
      "operationId": "readEmail",
      "format": "full"
    },
    "batch_context": [
      { "email_id": "msg_001" },
      { "email_id": "msg_002" },
      { "email_id": "msg_003" },
      { "email_id": "msg_004" },
      { "email_id": "msg_005" },
      { "email_id": "msg_006" },
      { "email_id": "msg_007" },
      { "email_id": "msg_008" },
      { "email_id": "msg_009" },
      { "email_id": "msg_010" }
    ]
  }
}

Execution Time: ~500ms (vs 5,000ms sequential)
Speed Improvement: 10x faster

Response:

{
  "batch_results": [
    {
      "success": true,
      "index": 0,
      "merged_context": {
        "operationId": "readEmail",
        "format": "full",
        "email_id": "msg_001"
      },
      "result": {
        "status_code": 200,
        "response_content": {
          "id": "msg_001",
          "subject": "Meeting Tomorrow",
          "from": "colleague@example.com"
        }
      }
    },
    // ... 9 more results
  ],
  "summary": {
    "total": 10,
    "successful": 10,
    "failed": 0
  }
}

3. run_multi_actions - Parallel Multi-Action Execution

Execute multiple different actions simultaneously across different applications. Perfect for workflows that need data from multiple sources.

Use Case: When you need to execute different actions from different apps in parallel (e.g., fetching data from multiple sources, updating multiple systems simultaneously).

How It Works:

  • Takes an array of action requests
  • Each action can target a different app
  • Executes all actions in parallel using Promise.all()
  • Returns results with success/failure status for each

Example: Multi-App Dashboard Update

{
  "tool": "run_multi_actions",
  "arguments": {
    "actions": [
      {
        "app_name": "Gmail",
        "user_input": "Get unread email count",
        "context": {
          "operationId": "getUnreadCount"
        }
      },
      {
        "app_name": "Jira",
        "user_input": "List my open issues",
        "context": {
          "operationId": "listIssues",
          "assignee": "me",
          "status": "open"
        }
      },
      {
        "app_name": "Slack",
        "user_input": "Get unread message count",
        "context": {
          "operationId": "getUnreadCount"
        }
      },
      {
        "app_name": "Salesforce",
        "user_input": "Get high-value opportunities",
        "context": {
          "operationId": "listOpportunities",
          "minAmount": 10000
        }
      },
      {
        "app_name": "GitHub",
        "user_input": "List open pull requests",
        "context": {
          "operationId": "listPullRequests",
          "state": "open"
        }
      }
    ]
  }
}

Execution Time: ~800ms (vs 2,600ms sequential)
Speed Improvement: 3.25x faster

Response:

{
  "results": [
    {
      "success": true,
      "index": 0,
      "request": {
        "app_name": "Gmail",
        "context": { "operationId": "getUnreadCount" }
      },
      "result": {
        "status_code": 200,
        "response_content": { "unread_count": 5 }
      }
    },
    {
      "success": true,
      "index": 1,
      "request": {
        "app_name": "Jira",
        "context": { "operationId": "listIssues" }
      },
      "result": {
        "status_code": 200,
        "response_content": {
          "issues": [
            {"id": "PROJ-123", "summary": "Fix bug"},
            {"id": "PROJ-124", "summary": "Add feature"}
          ]
        }
      }
    },
    // ... 3 more results
  ],
  "summary": {
    "total": 5,
    "successful": 5,
    "failed": 0
  }
}

Performance Comparison: Sequential vs Parallel

Scenario                          Sequential Time   Parallel Time   Improvement
Read 10 emails                    5,000ms           500ms           10x faster
Multi-app dashboard (5 apps)      2,600ms           800ms           3.25x faster
Update 50 records                 25,000ms          2,500ms         10x faster
Fetch 3 different data sources    1,600ms           600ms           2.67x faster

Why Parallel Execution is Faster

Parallel execution leverages the asynchronous nature of API calls:

  1. Network I/O Overlap: While one API call waits for network response, others can start
  2. Independent Operations: Most tool calls don't depend on each other
  3. Concurrent Processing: Server can handle multiple requests simultaneously
  4. Reduced Round-Trips: Fewer LLM reasoning cycles between operations

Real-World Use Cases

Use Case 1: Email Processing Pipeline

Scenario: Process 20 emails, extract key information, and update a database.

Sequential Approach:

1. Read email 1 → 500ms
2. Read email 2 → 500ms
...
20. Read email 20 → 500ms
21. Process all emails → 2,000ms
22. Update database → 800ms
Total: 12,800ms (12.8 seconds)

Parallel Approach:

1. Read all 20 emails in parallel → 500ms (run_action_batch)
2. Process all emails → 2,000ms
3. Update database → 800ms
Total: 3,300ms (3.3 seconds)

Improvement: 3.9x faster

Use Case 2: Multi-Source Data Aggregation

Scenario: Build a dashboard with data from 5 different applications.

Sequential Approach:

1. Fetch Gmail data → 400ms
2. Fetch Jira data → 600ms
3. Fetch Slack data → 300ms
4. Fetch Salesforce data → 800ms
5. Fetch GitHub data → 500ms
Total: 2,600ms (2.6 seconds)

Parallel Approach:

1. Fetch all data in parallel → 800ms (run_multi_actions)
Total: 800ms (0.8 seconds)

Improvement: 3.25x faster

Use Case 3: Bulk Record Updates

Scenario: Update 100 customer records in Salesforce.

Sequential Approach:

Update record 1 → 300ms
Update record 2 → 300ms
...
Update record 100 → 300ms
Total: 30,000ms (30 seconds)

Parallel Approach:

Update all 100 records in parallel → 3,000ms (run_action_batch)
Total: 3,000ms (3 seconds)

Improvement: 10x faster

Advanced Features

Error Handling in Parallel Execution

Both run_action_batch and run_multi_actions provide robust error handling:

Partial Success Handling:

{
  "batch_results": [
    { "success": true, "index": 0, "result": {...} },
    { "success": false, "index": 1, "error": "Record not found" },
    { "success": true, "index": 2, "result": {...} }
  ],
  "summary": {
    "total": 3,
    "successful": 2,
    "failed": 1
  }
}

Failed operations don't block successful ones—each execution is independent.
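Client code can recompute the summary counts directly from the per-item success flags. A small helper, assuming the result shape shown above:

```javascript
// Tally a batch_results array into the summary shape shown above
function summarize(batchResults) {
  const successful = batchResults.filter(r => r.success).length
  return {
    total: batchResults.length,
    successful,
    failed: batchResults.length - successful,
  }
}

const counts = summarize([
  { success: true, index: 0 },
  { success: false, index: 1, error: 'Record not found' },
  { success: true, index: 2 },
])
console.log(counts) // { total: 3, successful: 2, failed: 1 }
```

Agents can branch on `failed > 0` to retry only the failed indices rather than re-running the whole batch.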

Response Projection for Large Batches

Use response_projection to reduce response size when processing large batches:

{
  "tool": "run_action_batch",
  "arguments": {
    "app_name": "Salesforce",
    "user_input": "Get opportunity summaries",
    "base_context": {
      "operationId": "listOpportunities"
    },
    "batch_context": [
      { "accountId": "001" },
      { "accountId": "002" },
      { "accountId": "003" }
    ],
    "response_projection": "opportunities[*].{name: name, amount: amount, stage: stageName}"
  }
}

This extracts only needed fields, reducing token usage and improving performance.
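The projection string above is JMESPath syntax. For readers unfamiliar with it, the equivalent field extraction in plain JavaScript looks like this (the input shape is assumed from the Salesforce example):

```javascript
// Plain-JS equivalent of the JMESPath projection
// "opportunities[*].{name: name, amount: amount, stage: stageName}"
function projectOpportunities(response) {
  return response.opportunities.map(o => ({
    name: o.name,
    amount: o.amount,
    stage: o.stageName,
  }))
}

const fullResponse = {
  opportunities: [
    { name: 'Acme renewal', amount: 50000, stageName: 'Negotiation', ownerId: '005xx', description: 'long text...' },
  ],
}
console.log(projectOpportunities(fullResponse))
// [ { name: 'Acme renewal', amount: 50000, stage: 'Negotiation' } ]
```

Everything not named in the projection (owner, description, and so on) is dropped before the response reaches the LLM.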

Large Batch Handling

For batches exceeding 128KB response size, Apigene automatically handles truncation:

{
  "summary": {
    "total": 1000,
    "successful": 1000,
    "failed": 0,
    "truncated": true,
    "returned_count": 500,
    "omitted_count": 500,
    "returned_indices": [0, 1, 2, ..., 499],
    "omitted_indices": [500, 501, 502, ..., 999],
    "next_action": "Use this batch_context to fetch remaining items..."
  }
}

The system provides exact instructions for fetching remaining items in subsequent calls.
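On the client side, the follow-up call is just a re-slice of the original batch_context by the omitted indices. A sketch, assuming the summary shape above:

```javascript
// Build the follow-up batch from a truncation summary:
// keep only the batch items whose indices were omitted
function remainingBatch(batchContext, summary) {
  if (!summary.truncated) return []
  return summary.omitted_indices.map(i => batchContext[i])
}

const batch = [{ id: 'a' }, { id: 'b' }, { id: 'c' }, { id: 'd' }]
const truncation = { truncated: true, omitted_indices: [2, 3] }
console.log(remainingBatch(batch, truncation)) // [ { id: 'c' }, { id: 'd' } ]
```

Looping until `truncated` is false drains an arbitrarily large batch in a handful of calls.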

Best Practices for Parallel Execution

1. Use run_action_batch for Same-Action Scenarios

When executing the same action multiple times with different parameters:

{
  "tool": "run_action_batch",
  "arguments": {
    "app_name": "Gmail",
    "base_context": { "operationId": "readEmail" },
    "batch_context": [
      { "email_id": "123" },
      { "email_id": "456" }
    ]
  }
}

2. Use run_multi_actions for Different Actions

When executing different actions from different apps:

{
  "tool": "run_multi_actions",
  "arguments": {
    "actions": [
      {
        "app_name": "Gmail",
        "context": { "operationId": "getUnreadCount" }
      },
      {
        "app_name": "Jira",
        "context": { "operationId": "listIssues" }
      }
    ]
  }
}

3. Combine Parallel Execution with Dynamic Tool Loading

Use parallel execution together with dynamic tool loading for maximum efficiency:

  1. Discover tools with list_actions (summary mode)
  2. Execute multiple actions in parallel with run_multi_actions
  3. Process results efficiently

4. Handle Errors Gracefully

Always check the summary object for success/failure counts:

{
  "summary": {
    "total": 10,
    "successful": 8,
    "failed": 2
  }
}

5. Use Response Projection for Large Datasets

Reduce response size with JMESPath projections:

{
  "response_projection": "items[*].{id: id, name: name}"
}

Performance Metrics

Latency Reduction

Operation Count    Sequential Time   Parallel Time   Reduction
5 operations       2,500ms           600ms           76%
10 operations      5,000ms           800ms           84%
20 operations      10,000ms          1,200ms         88%
50 operations      25,000ms          2,500ms         90%

Cost Savings

Parallel execution reduces LLM reasoning cycles:

  • Sequential: 1 reasoning cycle per operation = 10 cycles for 10 operations
  • Parallel: 1 reasoning cycle for all operations = 1 cycle total
  • Savings: 90% reduction in LLM calls

Implementation Details

How Apigene Implements Parallel Execution

Apigene's parallel execution uses JavaScript's Promise.all() to execute multiple operations concurrently:

// run_action_batch implementation (simplified)
const batchPromises = batch_context.map(async (batchItem, index) => {
  const mergedContext = { ...base_context, ...batchItem }
  try {
    const result = await executeAction(app_name, mergedContext)
    return { success: true, index, merged_context: mergedContext, result }
  } catch (error) {
    return { success: false, index, error: error.message }
  }
})
 
// Every promise resolves with a success flag, so a single failure
// cannot reject the whole batch
const results = await Promise.all(batchPromises)

This ensures:

  • All operations start simultaneously
  • Independent error handling per operation
  • Results returned in input order
  • Maximum concurrency without blocking

Concurrency Limits

Apigene handles large batches efficiently:

  • No hard limits on batch size
  • Automatic response truncation for >128KB results
  • Efficient memory management
  • Graceful degradation for very large batches
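When a client wants to cap concurrency itself, for instance against a rate-limited API, a common pattern is to chunk the batch and await each chunk in turn. This is a generic sketch of that pattern, not Apigene's internal scheduler:

```javascript
// Split an array into chunks of at most `size` items
function chunk(items, size) {
  const chunks = []
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size))
  }
  return chunks
}

// Keep at most `size` operations in flight:
// parallel within a chunk, sequential across chunks
async function runChunked(items, size, executeOne) {
  const results = []
  for (const group of chunk(items, size)) {
    results.push(...await Promise.all(group.map(executeOne)))
  }
  return results
}
```

With a chunk size of 10, a 100-item batch costs roughly 10 round-trips instead of 100, while never exceeding 10 concurrent requests.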

Comparison: Sequential vs Parallel Execution

Aspect            Sequential Execution      Parallel Execution
Total Time        Sum of all operations     Max of all operations
Latency           High (compounds)          Low (overlaps)
User Experience   Slow, frustrating         Fast, responsive
LLM Cycles        One per operation         One for all operations
Cost              High (more cycles)        Low (fewer cycles)
Throughput        Low                       High
Error Handling    Stops on first error      Continues on errors
Scalability       Poor (linear growth)      Excellent (constant)

Getting Started with Parallel Execution

Step 1: Identify Parallelizable Operations

Look for operations that:

  • Don't depend on each other's results
  • Can execute independently
  • Target the same or different applications

Step 2: Choose the Right Tool

  • Same action, different params: Use run_action_batch
  • Different actions: Use run_multi_actions
  • Single action: Use run_action

Step 3: Structure Your Request

For Batch:

{
  "tool": "run_action_batch",
  "arguments": {
    "app_name": "YourApp",
    "base_context": { "operationId": "yourAction" },
    "batch_context": [
      { "param1": "value1" },
      { "param1": "value2" }
    ]
  }
}

For Multi-Actions:

{
  "tool": "run_multi_actions",
  "arguments": {
    "actions": [
      {
        "app_name": "App1",
        "context": { "operationId": "action1" }
      },
      {
        "app_name": "App2",
        "context": { "operationId": "action2" }
      }
    ]
  }
}

Step 4: Handle Results

Check the summary for success/failure counts and process results accordingly:

{
  "summary": {
    "total": 10,
    "successful": 10,
    "failed": 0
  }
}

Real-World Example: Complete Workflow

Let's build a complete workflow that demonstrates parallel execution:

User Request: "Get my unread emails, check my Jira issues, and send a summary to Slack"

Step 1: Fetch Data in Parallel

{
  "tool": "run_multi_actions",
  "arguments": {
    "actions": [
      {
        "app_name": "Gmail",
        "user_input": "Get unread emails",
        "context": { "operationId": "listUnreadEmails" }
      },
      {
        "app_name": "Jira",
        "user_input": "List my open issues",
        "context": { "operationId": "listIssues", "assignee": "me" }
      }
    ]
  }
}

Step 2: Process Results and Send Summary

{
  "tool": "run_action",
  "arguments": {
    "app_name": "Slack",
    "user_input": "Send summary",
    "context": {
      "operationId": "chat.postMessage",
      "channel": "#updates",
      "text": "You have 5 unread emails and 3 open Jira issues"
    }
  }
}

Total Time: ~1,100ms (vs 2,400ms sequential)
Improvement: 2.2x faster

Conclusion

Sequential tool execution is a major bottleneck preventing AI agents from reaching their full potential. Apigene's parallel execution capabilities—run_action_batch and run_multi_actions—solve this challenge by enabling agents to execute multiple operations simultaneously.

The benefits are clear:

  • 10x faster execution for batch operations
  • 3-4x faster for multi-app workflows
  • 90% reduction in LLM reasoning cycles
  • Better user experience with responsive agents
  • Lower costs through reduced token usage

By combining parallel execution with dynamic tool loading, Apigene enables agents to scale efficiently—whether processing 10 operations or 10,000, the performance gains remain consistent.

Ready to accelerate your AI agents? Get started with Apigene's MCP Gateway and experience the power of parallel tool execution.



#parallel-execution #batch-processing #concurrent-tools #mcp-gateway #ai-agents #tool-execution #performance-optimization #latency-reduction #agent-efficiency #mcp #model-context-protocol