MCP Gateway on AWS, Azure & GCP: Cloud Deployment Guide (2026)

Every major cloud provider now has an MCP gateway story. AWS launched AgentCore Gateway in late 2025. Microsoft shipped its MCP Gateway for Azure Kubernetes. GCP teams are wiring up Cloud Run with IAP for MCP traffic. And developers are hitting real deployment friction on every single one.
Search "aws mcp gateway" and you'll find AgentCore docs. Search "azure mcp gateway" and you'll land on Microsoft's GitHub repo. What you won't find is a single guide that compares all these options side by side, covers the gotchas developers actually run into, or explains when a cloud-agnostic gateway beats locking into one provider.
That's what this guide does.
For busy engineering leads deploying MCP gateways to the cloud, here's what we found across 40+ developer threads and production reports:
- AWS AgentCore Gateway has cold start problems. Multiple teams report "huge delay" during initialize and list_tools calls. One developer measured 8-12 second latency spikes on first invocation after idle periods.
- GCP's Identity-Aware Proxy (IAP) breaks standard MCP OAuth flows. IAP injects its own auth layer before traffic reaches your MCP server, and MCP's Dynamic Client Registration doesn't account for that. Teams end up building custom OAuth middleware on Cloud Run.
- Microsoft's MCP Gateway handles session routing natively but locks you into AKS. It's the most mature Kubernetes option, but it's tightly coupled to Azure Kubernetes Service.
- A cloud-agnostic MCP gateway eliminates provider lock-in entirely. Deploy once, connect to servers on any cloud, and stop rebuilding gateway infrastructure every time your stack changes.
What Is a Cloud MCP Gateway?
A cloud MCP gateway sits between your AI agent clients (ChatGPT, Claude, custom agents) and the MCP servers that give those agents access to tools and data. It handles three jobs that get complicated fast when you move from localhost to production: authentication, transport bridging, and session routing.
On localhost, your MCP server connects directly to Claude Desktop over stdio. In the cloud, you need OAuth or IAM auth, HTTP or WebSocket transports, load balancing that respects stateful MCP sessions, and observability to debug failures. That's the gateway's job.
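The handshake the gateway has to proxy is plain JSON-RPC. As a rough sketch of what travels over the wire before any tool call (field values are illustrative; the protocol version and exact capability shapes come from the MCP spec revision you target):

```python
import json

def build_initialize_request(request_id: int = 1) -> str:
    # First message an MCP client sends; no tool calls are allowed until
    # the server responds and the client acknowledges initialization.
    request = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-06-18",  # illustrative spec revision
            "capabilities": {},
            "clientInfo": {"name": "example-agent", "version": "0.1.0"},
        },
    }
    return json.dumps(request)

def build_tools_list_request(request_id: int = 2) -> str:
    # Sent after initialization to discover the server's tools.
    return json.dumps({"jsonrpc": "2.0", "id": request_id, "method": "tools/list"})
```

Over stdio these frames flow over stdin/stdout; in the cloud each one typically becomes the body of an HTTP POST. That difference is exactly why a gateway has to bridge transports rather than just forward bytes.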
The question isn't whether you need one. It's which one fits your stack.
AWS MCP Gateway: AgentCore Gateway
AWS AgentCore Gateway is Amazon's managed MCP gateway, launched as part of the broader AgentCore platform for building AI agents on AWS. It integrates with Lambda, Bedrock, and IAM for auth.
Stop Building MCP Integrations From Scratch.
- Any API, one line of code — connect to ChatGPT, Claude, and Cursor without writing custom MCP servers
- Visual UI in the chat — render interactive components, not just text dumps. Charts, forms, dashboards.
- 70% fewer tokens — dynamic tool loading and output compression so your agents stay fast and cheap
How It Works
AgentCore Gateway acts as a managed proxy between AI agent clients and MCP servers running on AWS. You register MCP servers (typically Lambda functions or ECS containers), configure IAM-based access policies, and point agents at the AgentCore endpoint.
The gateway handles:
- Tool registration and discovery from Lambda-backed MCP servers
- IAM-based authentication tied to AWS's existing identity infrastructure
- Auto-scaling through Lambda's concurrency model
- CloudWatch integration for logging and metrics
What Developers Actually Experience
The AgentCore documentation looks clean. The ground truth is rougher.
Transport mismatches are the #1 pain point. Teams building with FastMCP (the most popular Python MCP framework) hit compatibility walls on AgentCore. FastMCP defaults to SSE transport, but AgentCore expects a specific Lambda invocation pattern. One Reddit developer spent two days debugging why a working local MCP server returned empty tool lists after deploying to AgentCore: "transport type mismatch" was the culprit.
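The usual fix is to stop depending on a long-lived SSE stream and handle each JSON-RPC message as a discrete request (in FastMCP, that means explicitly setting the streamable HTTP transport rather than the SSE default). What a Lambda-compatible dispatch layer has to do can be sketched roughly like this; the event shape and the tool registry here are assumptions for illustration, not AgentCore's actual contract:

```python
import json

# Hypothetical tool registry; a real deployment would generate this
# from the MCP server's registered tools.
TOOLS = [{
    "name": "get_weather",
    "description": "Look up a forecast",
    "inputSchema": {"type": "object", "properties": {"city": {"type": "string"}}},
}]

def lambda_handler(event, context=None):
    """Dispatch one MCP JSON-RPC message per invocation (sketch)."""
    message = json.loads(event["body"])  # assumed event shape
    method = message.get("method")
    if method == "initialize":
        result = {
            "protocolVersion": "2025-06-18",
            "capabilities": {"tools": {}},
            "serverInfo": {"name": "weather-server", "version": "0.1.0"},
        }
    elif method == "tools/list":
        result = {"tools": TOOLS}
    else:
        # JSON-RPC "method not found" error for anything unhandled.
        return {"statusCode": 200, "body": json.dumps(
            {"jsonrpc": "2.0", "id": message.get("id"),
             "error": {"code": -32601, "message": f"Unknown method: {method}"}})}
    return {"statusCode": 200, "body": json.dumps(
        {"jsonrpc": "2.0", "id": message.get("id"), "result": result})}
```

The point of the sketch: a request/response handler like this never returns an empty tool list just because there is no persistent stream to attach to, which is the failure mode the SSE default produces behind Lambda.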
Cold starts create latency spikes. Lambda-backed MCP servers inherit Lambda's cold start problem, and it's worse for MCP because the protocol requires an initialize handshake before any tool calls. Multiple developers report "huge delay" during initialize/list_tools sequences. One team measured 8-12 seconds for the initial connection after an idle period, which kills interactive agent workflows.
IAM complexity adds friction. Configuring the right IAM roles, resource policies, and cross-account access for MCP servers adds real setup overhead compared to API keys or OAuth.
Best For
Teams deeply invested in AWS who already run Bedrock agents and Lambda functions. If your entire AI stack lives on AWS and you're comfortable with IAM, AgentCore keeps everything in one place.
Cost at Scale
AgentCore follows AWS's consumption model: Lambda invocations, data transfer, and gateway request processing. No flat monthly fee. A team running 100,000 tool calls per day reported $450-600/month for the gateway layer alone, excluding Lambda execution costs.
Microsoft MCP Gateway on Azure Kubernetes
Microsoft's MCP Gateway targets Kubernetes environments on Azure Kubernetes Service (AKS). It's open source on GitHub and provides session-aware routing, lifecycle management, and scalable MCP server orchestration.
How It Works
Microsoft's gateway deploys as a Kubernetes operator with Custom Resource Definitions inside your AKS cluster. You define MCP servers as Kubernetes resources, and the gateway handles:
- Session-aware routing that maintains stateful MCP connections across pod replicas
- Lifecycle management for MCP server deployments (scaling, health checks, rolling updates)
- Azure AD integration for enterprise authentication
- Horizontal pod autoscaling based on active MCP session count
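In practice, registering a server looks like declaring any other Kubernetes resource. The manifest below is a hypothetical sketch to show the shape of the workflow; the apiVersion, kind, and field names are illustrative, not the project's actual CRD schema (check the GitHub repo for the real one):

```yaml
apiVersion: mcp.example.io/v1alpha1   # illustrative group/version
kind: McpServer                       # hypothetical kind
metadata:
  name: weather-server
spec:
  replicas: 2
  image: ghcr.io/example/weather-mcp:latest
  transport: streamable-http
  auth:
    azureAd:
      tenantId: "<tenant-id>"         # placeholder
      audience: "api://weather-mcp"
  sessionAffinity: true               # keep each client pinned to one pod
```

The operator pattern is what makes the lifecycle features possible: scaling, health checks, and rolling updates all run off the declared spec rather than hand-managed deployments.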
What Developers Actually Experience
Session routing is the standout feature. MCP connections are stateful. On vanilla Kubernetes, pods scaling up and down causes dropped connections and state loss. Microsoft's gateway solves this with sticky session routing that correctly maintains client-to-server affinity. Teams that tried running MCP on standard ingress controllers consistently hit this wall.
Kubernetes is a hard requirement. There's no standalone binary or Docker Compose option. A developer asked about running it on a single VM and was told it "requires a Kubernetes cluster." That's a blocker for smaller teams or those on serverless architectures.
Azure AD setup is involved. Teams using Azure AD get seamless SSO for MCP connections, but configuring app registrations, API permissions, and token audience validation took one team "three days of back-and-forth with Microsoft support."
Best For
Enterprise teams running AKS who need production-grade MCP orchestration with strong identity integration. If you're already on AKS with Azure AD, this is the most mature Kubernetes-native option.
Cost at Scale
The gateway is open source and free. You pay for AKS cluster compute. For teams with existing cluster capacity, the incremental cost is minimal. Spinning up a new cluster for MCP starts at $150-200/month for a small production setup.
GCP MCP Gateway: Cloud Run + IAP Challenges
Google Cloud doesn't have a managed MCP gateway product as of March 2026. Teams that need one are assembling their own from Cloud Run, Identity-Aware Proxy, Cloud Endpoints, and Pub/Sub.
Typical Architecture
Most GCP teams deploy MCP servers as Cloud Run services with a combination of load balancer, IAP, and custom auth middleware:
AI Agent Client -> Cloud Load Balancer -> IAP -> Cloud Run (Gateway) -> Cloud Run (MCP Servers)
What Developers Actually Experience
IAP breaks standard MCP OAuth flows. This is the biggest pain point, and it surfaces repeatedly in developer threads. GCP's Identity-Aware Proxy intercepts all incoming requests and requires its own authentication before traffic reaches your service. MCP's OAuth 2.1 with Dynamic Client Registration doesn't account for a pre-auth layer like IAP.
One developer described the fix: "IAP doesn't fit standard MCP OAuth flows. You need to put an OAuth2/OIDC-aware gateway in front of Cloud Run to handle the MCP auth handshake before IAP sees the request." That means building custom middleware to bridge between MCP's expected auth and GCP's infrastructure auth.
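At minimum, that middleware has to answer unauthenticated MCP requests with the OAuth challenge the client expects, instead of letting IAP redirect to an interactive Google login. A minimal sketch of that decision logic, assuming token validation happens elsewhere (the metadata URL and `validate_token` are placeholders; the `WWW-Authenticate` resource-metadata pattern follows MCP's OAuth guidance, which builds on RFC 9728):

```python
def check_mcp_auth(headers: dict, validate_token) -> tuple:
    """Return (status, response_headers) for an incoming MCP request.

    Sketch of the gateway-side check that must run *before* IAP-style
    infrastructure auth, so MCP clients get a standard OAuth challenge.
    """
    auth = headers.get("Authorization", "")
    if not auth.startswith("Bearer "):
        # Point the client at OAuth resource metadata instead of letting
        # IAP bounce it to an interactive login page.
        challenge = ('Bearer resource_metadata='
                     '"https://mcp.example.com/.well-known/oauth-protected-resource"')
        return 401, {"WWW-Authenticate": challenge}
    token = auth[len("Bearer "):]
    if not validate_token(token):  # placeholder for a real JWT/introspection check
        return 401, {"WWW-Authenticate": 'Bearer error="invalid_token"'}
    return 200, {}
```

Everything that passes this check can then be forwarded to the Cloud Run service; IAP either sits behind it with a service-account identity or is dropped in favor of application-layer auth.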
Cloud Run works well for the servers themselves. Once auth is solved, Cloud Run handles scaling, offers per-request billing, and supports both HTTP and WebSocket transports. Cold starts run 1-3 seconds for a Python MCP server with dependencies, better than Lambda's 8-12 second penalty.
No managed session routing. Cloud Run doesn't natively handle MCP session affinity. You'll need to configure session-based routing at the load balancer level or use Cloud Run's session affinity feature (still in preview as of March 2026).
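If you build it yourself, the core requirement is that every request carrying the same MCP session ID lands on the same backend. A sketch using hash-based pinning (the `Mcp-Session-Id` header is what MCP's streamable HTTP transport uses; the backend names are illustrative):

```python
import hashlib

BACKENDS = ["mcp-server-0", "mcp-server-1", "mcp-server-2"]  # illustrative pool

def route(headers: dict, backends=BACKENDS) -> str:
    """Pin a session to one backend so stateful MCP connections survive."""
    session_id = headers.get("Mcp-Session-Id")
    if session_id is None:
        # No session yet (e.g. the initialize request): any backend works.
        return backends[0]
    digest = hashlib.sha256(session_id.encode()).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]
```

The catch: this mapping only holds while the backend count is stable. Scaling the pool reshuffles sessions unless you add consistent hashing or connection draining, which is exactly the work a managed sticky-session gateway does for you.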
Best For
Teams already on GCP with strong platform engineering capabilities. GCP gives you all the building blocks, but you're assembling the gateway yourself.
Cost at Scale
Cloud Run's per-request billing is the cheapest raw compute. A team running 10,000-50,000 tool calls per day reported $80-150/month for gateway and server layers combined. But factor in 20-40 engineering hours for custom assembly. At $150/hour loaded cost, that's $3,000-6,000 in setup and maintenance that erases the compute savings.
IBM and Cloudflare: Emerging Options
Two more providers worth evaluating.
IBM MCP Gateway
IBM ties its MCP gateway into the watsonx AI platform. It provides MCP server management through IBM Cloud's API Connect infrastructure. The standout: built-in audit logging and compliance controls designed for regulated industries (financial services, healthcare). It integrates with IBM Cloud IAM and works best for enterprises already running watsonx agents. Adoption outside the IBM ecosystem is limited so far, but its governance features are ahead of other options.
Cloudflare MCP Gateway
Cloudflare's MCP gateway runs on Cloudflare's edge network, routing MCP requests to the nearest PoP before they hit your servers. It integrates with Cloudflare Workers for serverless MCP hosting and Cloudflare Access for zero-trust auth. The edge-first approach delivers sub-second cold starts, which is compelling for globally distributed agent deployments where latency matters. Still early, but the architecture is sound.
Apigene: The Cloud-Agnostic Alternative
Here's the problem with cloud-specific gateways: you're coupling AI agent infrastructure to a single provider. Today your agents run on AWS. Next quarter, a client requires Azure. Six months later, you need GCP for a Vertex AI integration.
Cloud-specific gateways don't travel. An AgentCore deployment can't run on Azure. Microsoft's Kubernetes gateway technically works on GKE, but you lose Azure AD integration and the session routing optimizations. GCP's DIY assembly is, by definition, tied to GCP services.
Apigene is a cloud-agnostic MCP gateway that runs on any provider (or on-prem) and connects to MCP servers regardless of where they're hosted. It decouples the gateway layer from the cloud platform layer entirely.
What sets Apigene apart:
- Any cloud, one endpoint. Agents connect to a single gateway URL. Behind it, MCP servers can run on Lambda, AKS, Cloud Run, or bare metal. Apigene handles transport bridging across all of them.
- Auth translation without cloud-specific IAM. Instead of configuring IAM roles, Azure AD app registrations, or IAP middleware, Apigene provides its own auth layer that bridges between your AI clients' OAuth flows and your internal service credentials.
- Rich UI rendering inside ChatGPT and Claude. No cloud-native gateway offers this. Apigene renders interactive UI components (charts, forms, filterable tables) directly inside chat through the MCP Apps standard. Other gateways return raw JSON.
- No-code setup. Connecting a new API or MCP server doesn't require Lambda wrappers, Kubernetes CRDs, or Cloud Run configs. You configure the connection through Apigene's interface and it handles protocol translation.
- Dynamic tool loading and output compression. Only relevant tools surface per session (reducing context bloat), and tool responses compress to cut token costs by up to 70%.
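Dynamic tool loading is simple to reason about even if production versions are smarter: surface only the tools whose descriptions are relevant to the current request instead of sending the whole catalog into context. A toy sketch of that filtering (the scoring and the tool data are illustrative, not Apigene's actual algorithm):

```python
def select_tools(query: str, tools: list, top_k: int = 3) -> list:
    """Rank tools by word overlap with the query; return relevant names."""
    query_words = set(query.lower().split())

    def score(tool) -> int:
        desc_words = set(tool["description"].lower().split())
        return len(query_words & desc_words)

    ranked = sorted(tools, key=score, reverse=True)
    # Keep only tools with some overlap, capped at top_k.
    return [t["name"] for t in ranked[:top_k] if score(t) > 0]
```

Even this crude overlap filter shows why the technique saves tokens: a 50-tool catalog collapses to a handful of schemas per turn, and real implementations replace word overlap with embedding similarity.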
Cloud MCP Gateway Comparison Table
| Factor | AWS AgentCore | Azure MCP Gateway | GCP (DIY) | IBM MCP Gateway | Cloudflare MCP Gateway | Apigene |
|---|---|---|---|---|---|---|
| Setup complexity | Medium (IAM) | High (K8s required) | High (custom assembly) | Medium (watsonx) | Low (Workers) | Low (no-code) |
| Auth model | IAM + Cognito | Azure AD + RBAC | IAP + custom OAuth | IBM Cloud IAM | Cloudflare Access | Built-in OAuth translation |
| Cold start latency | 8-12s (Lambda) | None (persistent pods) | 1-3s (Cloud Run) | Varies | Sub-second (edge) | Sub-second |
| Session routing | Basic | Native sticky sessions | Manual config | Basic | Stateless | Managed |
| Multi-cloud support | No | No (AKS-optimized) | No | No | Edge-agnostic | Yes |
| UI rendering in chat | No | No | No | No | No | Yes (MCP Apps) |
| Pricing model | Per-invocation | Cluster compute | Per-request | Platform license | Per-request | Subscription |
Cost Comparison for 50,000 Daily Tool Calls
| Provider | Estimated Monthly Cost | Notes |
|---|---|---|
| AWS AgentCore | $450-600 | Lambda + gateway processing + data transfer |
| Azure MCP Gateway | $150-300 | AKS compute (assumes existing cluster capacity) |
| GCP (Cloud Run) | $80-150 | Per-request billing, cheapest raw compute |
| Apigene | Subscription-based | Predictable pricing, no per-invocation surprises |
What the Community Is Saying: Cloud MCP Deployment Pain Points
Developer threads across Reddit and GitHub reveal consistent patterns in cloud MCP gateway deployments.
Transport mismatches top the frustration list. FastMCP's default SSE transport doesn't match AgentCore's Lambda pattern. Developers describe spending days debugging "transport type mismatch" errors that produce zero useful logs on the AWS side. The fix involves explicitly setting streamable-http transport and wrapping the server in a Lambda-compatible handler, but the documentation doesn't make this obvious.
Cold starts on cloud-hosted MCP servers affect all providers to different degrees. Lambda is the worst at 8-12 seconds. Cloud Run hits 1-3 seconds. Kubernetes avoids it entirely with persistent pods but costs more. Teams that moved latency-sensitive tools to always-warm instances and kept batch tools on serverless saw the best balance of cost and performance.
Auth is the universal headache. GCP teams wrestle with IAP incompatibility. AWS teams struggle with IAM verbosity. Azure teams spend days on AD app registrations. One common recommendation across threads: "put an OAuth2/OIDC-aware gateway in front of your MCP servers" rather than trying to make each cloud's native auth work with MCP's spec.
Session statefulness catches teams off guard. MCP connections maintain state across tool calls. Standard load balancers route requests round-robin, which breaks MCP sessions. Teams that didn't configure session affinity spent hours debugging intermittent "session not found" errors that only appeared under load.
Nobody's talking about output compression. Most community discussions focus on getting MCP working in the cloud at all. Almost none address the cost of uncompressed tool responses consuming 5,000-15,000 tokens per call. A gateway with built-in response compression can cut LLM costs by 60-80% on tool-heavy workflows.
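The mechanics are mundane: strip fields the model doesn't need before the response ever reaches the context window. A minimal sketch (the field-dropping rules are illustrative; production gateways also do schema-aware truncation and summarization):

```python
import json

def compress_tool_output(payload, max_str_len: int = 200):
    """Recursively drop null/empty fields and truncate long strings."""
    if isinstance(payload, dict):
        return {k: compress_tool_output(v, max_str_len)
                for k, v in payload.items()
                if v not in (None, "", [], {})}
    if isinstance(payload, list):
        return [compress_tool_output(v, max_str_len) for v in payload]
    if isinstance(payload, str) and len(payload) > max_str_len:
        return payload[:max_str_len] + "...[truncated]"
    return payload

# Example: a typical API response padded with nulls and a huge HTML blob.
raw = {"id": 1, "html_body": "x" * 5000, "tags": [], "owner": None, "name": "deal"}
compact = compress_tool_output(raw)
```

Serialized size drops sharply once padding fields and long blobs are gone, and that reduction maps directly to fewer tokens billed per tool call.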
"We see teams burn weeks migrating MCP gateway infrastructure when they change cloud providers or go multi-cloud. The gateway is a control plane. It shouldn't care where your MCP servers run. Deploy it once, connect it to servers on AWS, Azure, GCP, or on-prem, and stop rebuilding infrastructure every time your stack changes."
The Bottom Line
Every major cloud now supports MCP gateways, but each comes with tradeoffs. AWS AgentCore is the most integrated for Lambda-heavy teams but has the worst cold starts. Microsoft's MCP Gateway on Azure handles session routing better than anything else but requires Kubernetes. GCP's Cloud Run is the cheapest but demands the most custom engineering.
If you're locked into one cloud and plan to stay there, the native option works. If you need multi-cloud flexibility, want to avoid 20-40 hours of custom infrastructure work, or need features like UI rendering and output compression that no cloud-native gateway offers, Apigene is the cloud-agnostic path that lets you deploy once and connect everywhere.
Frequently Asked Questions
Which cloud MCP gateway should I choose?
It depends on your existing stack. AWS AgentCore is the most integrated for teams already on Lambda and Bedrock, but it comes with 8-12 second cold starts. Azure's MCP Gateway is the most mature for Kubernetes workloads. GCP offers the cheapest compute but requires custom assembly. For multi-cloud support or to avoid lock-in, a cloud-agnostic gateway like Apigene deploys to any provider and connects to MCP servers regardless of where they run.
Why does my FastMCP server return empty tool lists on AWS AgentCore?
FastMCP defaults to SSE transport, which doesn't match AgentCore's Lambda invocation pattern. Configure your FastMCP server to use streamable-http transport and wrap it in a Lambda-compatible handler. You can also deploy on ECS instead of Lambda to keep SSE compatibility. Check the AgentCore docs for the latest transport requirements, as AWS has been updating these since launch.
Why does GCP's Identity-Aware Proxy break MCP OAuth?
IAP intercepts all incoming requests and requires Google-issued authentication before traffic reaches your service. MCP's OAuth 2.1 with Dynamic Client Registration expects to handle its own auth handshake directly. When IAP sits in front, the MCP client's OAuth discovery and token exchange get intercepted before reaching your server. The workaround is deploying an OAuth-aware proxy (like oauth2-proxy) as a sidecar on Cloud Run, or skipping IAP entirely and handling auth at the application layer.
How much does it cost to run an MCP gateway in the cloud?
For 50,000 daily tool calls, expect roughly $450-600/month on AWS (Lambda + AgentCore processing), $150-300/month on Azure (AKS compute with existing capacity), and $80-150/month on GCP (Cloud Run per-request billing). These are compute costs only. Factor in engineering time: AWS is medium complexity, Azure requires Kubernetes expertise, and GCP's DIY approach demands 20-40 hours of custom work. Apigene offers predictable subscription pricing regardless of traffic.
Can one MCP gateway connect to servers on multiple clouds?
Only with a cloud-agnostic gateway. Cloud-native options like AgentCore and Azure's MCP Gateway connect to servers within their own ecosystem. A cloud-agnostic gateway like Apigene connects to MCP servers on AWS, Azure, GCP, or on-prem through a single endpoint. This is the recommended pattern for teams with multi-cloud infrastructure or servers spread across providers.
Which cloud has the worst MCP cold start latency?
AWS AgentCore (Lambda-backed) is the worst at 8-12 seconds for the initial MCP initialize handshake after idle periods. GCP Cloud Run runs 1-3 seconds with standard container images. Azure's Kubernetes gateway has zero cold starts because pods stay persistent. Cloudflare's edge approach delivers sub-second cold starts. To cut Lambda cold starts, you can use provisioned concurrency at $30-80/month per warm function depending on memory config.