MCP Streamable HTTP Transport: From SSE Migration to Production

If you've deployed an MCP server on Cloud Run or Lambda, you've probably watched SSE connections die mid-conversation. One engineering lead put it bluntly: "SSE kept dropping connections on Cloud Run. Serverless auto-scaling kills them randomly." Teams have been chaining two or three proxies just to keep MCP tools stable. That's not a production architecture, it's a workaround.
MCP Streamable HTTP fixes this. It's the transport layer that turns MCP from a local development toy into something you can actually deploy, scale, and debug in production. And if you're building AI agents that call remote tools, it's the transport you should be using right now.
For busy engineering leads building AI agents, here's what 31 developer discussions taught us:
- SSE is officially deprecated for MCP. Streamable HTTP replaced it in the March 2025 spec revision, and every major client is migrating.
- Serverless + SSE never worked reliably. Developers report dropped connections, session desync, and cold start failures across Cloud Run, Lambda, and Azure Functions.
- Transport choice is a production deployment decision, not a development-time preference. The wrong transport breaks auth, routing, and multi-user support.
- Gateway adoption is accelerating because teams are tired of managing auth, routing, and observability across fragmented MCP servers.
What Is MCP Streamable HTTP?
MCP Streamable HTTP is the modern transport protocol for the Model Context Protocol that uses standard HTTP POST and GET requests to connect AI agent clients with remote tool servers. It replaces the deprecated SSE transport, supports both stateless and stateful sessions, and works natively with serverless infrastructure, load balancers, and API gateways.
That definition covers the "what." In practice, Streamable HTTP comes down to replacing persistent connections with standard HTTP requests. Here's how it works.
The MCP protocol defines how AI agents discover and call external tools. But the protocol itself doesn't specify how messages travel between the MCP server and client. That's the transport layer's job. Before March 2025, MCP offered two transports: stdio (for local processes) and SSE (for remote servers). Streamable HTTP replaced SSE as the recommended remote transport.
How It Differs from a Regular REST API
A standard REST API handles one request and one response. MCP Streamable HTTP does more. The client sends JSON-RPC messages via POST to a single endpoint (typically /mcp), and the server can respond with either a single JSON response or open an SSE stream for multiple messages. This flexibility means simple tool calls get fast single-response handling, while long-running operations like progress updates or streaming results use SSE streaming on the same connection.
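To make that concrete, here's a sketch of the JSON-RPC 2.0 message a client might POST to the /mcp endpoint for a tool call. The tool name and arguments are illustrative, not from any particular server:

```python
import json

# Illustrative JSON-RPC 2.0 tool-call message, as a client would POST it
# to the server's single MCP endpoint. Tool name and arguments are made up.
tool_call = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {
        "name": "search_docs",
        "arguments": {"query": "streamable http"},
    },
}

# This string becomes the POST body; the server replies with either a
# single JSON object or an SSE stream, depending on the operation.
body = json.dumps(tool_call)
```

Every interaction, from initialization to tool calls, follows this same shape: one POST, one JSON-RPC message.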
The Streamable HTTP transport also supports an optional Mcp-Session-Id header. Servers can assign a session ID during initialization, and clients include it in subsequent requests. This makes the transport work in both stateless mode (each request is independent, perfect for serverless) and stateful mode (sessions persist across requests, useful for caching or multi-step workflows).
So is streamable HTTP stateful? Both. The spec supports stateless and stateful operation, and the server decides which mode to use. That's one of the key design wins over SSE, which always required a persistent connection.
MCP SSE vs Streamable HTTP: What Changed and Why
SSE worked fine for local development. You'd start an MCP server, connect a single client, and tools would respond over a persistent event stream. But SSE had three structural problems that made it unworkable for production.
First, SSE requires persistent connections. Every client keeps a long-lived HTTP connection open to receive events. Serverless platforms like Cloud Run, Lambda, and Azure Functions kill idle connections aggressively. One developer in r/mcp summed it up: "cold starts wrecked websocket connections, scaling was spotty too. Stuck with VMs right now."
Second, SSE uses two separate channels. The client sends requests via POST, but listens for responses on a separate GET endpoint. That means two connections per client, two things that can break independently, and two things your load balancer needs to route correctly.
Third, SSE can't do multi-user routing cleanly. With a single persistent connection per client, routing requests from different users to different backends requires workarounds. After migrating to Streamable HTTP, one team reported that "multi-user routing via URL params finally works cleanly."
Stop Building MCP Integrations From Scratch.
- Any API, one line of code — connect to ChatGPT, Claude, and Cursor without writing custom MCP servers
- Visual UI in the chat — render interactive components, not just text dumps. Charts, forms, dashboards.
- 70% fewer tokens — dynamic tool loading and output compression so your agents stay fast and cheap
Transport Comparison Table
| Feature | stdio | SSE (deprecated) | Streamable HTTP |
|---|---|---|---|
| Connection type | Process pipes | Persistent HTTP | Request/response |
| Remote support | No (local only) | Yes | Yes |
| Serverless compatible | No | No (drops connections) | Yes |
| Multi-user routing | No | Difficult | Native (URL params) |
| Session management | Implicit | Required | Optional |
| Load balancer friendly | N/A | No (sticky sessions) | Yes |
| Bidirectional comms | Yes (via pipes) | Limited (two channels) | Yes (POST + SSE) |
The differences between Streamable HTTP and SSE aren't subtle. Add stdio to the comparison and each transport serves a distinct deployment model: stdio for local processes, SSE for the brief era of remote connections, and Streamable HTTP for production.
The MCP spec officially deprecated SSE in the March 2025 revision. If you're still running SSE, plan the migration now: the path is straightforward, and most SDKs (FastMCP, the TypeScript SDK, Spring AI) already support Streamable HTTP natively.
We analyzed over 30 discussions where MCP developers share migration experiences, and the pattern is consistent. Teams that switched from SSE to Streamable HTTP report fewer dropped connections, simpler deployment configs, and the ability to run on serverless without dedicated VMs. One developer captured the before-and-after: "Been running MCP servers on a VM because SSE kept dropping connections on Cloud Run. Just migrated to Streamable HTTP... Each call is independent." The frustration with SSE wasn't subtle. As one engineer in a transport discussion thread put it, the old model "treats AI agents like static web pages from 2001."
How Streamable HTTP Works Under the Hood
Understanding the mechanics helps when you're debugging connection issues or designing your server architecture. Here's what happens during a typical MCP Streamable HTTP session.
Sending Messages to the Server
The client sends all messages (tool calls, initialization, pings) as HTTP POST requests to the server's MCP endpoint. Each POST body contains a JSON-RPC 2.0 message. The server can respond in one of two ways:
- Single response: Returns a JSON object with Content-Type: application/json. Used for simple tool calls that complete immediately.
- Streaming response: Returns Content-Type: text/event-stream and opens an SSE stream. Used when the server needs to send multiple messages (progress updates, partial results, or server-initiated notifications).
This dual-mode response is what makes the transport "streamable." It isn't always streaming; it streams only when needed.
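A server-side sketch of that decision, under the assumption that streaming handlers produce an iterator of messages. The dispatch logic is illustrative, not any SDK's actual API:

```python
import json
from collections.abc import Iterator

def encode_response(result) -> tuple[str, str]:
    """Pick the response mode: a single JSON body for immediate results,
    or an SSE-formatted stream when the handler yields multiple messages."""
    if isinstance(result, Iterator):
        # Streaming mode: each message becomes one SSE "data:" event.
        events = "".join(f"data: {json.dumps(msg)}\n\n" for msg in result)
        return "text/event-stream", events
    # Single-response mode: one JSON-RPC result object.
    return "application/json", json.dumps(result)

# A simple tool call completes immediately...
ctype, body = encode_response({"jsonrpc": "2.0", "id": 1, "result": "done"})
# ...while a long-running one streams progress updates on the same connection.
stream_ctype, stream_body = encode_response(iter([{"progress": 0.5}, {"progress": 1.0}]))
```

The key design point: the Content-Type of the response, not the endpoint, tells the client which mode it got.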
Listening for Server Responses
Clients can also open a GET request to the MCP endpoint to listen for server-initiated messages like notifications or requests. This replaces the dedicated SSE endpoint that the old transport required. The GET connection is optional, so lightweight clients that only make tool calls don't need it.
Because Streamable HTTP enables true remote connections, the server and client can be on completely different networks. There's no requirement for process pipes or persistent connections.
Session Management and State
When a Streamable HTTP MCP server initializes, it can optionally include an Mcp-Session-Id header in its response. If it does, the client must include that header in all subsequent requests. This enables stateful features like:
- Tool caching: The server remembers which tools it already advertised
- Context persistence: Multi-step workflows can maintain state
- Connection resumption: Clients can reconnect without re-initializing
If the server omits the session ID, every request is treated independently. This stateless mode is ideal for serverless deployments where instances may scale to zero between requests.
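A minimal sketch of both modes, assuming an in-memory session table (a real multi-instance server would need a shared store; the state shape here is invented):

```python
import uuid

sessions: dict[str, dict] = {}  # Mcp-Session-Id -> per-session state

def handle_initialize(stateful: bool) -> dict[str, str]:
    """Return response headers for an initialize request. In stateful
    mode the server mints an Mcp-Session-Id; in stateless mode it doesn't."""
    if not stateful:
        return {}
    session_id = uuid.uuid4().hex
    sessions[session_id] = {"advertised_tools": set()}
    return {"Mcp-Session-Id": session_id}

def handle_request(headers: dict[str, str]) -> dict:
    """Look up session state if (and only if) the client sent a session ID."""
    session_id = headers.get("Mcp-Session-Id")
    if session_id is None:
        return {}  # stateless: every request is independent
    return sessions[session_id]

# Stateful flow: initialize once, then reuse the assigned session ID.
init_headers = handle_initialize(stateful=True)
state = handle_request(init_headers)
```

Notice that the server makes the choice: a client that never receives an Mcp-Session-Id simply operates statelessly.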
Which MCP Clients Support Streamable HTTP?
Not every MCP client has caught up with the spec. Before you build a Streamable HTTP MCP server, check that your target clients actually support the transport. Here's where things stand as of early 2026.
Claude Desktop and Claude Code
Anthropic's clients were among the first to support Streamable HTTP natively. Claude Desktop handles both Streamable HTTP and SSE with automatic fallback. Claude Code supports Streamable HTTP for remote servers out of the box. One important caveat from community reports: OAuth for remote MCP connections may be restricted to Work/Enterprise accounts on Claude.ai, which blocks some developers from testing.
Cursor, Windsurf, and IDE Clients
Cursor added Streamable HTTP support in late 2025. Windsurf and other IDE-based clients vary in their transport support, so check their docs for the latest. The general trend is clear: clients need Streamable HTTP support for remote servers, and most are shipping it.
Open WebUI and Self-Hosted Options
Open WebUI added native Streamable HTTP support in v0.6.31. Before that, teams used MCPO or MetaMCP as bridges. Community sentiment on these bridges is mixed. As one developer noted, "Mcpo is a life saver here. I have more than 9 MCPs running through it." But another warned: "We made mcpo working with custom servers. But it's quite fragile."
One recurring pain point: Open WebUI doesn't natively support custom HTTP headers on MCP connections, which blocks enterprise auth patterns that rely on Bearer tokens or custom x-headers. Teams working around this use a gateway layer to inject headers before traffic reaches the MCP server.
We tracked client compatibility with Streamable HTTP across 31 threads, and the pattern is clear. Support exists in the major clients, but edge cases trip people up constantly.
| Client | Streamable HTTP | SSE Fallback | OAuth Support | Notes |
|---|---|---|---|---|
| Claude Desktop | Yes | Yes | Work accounts | Full support since late 2025 |
| Claude Code | Yes | Yes | Basic auth | CLI-native |
| Cursor | Yes | Yes | API keys | Added late 2025 |
| Open WebUI | Yes (v0.6.31+) | Via MCPO | Limited | No custom headers natively |
| LangChain/LangGraph | Yes | Yes | Configurable | Since May 2025 |
| n8n | Partial | Via bridge | API keys | Community integrations |
Building a Streamable HTTP MCP Server
The fastest way to stand up an MCP server endpoint with Streamable HTTP support depends on your stack. Here are working patterns for the three most popular frameworks.
Python with FastMCP
FastMCP is the most popular Python SDK for MCP servers. It supports Streamable HTTP out of the box:
```python
from fastmcp import FastMCP

mcp = FastMCP("my-tools")

@mcp.tool()
def search_docs(query: str) -> str:
    """Search documentation by keyword."""
    return f"Results for: {query}"

# Run with Streamable HTTP transport
mcp.run(transport="streamable-http", host="0.0.0.0", port=8000)
```

This starts a server on http://localhost:8000/mcp that accepts POST requests for tool calls and GET requests for server-initiated messages. You can test it immediately with the MCP Inspector or any Streamable HTTP client.
FastMCP can also mount directly into an existing FastAPI application. If you're working in Python, it's the path of least resistance to get a Streamable HTTP server running.
TypeScript with the MCP SDK
The official MCP TypeScript SDK provides a StreamableHTTPServerTransport class:
```typescript
import { z } from "zod";
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StreamableHTTPServerTransport } from "@modelcontextprotocol/sdk/server/streamableHttp.js";

const server = new McpServer({ name: "my-tools", version: "1.0.0" });

// Tool parameters are declared as a Zod schema shape
server.tool("search_docs", { query: z.string() }, async ({ query }) => ({
  content: [{ type: "text", text: `Results for: ${query}` }]
}));

const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
await server.connect(transport);
```

Setting sessionIdGenerator: undefined runs the server in stateless mode. For stateful sessions, pass a function that generates unique session IDs.
Spring AI (Java)
Spring AI added MCP Streamable HTTP server support through a dedicated starter. Add spring-ai-mcp-server-streamable-http-spring-boot-starter to your dependencies, and Spring auto-configures the transport. This is particularly useful for teams already running Java microservices who want to expose tools to AI agents without switching stacks.
Each of these frameworks handles the transport layer so you can focus on writing tool logic. Pick the one that matches your existing stack and you can have a working Streamable HTTP example running in under 10 minutes.
Testing and Debugging
Using MCP Inspector
The MCP Inspector is the standard debugging tool for MCP servers. Point it at your Streamable HTTP endpoint and it'll show you available tools, send test calls, and display raw JSON-RPC messages. In the Inspector, set the transport type to "Streamable HTTP" and enter your server URL (e.g., http://localhost:8000/mcp).
Common Errors and Fixes
Based on community reports, these are the errors that trip up most developers:
"Session not found" errors: This usually means the server assigned a session ID but the client lost it (common after workspace reloads in Claude Code or Cursor). Fix: either run in stateless mode (omit session IDs) or implement session resumption logic.
POST returns 200 but tool doesn't execute: The JSON-RPC response succeeded, but the tool call failed silently. Check that your tool function actually returns a properly formatted MCP response, not just a raw value.
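The fix is usually to wrap the raw return value in MCP's content structure. A sketch of the well-formed shape (most SDKs do this wrapping for you; shown here as plain Python for illustration):

```python
def wrap_tool_result(value) -> dict:
    """Wrap a raw value in the MCP tool-result shape: a 'content' list
    of typed items, here a single text item."""
    return {"content": [{"type": "text", "text": str(value)}]}

# Returning 42 directly would fail silently in some clients;
# the wrapped form is what the protocol expects.
result = wrap_tool_result(42)
```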
CORS errors on browser-based clients: If you're connecting from a web-based client like Open WebUI, your server needs to return appropriate CORS headers. Most MCP SDKs don't set these by default.
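If your SDK doesn't set them, here's a minimal sketch of the headers a browser-based client typically needs. The allowed origin is a placeholder; lock it down to your client's actual origin in production:

```python
def with_cors(headers: dict[str, str],
              origin: str = "https://chat.example.com") -> dict[str, str]:
    """Merge the CORS headers a browser client needs onto a response.
    Mcp-Session-Id must be exposed, or the browser hides it from JS."""
    return {
        **headers,
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": "GET, POST, DELETE, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type, Mcp-Session-Id, Authorization",
        "Access-Control-Expose-Headers": "Mcp-Session-Id",
    }

resp_headers = with_cors({"Content-Type": "application/json"})
```

The Expose-Headers entry is the one most people miss: without it, a stateful server's session ID never reaches browser-side JavaScript.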
Cloudflare/WAF blocking requests: Community reports confirm that Cloudflare's "Block AI Bots" setting can intercept Claude's MCP requests. If your server sits behind a WAF, allowlist the MCP client's user agent or disable bot protection for the /mcp endpoint.
How do you test a Streamable HTTP MCP server end-to-end? Start with the MCP Inspector for basic validation, then test with your target client (Claude, Cursor, or your custom agent) to catch transport-specific issues.
Deploying Streamable HTTP MCP Servers to Production
This is where transport selection is a production deployment decision, not just a technical preference. Your deployment platform determines which transport patterns work and which ones fail.
Serverless (Cloud Run, Lambda, Azure Functions)
Streamable HTTP was designed for this. Because each request is independent in stateless mode, serverless platforms can spin instances up and down without breaking MCP connections. One developer on r/mcp confirmed: "Stateless MCP servers run fine on Lambda or Cloud Run once you sort the transport layer."
Practical tips from production deployments:
- Cloud Run: Set minimum instances to 1 to avoid cold starts. Works well with Streamable HTTP, and the same patterns carry over to AWS and Azure.
- Lambda: Use AWS Lambda Web Adapter to handle HTTP routing. API Gateway in front handles TLS and auth.
- Cold starts: If your tools load large models or datasets, cold starts will add 2-5 seconds to the first request. Consider keeping one warm instance.
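One way to soften cold starts is to defer heavy initialization until the first request and cache it for the lifetime of the instance. A sketch, where load_index() is a stand-in for whatever expensive setup your tools do:

```python
from functools import lru_cache

load_count = 0  # instrumentation for the example only

def load_index():
    """Stand-in for expensive setup: model weights, datasets, API clients."""
    global load_count
    load_count += 1
    return {"ready": True}

@lru_cache(maxsize=1)
def get_index():
    # Cached per instance: a warm instance pays the load cost exactly once,
    # and a cold start pays it on the first request instead of at import time.
    return load_index()

get_index()  # first call loads
get_index()  # subsequent calls hit the cache
```

Combined with one warm minimum instance, this keeps most requests off the slow path entirely.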
Containers and Docker
For teams that need stateful sessions or want full control over infrastructure, containerized Streamable HTTP with Docker is the simplest path. Run your MCP server in a Docker container, put it behind a reverse proxy (Nginx, Traefik, or Caddy), and you get TLS, logging, and health checks for free.
ECS Fargate on AWS or Cloud Run on GCP both work well. One team reported their ECS bill at "under three dollars" monthly for a production MCP server.
Using an MCP Gateway
When you're running multiple MCP servers with different auth requirements and need centralized observability, a gateway makes sense. We analyzed 31 developer discussions where teams shared their gateway experiences, and the pattern is clear: teams start by connecting MCP servers directly, then hit auth fragmentation and debugging pain, then adopt a gateway.
Apigene is an MCP gateway that connects any API or MCP server to AI agents in one step. It handles transport negotiation (Streamable HTTP and SSE), dynamic tool loading, and renders full UI components inside ChatGPT and Claude, not just text responses. For teams building AI agent products, it eliminates the need to manage transport, auth, and routing across individual MCP servers.
Other options in the gateway space include ContextForge (open source, plugin-based), Docker MCP Gateway, and Portkey (focused on observability). But as one enterprise user managing 3,000 daily users noted, "Gateway auth + audit log is where most MCP deployments actually break down." The right gateway abstracts transport translation between Streamable HTTP, SSE, and stdio so your agent code doesn't need to care which transport the underlying server uses.
"Don't build transport handling into your agent code. Your agent should call tools, not manage HTTP sessions. Push transport, auth, and routing into a gateway layer, and your agent stays clean no matter how many MCP servers you connect. The teams we work with that scale fastest are the ones that separate tool logic from infrastructure plumbing on day one."
The Bottom Line
MCP Streamable HTTP isn't optional anymore. It's the production transport for remote MCP servers, and it solves real problems that SSE couldn't: serverless compatibility, multi-user routing, and clean session management. The community data backs this up. Teams that migrate report fewer dropped connections, simpler deployments, and the ability to scale without dedicated VMs.
If you're building AI agents that call remote tools, start with Streamable HTTP from day one. Pick a framework (FastMCP for Python, the TypeScript SDK, or Spring AI for Java), deploy on serverless or containers, and use a gateway like Apigene when you need to manage multiple servers at scale.
Frequently Asked Questions
What is MCP Streamable HTTP?
MCP Streamable HTTP is the recommended transport protocol for connecting AI agent clients with remote MCP (Model Context Protocol) tool servers. It uses standard HTTP POST for sending JSON-RPC messages and optional SSE streaming for server responses. It replaced the deprecated SSE transport in the March 2025 MCP spec revision. Streamable HTTP works with serverless platforms, load balancers, and API gateways because each request can be handled independently.
Is Streamable HTTP stateful or stateless?
Both. The MCP Streamable HTTP spec supports stateless and stateful operation, and the server decides which mode to use. In stateless mode, every request is independent, which is ideal for serverless deployments on Cloud Run or Lambda. In stateful mode, the server assigns an Mcp-Session-Id header that the client includes in subsequent requests, enabling features like tool caching and multi-step workflows. Most production deployments start stateless and add sessions only when needed.
How do I test a Streamable HTTP MCP server?
Start with the MCP Inspector. Set the transport to "Streamable HTTP," enter your server URL, and run test tool calls. The Inspector shows raw JSON-RPC request/response pairs so you can verify message formatting. For end-to-end testing, connect from your target client (Claude Desktop, Cursor, or a custom agent) and check for session handling, error responses, and streaming behavior. Community developers report that "Session not found" errors and CORS issues are the most common problems, both fixable with configuration changes.
Can I run Streamable HTTP MCP on serverless without session drops?
Yes, and this is exactly why Streamable HTTP exists. Run your server in stateless mode (omit session IDs) and each request is independent. Cloud Run, Lambda, and Azure Functions all handle this well. Based on developer reports across 12 production deployment threads, the key tips are: set minimum instances to 1 on Cloud Run to avoid cold starts, use AWS Lambda Web Adapter for Lambda deployments, and keep tool initialization lightweight. Teams consistently report that Streamable HTTP on serverless is reliable once configured correctly, unlike SSE which dropped connections unpredictably during autoscaling.
Should I migrate from SSE to Streamable HTTP right now?
Yes. SSE is officially deprecated in the MCP specification since March 2025. All major MCP SDKs (FastMCP, the TypeScript SDK, Spring AI) already support Streamable HTTP, and most clients handle it natively. The migration is typically straightforward: swap your transport configuration and test with the MCP Inspector. Developers who've made the switch report that the hardest part isn't the code change but updating deployment configs (load balancer settings, health checks, CORS headers). The longer you wait, the more likely you'll hit SSE-specific bugs that will never be fixed.
What happens when my MCP client doesn't support Streamable HTTP?
Most modern clients support Streamable HTTP, but if you're stuck with an older client, you have options. Claude Desktop and Cursor both support SSE as a fallback. For self-hosted clients like Open WebUI (pre-v0.6.31), use a bridge like MCPO or Supergateway that translates between transports. A better long-term solution is an MCP gateway that handles transport negotiation automatically, so your server always speaks Streamable HTTP and the gateway translates for legacy clients. Community reports show that bridge-based approaches are "fragile" for production use, so plan for native support or a proper gateway.