
FastMCP 3.0: Build MCP Servers in Python, Fast

Apigene Team
13 min read

If you're building AI agents that need to call tools, query databases, or interact with APIs, you've probably hit the same wall: wiring up the Model Context Protocol by hand is tedious. FastMCP exists to fix that. Created by Jeremiah Lowin and maintained by Prefect, FastMCP is now the most popular MCP framework in the Python ecosystem, surpassing 4 million daily downloads as of March 2026.

Key Takeaways

For busy Python developers building MCP servers, here's what 47 community discussions and 8,100+ monthly searches taught us:

  • FastMCP cuts boilerplate to near zero. One decorator turns a Python function into a fully typed MCP tool, with schema generation, validation, and error handling built in.
  • Token bloat is the #1 production pain point. Developers report tool schemas consuming 15,000+ tokens before the agent even starts reasoning. FastMCP 3.1's code mode dropped that to 2,000-3,000 tokens for some teams.
  • OAuth is harder than the server itself. Across 10+ threads, auth integration with legacy identity providers was the most cited friction point after initial setup.
  • You'll outgrow a single server. Teams that start with FastMCP prototypes consistently hit scaling walls around tool count, token overhead, and multi-client auth, which is exactly where an MCP gateway fits.

What Is FastMCP?

FastMCP is a Python framework that abstracts the complexity of the Model Context Protocol into a clean, decorator-based API. Instead of manually implementing JSON-RPC handlers, transport layers, and schema generation, you write standard Python functions and let FastMCP handle the protocol plumbing. It supports tools, resources, and prompts as first-class primitives, and ships with a built-in client for testing and agent integration.

The project started as a community effort to make MCP accessible to Python developers. Prefect adopted and now maintains it. Version 3.0 landed in February 2026 with OAuth support, OpenTelemetry tracing, and a composition model for combining multiple servers. Version 3.1 followed weeks later with code mode, a pattern that collapsed 1,000-tool catalogs into two tools and cut token usage by up to 99%.

So how do the official MCP SDK and FastMCP compare? The official modelcontextprotocol/python-sdk gives you low-level control. FastMCP gives you speed. As one developer put it: "FastMCP is really elegant and straight-forward." And many of the top MCP servers are built with FastMCP, because the framework handles schema inference, input validation, and transport negotiation automatically.

Getting Started with FastMCP

Installation

Getting FastMCP running takes one command. Open your terminal and run:

pip install fastmcp

That's it. FastMCP pulls in its dependencies (including the official MCP SDK under the hood) and you're ready to build. If you prefer uv for faster installs:

uv pip install fastmcp

You can verify the install with fastmcp version to check which version you're running. At the time of writing, 3.1.x is the latest stable release available on PyPI.

Your First FastMCP Server

Here's a working FastMCP server in 10 lines of Python:

from fastmcp import FastMCP
 
mcp = FastMCP("My First Server")
 
@mcp.tool()
def add(a: int, b: int) -> int:
    """Add two numbers together."""
    return a + b
 
@mcp.tool()
def greet(name: str) -> str:
    """Greet someone by name."""
    return f"Hello, {name}!"

That's a complete MCP server. FastMCP infers the JSON schema from your type hints, generates tool descriptions from your docstrings, and handles validation automatically. To run it:

fastmcp run server.py

By default, this starts a stdio transport. For HTTP, add --transport streamable-http and FastMCP will serve on port 8000. You can debug your FastMCP server during development using the MCP Inspector, which connects to your running server and lets you test tools interactively.

Stop Building MCP Integrations From Scratch.

  • Any API, one line of code — connect to ChatGPT, Claude, and Cursor without writing custom MCP servers
  • Visual UI in the chat — render interactive components, not just text dumps. Charts, forms, dashboards.
  • 70% fewer tokens — dynamic tool loading and output compression so your agents stay fast and cheap

Your First FastMCP Client

FastMCP also ships a client for connecting to any MCP server:

from fastmcp import Client
 
async with Client("server.py") as client:
    result = await client.call_tool("add", {"a": 2, "b": 3})
    print(result)  # 5

The FastMCP client handles transport detection, session management, and reconnection. You can point it at a local script (stdio), a remote URL (Streamable HTTP), or even compose it with other servers.

Core Building Blocks

FastMCP organizes MCP capabilities into three primitives: FastMCP tools, FastMCP resources, and FastMCP prompts. Understanding each one is essential before building anything non-trivial.

Tools

Tools are the workhorses. They represent actions your agent can take: calling an API, querying a database, running a calculation. Each tool gets a name, a JSON schema (auto-generated from type hints), and a description (pulled from the docstring).

@mcp.tool()
def search_users(query: str, limit: int = 10) -> list[dict]:
    """Search for users matching the query string."""
    return db.search(query, limit=limit)  # db is your application's data layer

FastMCP supports complex types (Pydantic models, enums, optional fields) and will generate accurate schemas for all of them.
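As a sketch of what richer type hints can look like, the tool below takes an enum and a dataclass as input. The names and in-memory data are illustrative, and the `@mcp.tool()` registration line is commented out so the snippet runs standalone without a server:

```python
from dataclasses import dataclass
from enum import Enum

class Role(str, Enum):
    ADMIN = "admin"
    MEMBER = "member"

@dataclass
class UserFilter:
    role: Role
    active_only: bool = True

# @mcp.tool()  # registering this on a FastMCP server exposes the full nested schema
def count_users(f: UserFilter) -> int:
    """Count users matching the filter (illustrative in-memory data)."""
    users = [("alice", Role.ADMIN, True), ("bob", Role.MEMBER, False)]
    return sum(1 for _, role, active in users
               if role == f.role and (active or not f.active_only))
```

Because the schema comes from the hints, the agent sees the enum's allowed values and the field defaults without any extra annotation work on your side.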

Resources

Resources expose read-only data to the agent. Think of them as GET endpoints: they return information but don't change state. Useful for configuration, knowledge bases, or contextual data that the agent needs to reason about.

@mcp.resource("config://app-settings")
def get_settings() -> dict:
    """Current application configuration."""
    return load_config()
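Resources can also be parameterized: FastMCP supports URI templates, so one function can serve a whole family of resources. The sketch below uses an illustrative in-memory store, with the registration line commented out so it runs standalone:

```python
# @mcp.resource("users://{user_id}/profile")  # URI template: {user_id} maps to the argument
def user_profile(user_id: str) -> dict:
    """Profile data for a single user (illustrative in-memory store)."""
    profiles = {
        "u1": {"name": "Alice", "plan": "pro"},
        "u2": {"name": "Bob", "plan": "free"},
    }
    return profiles.get(user_id, {"error": "not found"})
```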

Prompts

Prompts are reusable instruction templates. They let the server suggest how the agent should approach a task, complete with structured arguments.

@mcp.prompt()
def code_review(code: str, language: str = "python") -> str:
    """Review code for bugs and improvements."""
    return f"Review this {language} code:\n\n{code}"

What Developers Actually Build

We analyzed over 40 discussions where Python developers share their FastMCP projects and pain points. The patterns that emerge are consistent: most teams start with tools, hit scaling friction around tool count, and then explore resources and prompts as optimization levers.

| Pattern | What Teams Reported | Frequency |
| --- | --- | --- |
| Single-purpose tool servers | "Started with 3-5 tools for one API" | 18 of 47 threads |
| OpenAPI auto-generation | "Every endpoint becomes a tool" | 12 of 47 threads |
| Tool consolidation | "Collapsing related calls cut my error rate in half" | 8 of 47 threads |
| Resource-based context | "Moved config to resources, saved tokens" | 5 of 47 threads |
| Multi-server composition | "Unified server for different ad platforms" | 4 of 47 threads |

The takeaway: start simple with tools, but plan for composition early. Teams that mirror REST endpoints 1:1 into MCP tools consistently report problems. One developer summarized the lesson: "REST APIs are resource-oriented... MCP should be intent-oriented." Build tools around what the agent needs to accomplish, not around your API surface.

FastMCP vs the Official MCP SDK

Python developers building MCP servers face a choice: FastMCP vs MCP official SDK. Here's how they compare:

| Feature | FastMCP 3.x | Official Python SDK |
| --- | --- | --- |
| API style | Decorator-based (@mcp.tool()) | Class-based, lower level |
| Schema generation | Automatic from type hints | Manual or semi-manual |
| OAuth support | Built-in proxy + OIDC | Basic |
| Composition | import_server, mount | Manual wiring |
| Code mode | Built-in (3.1+) | Not available |
| Client library | Included | Separate |
| Documentation | gofastmcp.com (comprehensive) | SDK docs (minimal) |
| Community size | 4M+ daily downloads | Lower adoption |

The official SDK gives you more control over transport and protocol details. FastMCP trades that control for speed. If you're prototyping or building a server with under 50 tools, FastMCP is the faster path. If you need custom transport implementations or have specific protocol requirements, the official SDK might be worth the extra boilerplate.

One thing to watch: some developers report that the official SDK's quickstart examples can be brittle. "Quickstart examples are not working... STDIO... 'could not infer transport from server.py'" appeared in multiple threads. FastMCP's fastmcp run command sidesteps this by handling transport detection automatically.

Transport and Deployment

Choosing a Transport

FastMCP supports three transport modes, and picking the right one matters for production:

Stdio is the default. It communicates over standard input/output, which works great for local development and IDE integrations (Claude Desktop, Cursor). But stdio doesn't work for remote deployments.

SSE (Server-Sent Events) was FastMCP's original remote transport. It's being deprecated in favor of Streamable HTTP, but some older clients still rely on it, and FastMCP still supports it for compatibility.

Streamable HTTP is the current standard for remote MCP and the recommended transport for production. It runs over regular HTTP, supports multiplexing, and works behind load balancers and proxies. The rule of thumb is simple: stdio for local development, Streamable HTTP for everything else.

# Local development (stdio)
fastmcp run server.py
 
# Remote deployment (Streamable HTTP)
fastmcp run server.py --transport streamable-http --port 8000
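For stdio clients like Claude Desktop, you typically register the server in the client's configuration file rather than running it yourself. An entry along these lines (the server name and path are illustrative, and assume the fastmcp CLI is on your PATH) tells the client how to launch your script:

```json
{
  "mcpServers": {
    "my-first-server": {
      "command": "fastmcp",
      "args": ["run", "/path/to/server.py"]
    }
  }
}
```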


Deploying to Production

Once your server works locally, you'll want to deploy it. FastMCP Cloud is one option for hosted deployment. For self-hosted setups, you can containerize with Docker and deploy to any cloud provider:

FROM python:3.12-slim
RUN pip install fastmcp
COPY server.py .
CMD ["fastmcp", "run", "server.py", "--transport", "streamable-http", "--host", "0.0.0.0"]

For a full walkthrough on hosting options, see how to deploy your FastMCP server remotely. The key considerations are TLS termination, health checks, and auth, which leads to the next section.

Advanced Features

Authentication and OAuth

FastMCP 3.0 introduced built-in OAuth support, and it's one of the most praised additions. Teams running legacy identity providers called the OAuth proxy "a lifesaver for our outdated IDP."

FastMCP authentication works at two levels:

  1. Server-level auth protects the MCP endpoint itself. You configure an OAuth provider, and FastMCP handles token validation on every request.
  2. Upstream auth passthrough forwards the user's credentials to downstream APIs. This is where most teams struggle: "I have tried with bearer tokens but I am not able to get it to work" was a common thread in auth discussions.

The FastMCP middleware system lets you inject custom auth logic, rate limiting, or logging into the request pipeline. For enterprise setups with Keycloak, Okta, or Azure AD, the OAuth proxy pattern is the recommended approach.
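The exact middleware hook names vary by FastMCP version, so the sketch below keeps the rate-limiting logic itself framework-agnostic: a small token-bucket limiter you could call from a middleware hook before forwarding a tool call. All names here are illustrative:

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: `capacity` calls, refilled at `rate` per second."""

    def __init__(self, capacity: int, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# In a middleware hook you might write (illustrative):
#   if not bucket.allow():
#       raise RuntimeError("rate limit exceeded")
```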

OpenAPI Integration

If you have an existing REST API with an OpenAPI spec, FastMCP can generate tools from it automatically:

from fastmcp import FastMCP
 
mcp = FastMCP.from_openapi("https://api.example.com/openapi.json")

This is powerful for prototyping, but be careful: a 1:1 mapping of endpoints to tools creates tool explosion. A 200-endpoint API becomes 200 tools, and LLMs struggle with tool selection accuracy above 25-50 tools. The FastMCP OpenAPI integration works best when paired with filtering to expose only the tools your agent actually needs.
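One simple way to filter is to prune the spec before handing it to FastMCP. The helper below is an illustrative sketch (FastMCP also ships its own route-filtering options) that keeps only an allow-list of paths:

```python
def filter_openapi_paths(spec: dict, keep: set[str]) -> dict:
    """Return a copy of an OpenAPI spec containing only the allow-listed paths."""
    filtered = dict(spec)
    filtered["paths"] = {p: op for p, op in spec.get("paths", {}).items() if p in keep}
    return filtered

# Usage sketch (spec would normally be fetched from your API):
spec = {"openapi": "3.1.0", "paths": {"/users": {}, "/admin": {}, "/health": {}}}
slim = filter_openapi_paths(spec, keep={"/users"})
# mcp = FastMCP.from_openapi(slim)  # only /users becomes a tool
```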

Once your server is running in production, follow these 12 production rules to avoid the most common operational issues.

Proxy and Composition

FastMCP's composition model lets you combine multiple servers into a single surface:

from fastmcp import FastMCP

main = FastMCP("Gateway")
# analytics_server and crm_server are FastMCP instances defined elsewhere
main.import_server("analytics", analytics_server)
main.import_server("crm", crm_server)

The FastMCP proxy feature can also wrap an existing MCP server with middleware, adding auth, logging, or tool filtering without modifying the original server. This is useful when you don't control the upstream server but need to add a policy layer.

Expert Tip — Yaniv Shani, Founder of Apigene

"The biggest mistake I see teams make with MCP is exposing every tool to every agent. Start with 5-7 high-intent tools per agent, and use dynamic loading to surface the rest only when the conversation context calls for it. You'll cut token costs and improve tool selection accuracy at the same time."

Common Pitfalls and How to Avoid Them

We analyzed 47 threads where developers share production pain points with FastMCP and MCP servers in general. The same problems surface repeatedly, and most are avoidable.

Tool Explosion

The most frequent complaint: teams auto-generate MCP tools from OpenAPI specs and end up with 50, 100, or 200+ tools. LLMs start picking the wrong tool once the catalog exceeds 25-50 items, and token costs spike because every tool schema gets injected into the context window.

One developer reported: "dumping 15k tokens of schema at position 0 wastes your most valuable context slots." Another benchmark found that OpenAI models hit a hard API limit at 128 tools, while accuracy degraded well before that threshold.

The fix is intent-driven tool design. Instead of exposing get_user, get_user_by_id, get_user_by_email, and search_users as four separate tools, consolidate them into one find_user tool with flexible parameters. Teams that did this report cutting error rates in half.
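A consolidated tool might look like the sketch below: one find_user that accepts any of several identifiers, with an illustrative in-memory dataset standing in for a real backend, and the `@mcp.tool()` registration commented out so the snippet runs standalone:

```python
from typing import Optional

USERS = [  # illustrative data; a real tool would query your user store
    {"id": "u1", "email": "alice@example.com", "name": "Alice"},
    {"id": "u2", "email": "bob@example.com", "name": "Bob"},
]

# @mcp.tool()  # one intent-oriented tool instead of four endpoint mirrors
def find_user(user_id: Optional[str] = None,
              email: Optional[str] = None,
              query: Optional[str] = None) -> list[dict]:
    """Find users by id, email, or free-text name search."""
    if user_id:
        return [u for u in USERS if u["id"] == user_id]
    if email:
        return [u for u in USERS if u["email"] == email]
    if query:
        return [u for u in USERS if query.lower() in u["name"].lower()]
    return []
```

One schema in the context window instead of four, and the agent no longer has to guess which lookup variant applies.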

For servers with large tool catalogs, you can reduce token costs with dynamic tool loading so agents only see the tools relevant to the current conversation.

Token Bloat

Even with a reasonable tool count, verbose tool outputs can burn through context windows. A single API response dumping raw JSON can consume thousands of tokens that the agent needs for reasoning.

FastMCP 3.1's code mode addresses this by replacing individual tool calls with a sandboxed Python execution environment. Cloudflare reported a 99.9% reduction in tool-listing tokens using this pattern. Developers in the community confirmed similar results: "My token count decreased from 50k to 2-3k max."

Beyond code mode, you can optimize tool output with compression and caching at the gateway level. Strip unnecessary fields from responses, cache repeated lookups, and paginate large result sets.
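Stripping and capping can also happen inside the tool itself, before the response ever reaches the model. A minimal sketch (field names and data are illustrative):

```python
def compact(records: list[dict], fields: tuple[str, ...], limit: int = 20) -> list[dict]:
    """Keep only the listed fields and cap the number of records returned."""
    return [{k: r[k] for k in fields if k in r} for r in records[:limit]]

raw = [{"id": 1, "name": "Alice", "debug_blob": "x" * 5000},
       {"id": 2, "name": "Bob", "debug_blob": "y" * 5000}]
slim = compact(raw, fields=("id", "name"), limit=1)
# The 5,000-character debug field never reaches the agent's context window
```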

| Pain Point | Threads Mentioning | Common Fix |
| --- | --- | --- |
| Tool schema bloat (15k+ tokens) | 15 of 47 | Code mode, dynamic loading |
| OpenAPI 1:1 tool mapping | 12 of 47 | Intent-driven consolidation |
| Wrong tool selection | 8 of 47 | Reduce catalog to 5-15 tools |
| OAuth/auth failures | 10 of 47 | FastMCP OAuth proxy, gateway auth |
| Client compatibility gaps | 6 of 47 | Test across Claude, Cursor, ChatGPT |
| Transport confusion (stdio vs HTTP) | 4 of 47 | Stdio local, Streamable HTTP remote |

FastMCP vs FastAPI-MCP

Developers who already use FastAPI often ask about fastapi-mcp, a separate library that adds MCP endpoints to existing FastAPI applications. Here's how it compares to FastMCP:

| Aspect | FastMCP | FastAPI-MCP |
| --- | --- | --- |
| Primary purpose | Build MCP servers from scratch | Add MCP to existing FastAPI apps |
| Approach | Standalone framework | FastAPI plugin/adapter |
| Tool definition | @mcp.tool() decorator | Converts FastAPI routes to tools |
| Transport | Stdio, SSE, Streamable HTTP | Runs within FastAPI's HTTP server |
| Best for | New MCP projects, standalone servers | Teams with existing FastAPI services |

If you're starting fresh, FastMCP is the better choice because it's purpose-built for MCP. If you already have a FastAPI app with dozens of endpoints and want to expose some as MCP tools, FastAPI-MCP saves you from maintaining two separate services. The FastMCP FastAPI integration discussion on Reddit shows that both approaches have active users, but FastMCP's community is larger.

Scaling FastMCP with a Gateway

FastMCP gets you from zero to a working MCP server in minutes. But as your deployment grows, you'll hit problems that a framework alone can't solve: managing auth across multiple servers, controlling which tools each agent sees, compressing tool output to save tokens, and monitoring tool usage across your organization.

This is where an MCP gateway fits. Apigene sits between your agents and your FastMCP servers, handling dynamic tool loading (so agents only see relevant tools), tool output compression (up to 99% token reduction), and centralized auth. You can connect any FastMCP server to Apigene without modifying your server code.

For teams that want to distribute their FastMCP servers to others, you can publish your server to a marketplace where other developers can discover and connect to it through their preferred gateway or client.

The Bottom Line

FastMCP is the fastest path from "I need an MCP server" to a working, production-ready deployment in Python. Version 3.0 brought OAuth, composition, and OpenTelemetry. Version 3.1 added code mode for massive token savings. The framework handles the protocol complexity so you can focus on building tools that actually solve problems.

Start with pip install fastmcp, build your first server with a few tools, test it with the built-in client, and deploy with Streamable HTTP when you're ready to go remote. When you outgrow a single server, compose multiple servers or route through a gateway to keep token costs and tool sprawl under control.


Frequently Asked Questions

Is FastMCP production ready?

Yes. FastMCP 3.0+ is production-ready with enterprise OAuth, OpenTelemetry tracing, and a composition model for multi-server setups. Teams are running it in production environments, with developers reporting stable performance since the beta period. The framework processes over 4 million daily downloads and has contributions from 21+ developers. Community feedback confirms reliability: "been on the 3.0 beta the last few weeks myself in prod. Solid."

What is FastMCP used for?

FastMCP is used to build MCP servers that connect AI agents (Claude, ChatGPT, Cursor) to external tools, APIs, and data sources. It handles schema generation, input validation, transport negotiation, and auth so developers can focus on business logic. Common use cases include wrapping REST APIs, building database query tools, creating file management servers, and exposing internal business logic to AI agents.

Does FastMCP support TypeScript?

There is a community-maintained TypeScript version of FastMCP available on npm, but it is not the official Prefect-maintained project. The TypeScript version does not have 1:1 feature parity with the Python version, and its documentation is more limited. Python developers searching for FastMCP TypeScript should note this distinction. For TypeScript-first MCP development, the official Anthropic TypeScript SDK is the more established option.

How do I build a FastMCP server without writing boilerplate?

Install FastMCP with pip install fastmcp, then use the @mcp.tool() decorator on any Python function. FastMCP auto-generates JSON schemas from type hints, pulls descriptions from docstrings, and handles validation. A working server with two tools takes 10 lines of code. For existing APIs, FastMCP.from_openapi() generates tools directly from an OpenAPI spec, though you should filter endpoints to avoid tool explosion. Community developers consistently praise this approach, with one noting "Reading API docs, mapping every parameter, handling auth, writing tests, packaging... repeat" as the pain FastMCP eliminates.

What's the fastest way to reduce token costs in a FastMCP deployment?

Three approaches, ranked by impact: (1) Enable code mode (FastMCP 3.1+), which replaces individual tool schemas with a sandboxed execution environment and can cut token usage from 50,000 to 2,000-3,000 tokens. (2) Consolidate tools around intent rather than mirroring API endpoints 1:1. Developers who collapsed related tools reported 50% fewer errors. (3) Use an MCP gateway like Apigene for dynamic tool loading and output compression, which prevents schema bloat from reaching the agent's context window at all.

Can FastMCP handle OAuth for enterprise identity providers?

Yes. FastMCP 3.0 introduced built-in OAuth support, including an OAuth proxy mode for legacy identity providers. Developers working with outdated IDPs called this feature "a lifesaver." FastMCP supports standard OAuth 2.0 flows, OIDC, and custom token validation through its middleware system. For enterprise setups with Keycloak, Okta, or Azure AD, the OAuth proxy pattern handles the compatibility layer so your server code stays clean.

#mcp #fastmcp #python #mcp-server #tutorial #ai-agents