How to build a web scraping MCP server
Quick answer
Build a web scraping MCP server by implementing the mcp.server interface in Python and exposing web scraping functions as MCP tools. Use mcp.server.stdio.stdio_server to run the server, enabling AI agents to invoke scraping tasks via the MCP protocol.
Prerequisites
- Python 3.10+
- pip install mcp requests beautifulsoup4
- Basic knowledge of web scraping and Python async programming
Setup
Install the required Python packages for MCP and web scraping:
- mcp for the Model Context Protocol server
- requests for HTTP requests
- beautifulsoup4 for HTML parsing
Set up your environment with Python 3.10 or higher, which the mcp package requires.
pip install mcp requests beautifulsoup4
Step by step
Create a Python MCP server that exposes a web scraping tool. The tool fetches a URL and extracts the page title. Use mcp.server.stdio.stdio_server to run the server over standard IO, which is the recommended transport for MCP.
import asyncio
import requests
from bs4 import BeautifulSoup
import mcp.types as types
from mcp.server import Server
from mcp.server.stdio import stdio_server

class WebScraper:
    async def scrape_title(self, url: str) -> str:
        # requests is blocking; acceptable for a simple example
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        return soup.title.string if soup.title else 'No title found'

scraper = WebScraper()
server = Server('web-scraper')

@server.list_tools()
async def list_tools() -> list[types.Tool]:
    # Advertise the tool so MCP clients can discover it
    return [types.Tool(
        name='scrape_title',
        description='Fetch a URL and return the page title',
        inputSchema={'type': 'object',
                     'properties': {'url': {'type': 'string'}},
                     'required': ['url']},
    )]

@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[types.TextContent]:
    if name != 'scrape_title':
        raise ValueError(f'Unknown tool: {name}')
    title = await scraper.scrape_title(arguments['url'])
    return [types.TextContent(type='text', text=title)]

async def main():
    async with stdio_server() as (read_stream, write_stream):
        await server.run(read_stream, write_stream,
                         server.create_initialization_options())

if __name__ == '__main__':
    asyncio.run(main())
Common variations
You can extend the MCP server to scrape other elements by adding more async scraping methods and registering them as tools. For example, scrape meta descriptions or links.
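For instance, extracting links comes down to collecting href attributes with BeautifulSoup. A minimal sketch of the parsing step (the function name is illustrative):

```python
from bs4 import BeautifulSoup

def extract_links(html: str) -> list[str]:
    # Collect the href attribute of every anchor tag that has one
    soup = BeautifulSoup(html, 'html.parser')
    return [a['href'] for a in soup.find_all('a', href=True)]

# Anchors without an href are skipped
html = '<a href="/one">1</a><a>no link</a><a href="https://example.com">2</a>'
print(extract_links(html))  # ['/one', 'https://example.com']
```

A tool method would fetch the page with requests first, then hand response.text to a parser like this.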
Use other MCP transports such as SSE over HTTP if needed, but stdio_server is simplest for local or containerized deployments.
Integrate with AI agents by connecting the MCP server's stdio to the agent's MCP client.
    async def scrape_meta_description(self, url: str) -> str:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        soup = BeautifulSoup(response.text, 'html.parser')
        meta = soup.find('meta', attrs={'name': 'description'})
        return meta['content'] if meta and 'content' in meta.attrs else 'No meta description found'
Troubleshooting
- If the server does not start, ensure Python 3.10+ is used and all dependencies are installed.
- If scraping fails, check network connectivity and that the target website allows scraping.
- Handle exceptions in scraping methods to avoid server crashes.
- Use logging inside the MCP server to debug requests and responses; with the stdio transport, log to stderr, since stdout is reserved for protocol messages.
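The last two points can be combined in a small wrapper. A sketch (the helper name is illustrative) that logs to stderr and turns scraping failures into error strings instead of crashing the server:

```python
import logging
import sys

# With the stdio transport, stdout carries protocol messages, so log to stderr
logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logger = logging.getLogger('web-scraper')

def safe_scrape(scrape, url: str) -> str:
    # Wrap a scraping call so a failure is logged and reported,
    # rather than propagating and killing the server process
    try:
        return scrape(url)
    except Exception as exc:
        logger.warning('scrape failed for %s: %s', url, exc)
        return f'Error: {exc}'

# Illustrative use with a scraper that always fails
def broken(url):
    raise ValueError('connection refused')

print(safe_scrape(broken, 'https://example.com'))  # Error: connection refused
```

The same try/except pattern can wrap each tool handler in the server above.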
Key Takeaways
- Use the mcp Python package to build MCP servers exposing web scraping tools.
- Run the MCP server with stdio_server for simple, robust communication.
- Implement async scraping methods to handle web requests and parsing efficiently.
- Extend the server with multiple scraping functions to support diverse AI agent queries.
- Handle errors gracefully and log activity for easier troubleshooting.