Model Context Protocol: a server you can run, deploy, and try
MCP is an open standard for connecting AI models to tools and data — think of it as a typed, language-agnostic contract any AI client can consume without writing custom integration code. I built a small, production-shaped MCP server to show the whole path: the protocol, an optional interactive UI layer, and running it on AWS. You can drive it right here in the browser.
What is MCP?
Before MCP, every assistant wired up every tool with its own bespoke glue — an N×M mess nobody owned. MCP replaces that with one protocol: a client (the AI host — Claude, Cursor, VS Code, Copilot) talks to servers that expose tools, resources, and prompts over JSON-RPC. People call it "USB-C for AI tools": build your server once, and any MCP-capable client can use it — no per-assistant rework.
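As a rough illustration of what "one protocol" means on the wire, the sketch below builds the JSON-RPC 2.0 envelopes a client could send to discover a server's tools, resources, and prompts. The method names come from MCP; the helper `jsonrpc_request` is a hypothetical name used only for this example.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def jsonrpc_request(method, params=None):
    """Build a JSON-RPC 2.0 request envelope (hypothetical helper)."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# The three discovery calls map onto the three things a server exposes.
discovery = [
    jsonrpc_request(method)
    for method in ("tools/list", "resources/list", "prompts/list")
]

for message in discovery:
    print(json.dumps(message))
```

Every MCP-capable host sends the same envelopes, which is why a server written once does not need per-assistant glue.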
Why it matters
Build once, reuse everywhere
One server is consumed by every MCP-capable client. You don't re-integrate each time your teams adopt a new AI tool.
A single control point
Auth, rate limits, allow-lists, and audit live in one server you operate — not scattered across prompts and clients. That's where access policy and logging belong.
Vendor-neutral
The protocol is open and model-agnostic. Swap the host or the model without rewriting the integration — you're not locked into one vendor's plugin format.
You own the data path
The server runs in your account, next to your systems. Sensitive data and credentials stay on your side of the boundary.
Two layers: a protocol, plus an optional UI
Worth separating in your head — they're decoupled on purpose.
The protocol: tools, resources, and prompts that the model calls over JSON-RPC; the host renders the result. This is the 90% case, and it's what makes a server reusable across clients.
The optional UI (MCP Apps, SEP-1865): a tool can also ship an HTML view that the host renders in a sandboxed iframe, talking back over postMessage. Use it only where a person benefits from a visual surface. This demo ships one panel per tool so you can click each; in production, UI is selective.
How a single call flows
End to end, a tool call is five hops — no magic, just JSON-RPC over HTTP.
- 1 · Host: The AI app (Claude, Cursor) decides a tool is needed and assembles the arguments.
- 2 · Client: Its MCP client sends a JSON-RPC tools/call request.
- 3 · Transport: The request travels over Streamable HTTP to POST /mcp, through a load balancer to any one replica.
- 4 · Tool: The server validates the arguments against the tool's schema and runs it.
- 5 · Result: Structured output streams back to the host, and if the tool has an MCP App, its UI renders.
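The hops above can be sketched in a few lines. This is a minimal illustration, not the demo server's code: the `word_count` tool is hypothetical, and the validator covers only a tiny subset of JSON Schema (required keys and a few types), where a real server would use a full validator. Reporting bad arguments as JSON-RPC error -32602 is one reasonable choice among several.

```python
import json

# Hypothetical tool registry; "word_count" is illustrative, not one of the demo's tools.
TOOLS = {
    "word_count": {
        "inputSchema": {
            "type": "object",
            "properties": {"text": {"type": "string"}},
            "required": ["text"],
        },
        "run": lambda args: {"words": len(args["text"].split())},
    }
}

# Only the JSON Schema types this sketch checks.
_PY_TYPES = {"string": str, "object": dict, "array": list}


def validate(args, schema):
    """Return a list of problems: missing, unexpected, or wrongly typed arguments."""
    problems = [f"missing argument: {k}" for k in schema.get("required", []) if k not in args]
    for key, value in args.items():
        prop = schema.get("properties", {}).get(key)
        if prop is None:
            problems.append(f"unexpected argument: {key}")
        elif not isinstance(value, _PY_TYPES[prop["type"]]):
            problems.append(f"{key} must be of type {prop['type']}")
    return problems


def error(msg_id, code, message):
    return {"jsonrpc": "2.0", "id": msg_id, "error": {"code": code, "message": message}}


def handle_request(msg):
    """Hops 4 and 5: validate a tools/call request, run the tool, wrap the result."""
    if msg.get("method") != "tools/call":
        return error(msg.get("id"), -32601, "Method not found")
    params = msg.get("params", {})
    tool = TOOLS.get(params.get("name"))
    if tool is None:
        return error(msg.get("id"), -32602, f"Unknown tool: {params.get('name')}")
    args = params.get("arguments", {})
    problems = validate(args, tool["inputSchema"])
    if problems:
        return error(msg.get("id"), -32602, "; ".join(problems))
    output = tool["run"](args)
    return {
        "jsonrpc": "2.0",
        "id": msg.get("id"),
        "result": {
            "content": [{"type": "text", "text": json.dumps(output)}],
            "structuredContent": output,
            "isError": False,
        },
    }


# Hop 2: the request an MCP client would send.
request = {
    "jsonrpc": "2.0",
    "id": 7,
    "method": "tools/call",
    "params": {"name": "word_count", "arguments": {"text": "build once reuse everywhere"}},
}
print(json.dumps(handle_request(request), indent=2))
```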
Architecture
The server speaks the Streamable HTTP transport on a single endpoint (POST /mcp). A host sends JSON-RPC; behind a load balancer, any one of N identical replicas handles the request.
Why stateless
Each request builds a fresh server and transport — no session held in memory between calls. That's the design decision that lets you run N interchangeable replicas behind a load balancer (EKS/ECS) or scale to zero (Lambda) with no sticky sessions. The same server code runs unchanged across all three targets.
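A minimal sketch of the per-request pattern, under stated assumptions: `DemoServer` is a hypothetical stand-in for the real server object, it answers only MCP's `ping`, and the Lambda wrapper assumes a Function URL event whose `body` is a plain (not base64-encoded) string.

```python
import json


class DemoServer:
    """Hypothetical stand-in for an MCP server; it lives for one request only."""

    def __init__(self):
        # Any state here would leak across calls if the instance were reused.
        self.calls_seen = 0

    def handle(self, message):
        self.calls_seen += 1
        if message.get("method") == "ping":
            return {"jsonrpc": "2.0", "id": message.get("id"), "result": {}}
        return {
            "jsonrpc": "2.0",
            "id": message.get("id"),
            "error": {"code": -32601, "message": "Method not found"},
        }


def handle_post(body: bytes) -> bytes:
    """One POST /mcp: build a server, use it once, and let it be discarded."""
    server = DemoServer()  # fresh per request, so no replica holds a session
    response = server.handle(json.loads(body))
    return json.dumps(response).encode()


def lambda_handler(event, context):
    """The same handler behind a Lambda Function URL (assumes a non-base64 body)."""
    return {
        "statusCode": 200,
        "headers": {"content-type": "application/json"},
        "body": handle_post(event["body"].encode()).decode(),
    }
```

Because `handle_post` depends only on the request body, a container behind a load balancer and a Lambda invocation can share it unchanged; only the thin wrapper differs.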
How this demo runs live
The "Try it live" panel below isn't faked — this is the real path a tool call takes on karina-borges.com.
- 1 · Browser: You click Run on karina-borges.com, a Next.js app served from Vercel.
- 2 · Vercel proxy: The page calls a same-origin route, /api/mcp-demo, never the MCP server directly. It rate-limits, enforces a read-only allow-list, and adds the bearer token.
- 3 · To AWS: The proxy forwards the JSON-RPC request over HTTPS to the Lambda Function URL (MCP_DEMO_URL), with the bearer token attached.
- 4 · Lambda: Lambda starts (or reuses) an instance, builds a fresh server + transport, validates the arguments, and runs the deterministic tool, then scales back to zero when idle.
- 5 · Back to you: The structured result streams back through the proxy to the panel and, for MCP App panels, into the sandboxed iframe.
The proxy is the security boundary. The browser is untrusted, so rate-limiting, the read-only allow-list, and the bearer token all live in one same-origin route — the public can drive the demo without ever reaching the MCP server directly.
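The three checks can be sketched as one function. This is an illustration rather than the site's actual route: the tool name, the limits, and the sliding-window limiter are assumptions, and the function returns a description of the request it would forward instead of doing network I/O. For brevity it allows only tools/call.

```python
import time

ALLOWED_TOOLS = {"word_count"}        # hypothetical read-only tool names
MAX_REQUESTS, WINDOW_SECONDS = 5, 60  # assumed limits, not the demo's real values
_hits = {}                            # client id -> timestamps of recent requests


def rate_limited(client_id, now):
    """Sliding window: True once a client has MAX_REQUESTS in the last window."""
    recent = [t for t in _hits.get(client_id, []) if now - t < WINDOW_SECONDS]
    limited = len(recent) >= MAX_REQUESTS
    if not limited:
        recent.append(now)
    _hits[client_id] = recent
    return limited


def proxy(client_id, message, upstream_url, token, now=None):
    """Gate an untrusted browser request; attach the token only if it passes."""
    now = time.monotonic() if now is None else now
    if rate_limited(client_id, now):
        return {"status": 429, "error": "rate limit exceeded"}
    if message.get("method") != "tools/call":
        return {"status": 403, "error": "method not allowed"}
    if message.get("params", {}).get("name") not in ALLOWED_TOOLS:
        return {"status": 403, "error": "tool not on the allow-list"}
    return {
        "status": 200,
        "forward": {
            "url": upstream_url,
            # The bearer token is added server-side and never reaches the browser.
            "headers": {"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
            "json": message,
        },
    }
```

In a deployment along the lines described above, `upstream_url` and `token` would come from server-side configuration such as MCP_DEMO_URL, so the browser never sees either.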
Why Lambda? Cost.
The live demo runs on Lambda for scale-to-zero — effectively $0 when idle, with HTTPS built in and no load balancer or always-on nodes. EKS and ECS are documented below as production options: the same image and code, with different cost and control trade-offs.
Where it runs: EKS vs ECS vs Lambda
EKS
Full control: HPA, rolling updates, mTLS via a mesh, private subnets. The right fit when you already run Kubernetes and want the MCP server to be "just another Deployment." You carry the cluster's operational weight and pay for nodes at idle.
≈ $120–180/mo: always-on control plane, nodes, ALB, NAT
ECS
The same container, far less to operate: define a task and a service, and AWS schedules it. Long-lived SSE works fine. Usually the pragmatic choice when you don't already run Kubernetes.
≈ $30–40/mo: one always-on task + ALB
Lambda
Serverless, scale to zero: pay nothing when idle. Statelessness makes it a natural fit; the catch is constrained streaming and a 15-minute ceiling. Great for spiky, low-volume traffic.
≈ $0/mo idle: scale-to-zero, pay per request
Observability
The server ships structured JSON logs and a Prometheus /metrics endpoint. Here it is, live: the same signals you'd scrape into Prometheus, CloudWatch, or Grafana in production.
Live server metrics
GET /metrics
Pulled live from the server's Prometheus /metrics endpoint, through the same-origin proxy. Run a tool in the "Try it live" panel, then refresh; the counters move.
Shows real MCP traffic — the widget's own /metrics polls and /healthz probes are excluded, so refreshing doesn't move the count (only running a tool or loading a panel does). Counters are per-replica and in-memory; on Lambda (scale-to-zero) they reflect a single warm instance and reset on cold start — in production you'd aggregate via Prometheus or CloudWatch/EMF.
Try it live
Pick a tool, edit the JSON arguments, and run it against the MCP server. The request goes through a same-origin, rate-limited proxy that only allows these read-only, deterministic tools — no secrets, no write paths. Below that, load any tool's MCP App panel and interact with it to see the postMessage round-trip.
If the demo server isn't running, you'll see a friendly error; the protocol and UI descriptions on this page still illustrate the flow.
MCP App UI panel
ui://devtools/pr-panel
Pick a tool's panel below. Each is the server's ui:// resource rendered in a sandboxed iframe. Edit its fields and submit: the panel posts a tools/call back out, the proxy runs it, and the result is pushed back in. That is a full postMessage round-trip.