Federated protein structure & function across experimental (PDB) and predicted (AlphaFold) models.
Federated protein structure & function across experimental (PDB) and predicted (AlphaFold) models.
protein-mcp-server · v0.1.2
by Cyanheads
@cyanheads/protein-mcp-server
Federated protein structure & function across experimental (PDB) and predicted (AlphaFold) models via MCP. STDIO or Streamable HTTP.
Public Hosted Server: https://protein.caseyjhand.com/mcp
Tools
Seven tools spanning the structure-research arc — discover, fetch, find homologs, track ligands, compare, profile the corpus, and annotate — over experimental (PDB) and predicted (AlphaFold) structures from one surface:
| Tool | Description |
|---|---|
protein_search_structures |
Search experimental and predicted structures by free text, sequence, or organism/method/resolution filters, with optional facet breakdowns. |
protein_get_structure |
Fetch metadata and coordinate-file URLs by ID — experimental (PDB), predicted (AlphaFold), or best-available — with batch partial success and optional coordinate inlining. |
protein_find_similar |
Find sequence homologs (RCSB mmseqs2) or fold homologs (Foldseek) from a sequence, PDB ID, or UniProt accession. |
protein_track_ligands |
Resolve ligand names/formulas to component IDs, find structures containing a ligand, or map binding-site residues. |
protein_compare_structures |
Structurally align 2–10 structures (TM-align / jFATCAT) to a reference or as a full pairwise matrix. |
protein_analyze_collection |
Profile the PDB into distributions and trends with server-side facets — counts, histograms, timelines, and cross-tabs. |
protein_get_annotations |
Fetch UniProt features and natural variants plus InterPro domain/family memberships with GO terms. |
protein_search_structures
Federated search across experimental (PDB) and predicted (computed-model) structures via RCSB Search v2.
- Free-text, protein-sequence (triggers an mmseqs2 similarity search), and organism / method / resolution filters
content_typescopes the search toexperimental,predicted, orall- Experimental hits are enriched with title, method, resolution, and organism
- Optional
facetsreturn a method / organism / release-year breakdown alongside the hits at no extra call - Chain hit IDs straight into
protein_get_structure
protein_get_structure
Fetch structures with metadata and coordinate-file URLs, resolving across providers by source.
source: experimentaltakes PDB entry IDs, batched in one RCSB GraphQL callsource: predictedtakes UniProt accessions and returns the AlphaFold model with pLDDT/PAE confidencesource: best_availabletakes UniProt accessions and returns the top federated model (experimental if one exists, else the best prediction)- Per-ID partial success — unresolved IDs are listed in
failed[], not a batch-level error include_coordsinlines coordinate content; when a batch overflows the response budget it returns a per-structure size outline, so you can re-call withsections: [ids]for specific structures
protein_find_similar
Find structurally or evolutionarily related proteins, by sequence or by fold.
by: sequenceruns a synchronous RCSB mmseqs2 search;by: structureruns an asynchronous Foldseek search against experimental and predicted databases- Query from a raw one-letter sequence, a PDB ID, or a UniProt accession
- Foldseek targets default to
pdb100+afdb50; override viadatabases(e.g.afdb-swissprot,BFVD) - Async jobs that exceed the poll budget return
status: computingwith a ticket — re-call to resume - Each hit names the engine and source database it came from
protein_track_ligands
Ligand discovery and binding-site analysis across the PDB.
mode: find_ligandresolves a name or formula to chemical component IDs with formula, weight, SMILES, and InChIKeymode: structures_with_ligandreturns PDB entries containing a ligand by exact component IDmode: binding_sitereturns the protein residues lining a ligand's pocket in a structure, with contact distances- Binding sites are experimental-only — computed from deposited coordinates (predicted models carry no bound ligands)
protein_compare_structures
Structural alignment of 2–10 structures via the RCSB Structural Comparison service.
- Methods:
tm-align,fatcat-rigid,fatcat-flexible reference: firstaligns every structure to the first;reference: all_pairscomputes the full pairwise matrix- Optional per-structure
chainrestricts the alignment to a single chain - Each pair is an independent async job, fanned out with a concurrency cap and per-pair partial success — a pair still computing when the budget elapses returns
status: computingwith its job UUID, and a failed pair degrades its row without sinking the others - Returns TM-score, RMSD, and aligned-residue count per pair
protein_analyze_collection
Profile the PDB into distributions and trends over an optional scoping query — backed by RCSB's server-side facet engine (one call, compact buckets, no row pull).
- Group by
method,organism,polymer_type,resolution,release_year, ormolecular_weight - One
group_bydimension for a breakdown, or two for a cross-tab (the first nests the second) intervalsets the bin width for value histograms or the period for date histograms (year/month/quarter)- Scope with a free-text
query,organism,method, ormax_resolution;content_typeselects the structure universe bucket_limitcaps buckets per dimension; truncation is flagged in the response
protein_get_annotations
Sequence and functional annotation for a protein.
- UniProt features (domains, binding sites, PTMs) and natural sequence variants
- InterPro domain/family memberships (Pfam, PROSITE, …) with associated GO terms
- Provide a UniProt accession directly, or a PDB ID — resolved to its accession via the structure's sequence cross-reference
includescopes which annotation classes are fetched:features,domains,variants, orall
Resources
| Type | Name | Description |
|---|---|---|
| Resource | pdb://{entry_id} |
Experimental structure summary for a PDB entry — title, method, resolution, organism, chains, and bound ligands. |
| Resource | af://{uniprot} |
Predicted-structure summary for a UniProt accession from AlphaFold DB — mean pLDDT, confidence-band fractions, model URLs, and version. |
All resource data is also reachable via tools — pdb://{entry_id} mirrors protein_get_structure for source: experimental, and af://{uniprot} mirrors it for source: predicted. Many MCP clients are tool-only and don't surface resources; the summaries remain reachable through the tools.
Features
Built on @cyanheads/mcp-ts-core:
- Declarative tool and resource definitions — single file per primitive, framework handles registration and validation
- Unified error handling — handlers throw, framework catches, classifies, and formats
- Pluggable auth:
none,jwt,oauth - Swappable storage backends:
in-memory,filesystem,Supabase,Cloudflare KV/R2/D1 - Structured logging with optional OpenTelemetry tracing
- STDIO and Streamable HTTP transports
Protein-specific:
- One federated surface over experimental (PDB) and predicted (AlphaFold / 3D-Beacons) structures — search, fetch, and compare treat both universes the same
- Keyless across every upstream — RCSB, AlphaFold DB, 3D-Beacons, UniProt, InterPro, and Foldseek, no API keys to provision
- Corpus analytics run server-side on RCSB's facet engine — distributions, histograms, and cross-tabs in one call, no row pull and no SQL workspace
- Async alignment and Foldseek jobs poll within a bounded budget and hand back a resumable ticket instead of blocking
Agent-friendly output:
- Provenance on every response — each hit carries a
source(experimental/predicted), the engine and database that produced it, and effective-query / total-count echoes so agents can reason about coverage - Graceful partial failure — batch fetches and pairwise comparisons return per-item rows (
failed[], per-pairstatus) instead of failing the whole request, each with actionable recovery text - Discriminated output contracts — typed
sourceandstatusunions,computingresults with resume tickets, and budget-overflow outlines let callers branch on data, not string parsing
Getting started
Public Hosted Instance
A public instance is available at https://protein.caseyjhand.com/mcp — no installation required. Point any MCP client at it via Streamable HTTP:
{
"mcpServers": {
"protein": {
"type": "streamable-http",
"url": "https://protein.caseyjhand.com/mcp"
}
}
}
Self-hosted
Add the following to your MCP client configuration file. No API key is required — every upstream provider is keyless.
{
"mcpServers": {
"protein-mcp-server": {
"type": "stdio",
"command": "bunx",
"args": ["@cyanheads/protein-mcp-server@latest"],
"env": {
"MCP_TRANSPORT_TYPE": "stdio",
"MCP_LOG_LEVEL": "info"
}
}
}
}
Or with npx (no Bun required):
{
"mcpServers": {
"protein-mcp-server": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@cyanheads/protein-mcp-server@latest"],
"env": {
"MCP_TRANSPORT_TYPE": "stdio",
"MCP_LOG_LEVEL": "info"
}
}
}
}
Or with Docker:
{
"mcpServers": {
"protein-mcp-server": {
"type": "stdio",
"command": "docker",
"args": ["run", "-i", "--rm", "-e", "MCP_TRANSPORT_TYPE=stdio", "ghcr.io/cyanheads/protein-mcp-server:latest"]
}
}
}
For Streamable HTTP, set the transport and start the server:
MCP_TRANSPORT_TYPE=http MCP_HTTP_PORT=3010 bun run start:http
# Server listens at http://localhost:3010/mcp
Prerequisites
- Bun v1.3.2 or higher (or Node.js v24+).
- No accounts or API keys — RCSB, AlphaFold DB, 3D-Beacons, UniProt, InterPro, and Foldseek are all public and keyless.
Installation
- Clone the repository:
git clone https://github.com/cyanheads/protein-mcp-server.git
- Navigate into the directory:
cd protein-mcp-server
- Install dependencies:
bun install
Configuration
All upstream providers are keyless, so the server runs out of the box with no configuration. Every variable below is optional.
| Variable | Description | Default |
|---|---|---|
PROTEIN_ASYNC_POLL_TIMEOUT_MS |
Max wall-clock to poll an async job (alignment / Foldseek) before returning a computing result. |
30000 |
PROTEIN_MAX_BATCH_IDS |
Cap on IDs accepted by protein_get_structure in one batch (1–100). |
25 |
PROTEIN_MAX_COMPARE_STRUCTURES |
Cap on structures per protein_compare_structures call (2–25). |
10 |
PROTEIN_FACET_BUCKET_CAP |
Default cap on buckets per protein_analyze_collection dimension (1–500). |
50 |
PROTEIN_FANOUT_CONCURRENCY |
Max concurrent upstream requests for per-ID / per-pair fan-out (1–16). | 5 |
RCSB_SEARCH_BASE_URL |
Base URL for the RCSB Search API v2. | https://search.rcsb.org |
ALPHAFOLD_BASE_URL |
Base URL for the AlphaFold Protein Structure Database API. | https://alphafold.ebi.ac.uk |
FOLDSEEK_BASE_URL |
Base URL for the Foldseek structural-similarity search service. | https://search.foldseek.com |
MCP_TRANSPORT_TYPE |
Transport: stdio or http. |
stdio |
MCP_HTTP_PORT |
Port for the HTTP server. | 3010 |
MCP_AUTH_MODE |
Auth mode: none, jwt, or oauth. |
none |
MCP_LOG_LEVEL |
Log level (RFC 5424). | info |
OTEL_ENABLED |
Enable OpenTelemetry instrumentation. | false |
See .env.example for the full list of provider base-URL overrides and tuning limits.
Running the server
Local development
-
Build and run:
# One-time build bun run rebuild # Run the built server bun run start:stdio # or bun run start:http -
Run checks and tests:
bun run devcheck # Lint, format, typecheck, security bun run test # Vitest test suite bun run lint:mcp # Validate MCP definitions against spec
Docker
docker build -t protein-mcp-server .
docker run --rm -e MCP_TRANSPORT_TYPE=http -p 3010:3010 protein-mcp-server
The Dockerfile defaults to HTTP transport, stateless session mode, and logs to /var/log/protein-mcp-server. OpenTelemetry peer dependencies are installed by default — build with --build-arg OTEL_ENABLED=false to omit them.
Project structure
| Directory | Purpose |
|---|---|
src/index.ts |
createApp() entry point — registers tools/resources and inits the provider services. |
src/config |
Server-specific environment variable parsing and validation with Zod. |
src/mcp-server/tools |
Tool definitions (*.tool.ts). |
src/mcp-server/resources |
Resource definitions (*.resource.ts). |
src/services |
Provider service layer — RCSB, AlphaFold, 3D-Beacons, UniProt, InterPro, Foldseek, and shared HTTP/identifier helpers. |
tests/ |
Unit and integration tests mirroring src/. |
Development guide
See CLAUDE.md/AGENTS.md for development guidelines and architectural rules. The short version:
- Handlers throw, framework catches — no
try/catchin tool logic - Use
ctx.logfor request-scoped logging,ctx.statefor tenant-scoped storage - Register new tools and resources via the barrels in
src/mcp-server/*/definitions/index.ts - Wrap external API calls: validate raw → normalize to domain type → return output schema; never fabricate missing fields
Contributing
Issues and pull requests are welcome. Run checks and tests before submitting:
bun run devcheck
bun run test
License
Apache-2.0 — see LICENSE for details.