protein-mcp-server

by Cyanheads

io.github.cyanheads/protein-mcp-server

Federated protein structure & function across experimental (PDB) and predicted (AlphaFold) models.

protein-mcp-server · v0.1.2

@cyanheads/protein-mcp-server

Federated protein structure & function across experimental (PDB) and predicted (AlphaFold) models via MCP. STDIO or Streamable HTTP.

7 Tools • 2 Resources

Public Hosted Server: https://protein.caseyjhand.com/mcp


Tools

Seven tools spanning the structure-research arc — discover, fetch, find homologs, track ligands, compare, profile the corpus, and annotate — over experimental (PDB) and predicted (AlphaFold) structures from one surface:

| Tool | Description |
|---|---|
| protein_search_structures | Search experimental and predicted structures by free text, sequence, or organism/method/resolution filters, with optional facet breakdowns. |
| protein_get_structure | Fetch metadata and coordinate-file URLs by ID: experimental (PDB), predicted (AlphaFold), or best-available. Supports batch partial success and optional coordinate inlining. |
| protein_find_similar | Find sequence homologs (RCSB mmseqs2) or fold homologs (Foldseek) from a sequence, PDB ID, or UniProt accession. |
| protein_track_ligands | Resolve ligand names/formulas to component IDs, find structures containing a ligand, or map binding-site residues. |
| protein_compare_structures | Structurally align 2–10 structures (TM-align / jFATCAT) to a reference or as a full pairwise matrix. |
| protein_analyze_collection | Profile the PDB into distributions and trends with server-side facets: counts, histograms, timelines, and cross-tabs. |
| protein_get_annotations | Fetch UniProt features and natural variants plus InterPro domain/family memberships with GO terms. |

protein_search_structures

Federated search across experimental (PDB) and predicted (computed-model) structures via RCSB Search v2.

  • Free-text, protein-sequence (triggers an mmseqs2 similarity search), and organism / method / resolution filters
  • content_type scopes the search to experimental, predicted, or all
  • Experimental hits are enriched with title, method, resolution, and organism
  • Optional facets return a method / organism / release-year breakdown alongside the hits, without an extra call
  • Hit IDs can be passed straight to protein_get_structure
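
As a rough illustration of the options above, the sketch below builds an MCP tools/call request for this tool. The JSON-RPC envelope follows the MCP convention; the argument names are assumptions, since this README names only content_type explicitly (query, max_resolution, and facets are hypothetical field names here).

```typescript
// JSON-RPC 2.0 envelope that MCP uses for tool invocation.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical argument shape: only `content_type` and its three values come
// from this README; the other field names are assumed for illustration.
interface SearchArgs {
  query?: string;
  content_type?: "experimental" | "predicted" | "all";
  max_resolution?: number;
  facets?: boolean;
}

function buildSearchCall(id: number, args: SearchArgs): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id,
    method: "tools/call",
    params: { name: "protein_search_structures", arguments: { ...args } },
  };
}

// Free-text search scoped to experimental structures, with facets requested.
const request = buildSearchCall(1, {
  query: "hemoglobin",
  content_type: "experimental",
  max_resolution: 2.0,
  facets: true,
});
console.log(JSON.stringify(request, null, 2));
```

Check the tool's published input schema for the real field names before relying on this shape.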

protein_get_structure

Fetch structures with metadata and coordinate-file URLs, resolving across providers by source.

  • source: experimental takes PDB entry IDs, batched in one RCSB GraphQL call
  • source: predicted takes UniProt accessions and returns the AlphaFold model with pLDDT/PAE confidence
  • source: best_available takes UniProt accessions and returns the top federated model (experimental if one exists, else the best prediction)
  • Per-ID partial success — unresolved IDs are listed in failed[], not a batch-level error
  • include_coords inlines coordinate content; when a batch overflows the response budget it returns a per-structure size outline, so you can re-call with sections: [ids] for specific structures
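
A caller can lean on per-ID partial success by reconciling what it asked for against what came back. The sketch below assumes a hypothetical result shape: failed[] is named above, while the structures array and the id / reason fields are illustrative guesses.

```typescript
// Hypothetical result shape: `failed` is documented above; `structures`,
// `id`, and `reason` are assumed names.
interface StructureRecord { id: string; title?: string; }
interface FailedRecord { id: string; reason: string; }
interface GetStructureResult { structures: StructureRecord[]; failed: FailedRecord[]; }

interface BatchReport {
  resolved: string[];
  failed: FailedRecord[];
  unaccounted: string[]; // requested IDs that appear in neither list
}

function reconcileBatch(requested: string[], result: GetStructureResult): BatchReport {
  const resolved = new Set(result.structures.map((s) => s.id));
  const failedIds = new Set(result.failed.map((f) => f.id));
  const unaccounted = requested.filter((id) => !resolved.has(id) && !failedIds.has(id));
  return { resolved: Array.from(resolved), failed: result.failed, unaccounted };
}

// Two IDs resolve, one does not; the batch as a whole still succeeds.
const report = reconcileBatch(["4HHB", "1MBN", "ZZZZ"], {
  structures: [{ id: "4HHB" }, { id: "1MBN" }],
  failed: [{ id: "ZZZZ", reason: "entry not found" }],
});
for (const f of report.failed) console.warn(`skipped ${f.id}: ${f.reason}`);
console.log(`resolved ${report.resolved.length} of 3 requested IDs`);
```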

protein_find_similar

Find structurally or evolutionarily related proteins, by sequence or by fold.

  • by: sequence runs a synchronous RCSB mmseqs2 search; by: structure runs an asynchronous Foldseek search against experimental and predicted databases
  • Query from a raw one-letter sequence, a PDB ID, or a UniProt accession
  • Foldseek targets default to pdb100 + afdb50; override via databases (e.g. afdb-swissprot, BFVD)
  • Async jobs that exceed the poll budget return status: computing with a ticket — re-call to resume
  • Each hit names the engine and source database it came from
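
The resume behaviour can be sketched as a bounded re-call loop on the client side. Everything about the shapes here is an assumption: the README confirms only status: computing and a ticket, so the "complete" status, the hits fields, and passing the ticket back as a ticket argument are hypothetical. The tool call itself is injected so the sketch runs without a server.

```typescript
// Assumed result union; only `status: "computing"` and `ticket` are documented.
type FindSimilarResult =
  | { status: "complete"; hits: { id: string; engine: string; database: string }[] }
  | { status: "computing"; ticket: string };

// Stand-in for whatever function your MCP client exposes to call a tool.
type ToolCaller = (args: Record<string, unknown>) => Promise<FindSimilarResult>;

// Re-call with the ticket until the job completes or the call cap is reached.
async function resumeUntilDone(
  call: ToolCaller,
  args: Record<string, unknown>,
  maxCalls = 5,
): Promise<FindSimilarResult> {
  let result = await call(args);
  for (let attempt = 1; attempt < maxCalls && result.status === "computing"; attempt++) {
    result = await call({ ...args, ticket: result.ticket });
  }
  return result; // may still be "computing" if the cap was hit
}

// Stub that finishes on the third call, standing in for a slow Foldseek job.
let calls = 0;
const stub: ToolCaller = async () =>
  ++calls < 3 ? { status: "computing", ticket: "job-1" } : { status: "complete", hits: [] };

resumeUntilDone(stub, { by: "structure", query: "4HHB" }).then((r) =>
  console.log(r.status, "after", calls, "calls"),
);
```

Returning the last computing result instead of throwing keeps the ticket available, so a later turn can pick the job up again.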

protein_track_ligands

Ligand discovery and binding-site analysis across the PDB.

  • mode: find_ligand resolves a name or formula to chemical component IDs with formula, weight, SMILES, and InChIKey
  • mode: structures_with_ligand returns PDB entries containing a ligand by exact component ID
  • mode: binding_site returns the protein residues lining a ligand's pocket in a structure, with contact distances
  • Binding sites are experimental-only — computed from deposited coordinates (predicted models carry no bound ligands)

protein_compare_structures

Structural alignment of 2–10 structures via the RCSB Structural Comparison service.

  • Methods: tm-align, fatcat-rigid, fatcat-flexible
  • reference: first aligns every structure to the first; reference: all_pairs computes the full pairwise matrix
  • Optional per-structure chain restricts the alignment to a single chain
  • Each pair is an independent async job, fanned out with a concurrency cap and per-pair partial success. A pair still computing when the budget elapses returns status: computing with its job UUID, and a failed pair degrades its row without sinking the others
  • Returns TM-score, RMSD, and aligned-residue count per pair
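
To get a feel for how many alignment jobs each reference mode implies, the sketch below enumerates the pairs. It assumes all_pairs submits each unordered pair once (n(n-1)/2 jobs) and that reference: first yields n-1 jobs; the server's actual job planning is not described in this README.

```typescript
type ReferenceMode = "first" | "all_pairs";

// Enumerate the structure pairs that would be aligned for a given mode.
function planPairs(ids: string[], reference: ReferenceMode): [string, string][] {
  if (ids.length < 2 || ids.length > 10) {
    throw new RangeError("expected between 2 and 10 structures");
  }
  const pairs: [string, string][] = [];
  if (reference === "first") {
    // Every other structure is aligned against the first one.
    for (let i = 1; i < ids.length; i++) pairs.push([ids[0], ids[i]]);
  } else {
    // Assumption: one job per unordered pair (upper triangle of the matrix).
    for (let i = 0; i < ids.length; i++) {
      for (let j = i + 1; j < ids.length; j++) pairs.push([ids[i], ids[j]]);
    }
  }
  return pairs;
}

const ids = ["4HHB", "1A3N", "2HHB", "1MBN"];
console.log(planPairs(ids, "first").length);     // n - 1 jobs
console.log(planPairs(ids, "all_pairs").length); // n * (n - 1) / 2 jobs
```

With the default cap of 10 structures, all_pairs tops out at 45 jobs under this assumption, which is why the concurrency cap and per-pair status matter.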

protein_analyze_collection

Profile the PDB into distributions and trends over an optional scoping query — backed by RCSB's server-side facet engine (one call, compact buckets, no row pull).

  • Group by method, organism, polymer_type, resolution, release_year, or molecular_weight
  • One group_by dimension for a breakdown, or two for a cross-tab (the first nests the second)
  • interval sets the bin width for value histograms or the period for date histograms (year / month / quarter)
  • Scope with a free-text query, organism, method, or max_resolution; content_type selects the structure universe
  • bucket_limit caps buckets per dimension; truncation is flagged in the response
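
A two-dimension cross-tab nests the second dimension inside the first, so consumers often want to flatten it into rows. The sketch below assumes a hypothetical bucket shape (label, count, children); the real output field names may differ, and the counts are placeholders, not PDB statistics.

```typescript
// Hypothetical nested bucket shape for a cross-tab response.
interface Bucket {
  label: string;
  count: number;
  children?: Bucket[]; // buckets of the second group_by dimension
}

interface CrossTabRow { outer: string; inner: string; count: number; }

// Flatten "first dimension nests second" into one row per (outer, inner) cell.
function flattenCrossTab(buckets: Bucket[]): CrossTabRow[] {
  const rows: CrossTabRow[] = [];
  for (const outer of buckets) {
    for (const inner of outer.children ?? []) {
      rows.push({ outer: outer.label, inner: inner.label, count: inner.count });
    }
  }
  return rows;
}

// Placeholder data for group_by: [method, release_year].
const sample: Bucket[] = [
  { label: "X-RAY DIFFRACTION", count: 3, children: [
    { label: "2023", count: 2 }, { label: "2024", count: 1 },
  ] },
  { label: "ELECTRON MICROSCOPY", count: 1, children: [{ label: "2024", count: 1 }] },
];
console.table(flattenCrossTab(sample));
```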

protein_get_annotations

Sequence and functional annotation for a protein.

  • UniProt features (domains, binding sites, PTMs) and natural sequence variants
  • InterPro domain/family memberships (Pfam, PROSITE, …) with associated GO terms
  • Provide a UniProt accession directly, or a PDB ID — resolved to its accession via the structure's sequence cross-reference
  • include scopes which annotation classes are fetched: features, domains, variants, or all

Resources

| Type | Name | Description |
|---|---|---|
| Resource | pdb://{entry_id} | Experimental structure summary for a PDB entry: title, method, resolution, organism, chains, and bound ligands. |
| Resource | af://{uniprot} | Predicted-structure summary for a UniProt accession from AlphaFold DB: mean pLDDT, confidence-band fractions, model URLs, and version. |

All resource data is also reachable via tools — pdb://{entry_id} mirrors protein_get_structure for source: experimental, and af://{uniprot} mirrors it for source: predicted. Many MCP clients are tool-only and don't surface resources; the summaries remain reachable through the tools.
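
For tool-only clients, the mirroring described above can be sketched as a small URI-to-arguments mapping. The source values come from this README; the ids argument name and the alphanumeric-only identifier pattern are assumptions.

```typescript
interface GetStructureArgs {
  source: "experimental" | "predicted";
  ids: string[]; // hypothetical argument name
}

// Translate a resource URI into arguments for protein_get_structure.
function resourceToToolArgs(uri: string): GetStructureArgs {
  // Assumption: entry IDs and accessions are plain alphanumerics.
  const match = /^(pdb|af):\/\/([A-Za-z0-9]+)$/.exec(uri);
  if (!match) throw new Error(`unsupported resource URI: ${uri}`);
  const [, scheme, id] = match;
  return { source: scheme === "pdb" ? "experimental" : "predicted", ids: [id] };
}

console.log(resourceToToolArgs("pdb://4HHB"));
console.log(resourceToToolArgs("af://P69905"));
```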

Features

Built on @cyanheads/mcp-ts-core:

  • Declarative tool and resource definitions — single file per primitive, framework handles registration and validation
  • Unified error handling — handlers throw, framework catches, classifies, and formats
  • Pluggable auth: none, jwt, oauth
  • Swappable storage backends: in-memory, filesystem, Supabase, Cloudflare KV/R2/D1
  • Structured logging with optional OpenTelemetry tracing
  • STDIO and Streamable HTTP transports

Protein-specific:

  • One federated surface over experimental (PDB) and predicted (AlphaFold / 3D-Beacons) structures — search, fetch, and compare treat both universes the same
  • Keyless across every upstream — RCSB, AlphaFold DB, 3D-Beacons, UniProt, InterPro, and Foldseek, no API keys to provision
  • Corpus analytics run server-side on RCSB's facet engine — distributions, histograms, and cross-tabs in one call, no row pull and no SQL workspace
  • Async alignment and Foldseek jobs poll within a bounded budget and hand back a resumable ticket instead of blocking

Agent-friendly output:

  • Provenance on every response — each hit carries a source (experimental / predicted), the engine and database that produced it, and effective-query / total-count echoes so agents can reason about coverage
  • Graceful partial failure — batch fetches and pairwise comparisons return per-item rows (failed[], per-pair status) instead of failing the whole request, each with actionable recovery text
  • Discriminated output contracts — typed source and status unions, computing results with resume tickets, and budget-overflow outlines let callers branch on data, not string parsing

Getting started

Public Hosted Instance

A public instance is available at https://protein.caseyjhand.com/mcp — no installation required. Point any MCP client at it via Streamable HTTP:

{
  "mcpServers": {
    "protein": {
      "type": "streamable-http",
      "url": "https://protein.caseyjhand.com/mcp"
    }
  }
}

Self-hosted

Add the following to your MCP client configuration file. No API key is required — every upstream provider is keyless.

{
  "mcpServers": {
    "protein-mcp-server": {
      "type": "stdio",
      "command": "bunx",
      "args": ["@cyanheads/protein-mcp-server@latest"],
      "env": {
        "MCP_TRANSPORT_TYPE": "stdio",
        "MCP_LOG_LEVEL": "info"
      }
    }
  }
}

Or with npx (no Bun required):

{
  "mcpServers": {
    "protein-mcp-server": {
      "type": "stdio",
      "command": "npx",
      "args": ["-y", "@cyanheads/protein-mcp-server@latest"],
      "env": {
        "MCP_TRANSPORT_TYPE": "stdio",
        "MCP_LOG_LEVEL": "info"
      }
    }
  }
}

Or with Docker:

{
  "mcpServers": {
    "protein-mcp-server": {
      "type": "stdio",
      "command": "docker",
      "args": ["run", "-i", "--rm", "-e", "MCP_TRANSPORT_TYPE=stdio", "ghcr.io/cyanheads/protein-mcp-server:latest"]
    }
  }
}

For Streamable HTTP, set the transport and start the server:

MCP_TRANSPORT_TYPE=http MCP_HTTP_PORT=3010 bun run start:http
# Server listens at http://localhost:3010/mcp

Prerequisites

  • Bun v1.3.2 or higher (or Node.js v24+).
  • No accounts or API keys — RCSB, AlphaFold DB, 3D-Beacons, UniProt, InterPro, and Foldseek are all public and keyless.

Installation

  1. Clone the repository:

    git clone https://github.com/cyanheads/protein-mcp-server.git

  2. Navigate into the directory:

    cd protein-mcp-server

  3. Install dependencies:

    bun install

Configuration

All upstream providers are keyless, so the server runs out of the box with no configuration. Every variable below is optional.

| Variable | Description | Default |
|---|---|---|
| PROTEIN_ASYNC_POLL_TIMEOUT_MS | Maximum wall-clock time, in milliseconds, to poll an async job (alignment / Foldseek) before returning a computing result. | 30000 |
| PROTEIN_MAX_BATCH_IDS | Cap on IDs accepted by protein_get_structure in one batch (1–100). | 25 |
| PROTEIN_MAX_COMPARE_STRUCTURES | Cap on structures per protein_compare_structures call (2–25). | 10 |
| PROTEIN_FACET_BUCKET_CAP | Default cap on buckets per protein_analyze_collection dimension (1–500). | 50 |
| PROTEIN_FANOUT_CONCURRENCY | Maximum concurrent upstream requests for per-ID / per-pair fan-out (1–16). | 5 |
| RCSB_SEARCH_BASE_URL | Base URL for the RCSB Search API v2. | https://search.rcsb.org |
| ALPHAFOLD_BASE_URL | Base URL for the AlphaFold Protein Structure Database API. | https://alphafold.ebi.ac.uk |
| FOLDSEEK_BASE_URL | Base URL for the Foldseek structural-similarity search service. | https://search.foldseek.com |
| MCP_TRANSPORT_TYPE | Transport: stdio or http. | stdio |
| MCP_HTTP_PORT | Port for the HTTP server. | 3010 |
| MCP_AUTH_MODE | Auth mode: none, jwt, or oauth. | none |
| MCP_LOG_LEVEL | Log level (RFC 5424). | info |
| OTEL_ENABLED | Enable OpenTelemetry instrumentation. | false |

See .env.example for the full list of provider base-URL overrides and tuning limits.
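
To make the defaults and ranges above concrete, here is a minimal sketch of parsing the tuning limits from an environment map. It is written in plain TypeScript rather than the Zod schema the server uses, and it rejects out-of-range values; whether the real server rejects or clamps them is not stated here. The helper names and the upper bound on the poll timeout are hypothetical.

```typescript
type Env = Record<string, string | undefined>;

// Read an integer variable, falling back to the default when unset or blank.
function intInRange(env: Env, name: string, fallback: number, min: number, max: number): number {
  const raw = env[name];
  if (raw === undefined || raw.trim() === "") return fallback;
  const value = Number(raw);
  if (!Number.isInteger(value) || value < min || value > max) {
    throw new RangeError(`${name} must be an integer in ${min}..${max}, got "${raw}"`);
  }
  return value;
}

// Defaults and ranges are taken from the configuration table.
function loadLimits(env: Env) {
  return {
    // No range is documented for the timeout; any positive integer is assumed valid.
    asyncPollTimeoutMs: intInRange(env, "PROTEIN_ASYNC_POLL_TIMEOUT_MS", 30000, 1, Number.MAX_SAFE_INTEGER),
    maxBatchIds: intInRange(env, "PROTEIN_MAX_BATCH_IDS", 25, 1, 100),
    maxCompareStructures: intInRange(env, "PROTEIN_MAX_COMPARE_STRUCTURES", 10, 2, 25),
    facetBucketCap: intInRange(env, "PROTEIN_FACET_BUCKET_CAP", 50, 1, 500),
    fanoutConcurrency: intInRange(env, "PROTEIN_FANOUT_CONCURRENCY", 5, 1, 16),
  };
}

// In a real process you would pass process.env; a literal keeps the sketch standalone.
const limits = loadLimits({ PROTEIN_MAX_BATCH_IDS: "40" });
console.log(limits);
```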

Running the server

Local development

  • Build and run:

    # One-time build
    bun run rebuild
    
    # Run the built server
    bun run start:stdio
    # or
    bun run start:http
    
  • Run checks and tests:

    bun run devcheck   # Lint, format, typecheck, security
    bun run test       # Vitest test suite
    bun run lint:mcp   # Validate MCP definitions against spec
    

Docker

docker build -t protein-mcp-server .
docker run --rm -e MCP_TRANSPORT_TYPE=http -p 3010:3010 protein-mcp-server

The Dockerfile defaults to HTTP transport and stateless session mode, and writes logs to /var/log/protein-mcp-server. OpenTelemetry peer dependencies are installed by default; build with --build-arg OTEL_ENABLED=false to omit them.

Project structure

| Path | Purpose |
|---|---|
| src/index.ts | createApp() entry point: registers tools/resources and initializes the provider services. |
| src/config | Server-specific environment variable parsing and validation with Zod. |
| src/mcp-server/tools | Tool definitions (*.tool.ts). |
| src/mcp-server/resources | Resource definitions (*.resource.ts). |
| src/services | Provider service layer: RCSB, AlphaFold, 3D-Beacons, UniProt, InterPro, Foldseek, and shared HTTP/identifier helpers. |
| tests/ | Unit and integration tests mirroring src/. |

Development guide

See CLAUDE.md/AGENTS.md for development guidelines and architectural rules. The short version:

  • Handlers throw, framework catches — no try/catch in tool logic
  • Use ctx.log for request-scoped logging, ctx.state for tenant-scoped storage
  • Register new tools and resources via the barrels in src/mcp-server/*/definitions/index.ts
  • Wrap external API calls: validate raw → normalize to domain type → return output schema; never fabricate missing fields
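
The last rule (validate raw, normalize, never fabricate) might look like the sketch below. The raw field names (entry_id, title, resolution) and the StructureSummary type are hypothetical, not the repository's actual types. Missing optional fields are simply left absent rather than filled with placeholders, and the function throws on invalid input, in line with "handlers throw, framework catches".

```typescript
// Hypothetical domain type; optional fields stay absent when upstream omits them.
interface StructureSummary {
  id: string;
  title?: string;
  resolution?: number;
}

function normalizeEntry(raw: unknown): StructureSummary {
  // 1. Validate the raw payload before touching any field.
  if (typeof raw !== "object" || raw === null) {
    throw new Error("upstream entry is not an object");
  }
  const record = raw as Record<string, unknown>;
  const id = record["entry_id"];
  if (typeof id !== "string" || id === "") {
    throw new Error("upstream entry has no entry_id");
  }

  // 2. Normalize to the domain type, copying only well-typed fields.
  const summary: StructureSummary = { id };
  const title = record["title"];
  if (typeof title === "string") summary.title = title;
  const resolution = record["resolution"];
  if (typeof resolution === "number" && Number.isFinite(resolution)) {
    summary.resolution = resolution;
  }

  // 3. No defaults such as resolution: 0 are invented for missing values.
  return summary;
}

console.log(normalizeEntry({ entry_id: "4HHB", title: "example title" }));
```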

Contributing

Issues and pull requests are welcome. Run checks and tests before submitting:

bun run devcheck
bun run test

License

Apache-2.0 — see LICENSE for details.