Persistent AI memory with hybrid search (FTS5 + semantic) and cross-machine sync.
mnemo-mcp · v1.7.0
n24q02m
Mnemo MCP Server
Persistent AI memory with hybrid search and embedded sync. Open, free, unlimited.
Features
- Hybrid search: FTS5 full-text + sqlite-vec semantic + Qwen3-Embedding-0.6B (built-in)
- Zero config mode: Works out of the box — local embedding, no API keys needed
- Auto-detect embedding: Set `API_KEYS` for cloud embedding, auto-fallback to local
- Embedded sync: rclone auto-downloaded and managed as a subprocess
- Multi-machine: JSONL-based merge sync via rclone (Google Drive, S3, etc.)
- Proactive memory: Tool descriptions guide AI to save preferences, decisions, facts
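The hybrid search above merges two ranked lists: keyword hits from FTS5 and nearest neighbors from sqlite-vec. One common way to combine such lists is reciprocal rank fusion (RRF); this is an illustrative sketch of that idea, not mnemo-mcp's actual merge code.

```python
# Illustrative sketch: combine FTS5 (keyword) ranks with sqlite-vec
# (semantic) ranks via reciprocal rank fusion. A document ranked well
# in both lists accumulates the highest score.

def rrf_merge(fts_ids, vec_ids, k=60):
    """Combine two ranked id lists; returns ids best-first."""
    scores = {}
    for ranked in (fts_ids, vec_ids):
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Memory 42 appears near the top of both lists, so it ranks first overall.
merged = rrf_merge([42, 7, 9], [3, 42, 7])
```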
Quick Start
The recommended way to run this server is via uvx:
uvx mnemo-mcp@latest
Alternatively, you can use `pipx run mnemo-mcp`.
Option 1: uvx (Recommended)
{
"mcpServers": {
"mnemo": {
"command": "uvx",
"args": ["mnemo-mcp@latest"],
"env": {
// -- optional: LiteLLM Proxy (production, selfhosted gateway)
// "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
// "LITELLM_PROXY_KEY": "sk-your-virtual-key",
// -- optional: cloud embedding (Gemini > OpenAI > Cohere) for semantic search
// -- without this, uses built-in local Qwen3-Embedding-0.6B (ONNX, CPU)
// -- first run downloads ~570MB model, cached for subsequent runs
"API_KEYS": "GOOGLE_API_KEY:AIza...",
// -- optional: custom embedding endpoint (e.g. modalcom-ai-workers on Modal.com)
// "EMBEDDING_API_BASE": "https://your-worker.modal.run",
// "EMBEDDING_API_KEY": "your-key",
// -- optional: sync memories across machines via rclone
// -- on first sync, a browser opens for OAuth (auto, no manual setup)
"SYNC_ENABLED": "true", // optional, default: false
"SYNC_INTERVAL": "300" // optional, auto-sync every 5min (0 = manual only)
// "SYNC_REMOTE": "gdrive", // optional, default: gdrive
// "SYNC_PROVIDER": "drive", // optional, default: drive (Google Drive)
}
}
}
}
Option 2: Docker
{
"mcpServers": {
"mnemo": {
"command": "docker",
"args": [
"run", "-i", "--rm",
"--name", "mcp-mnemo",
"-v", "mnemo-data:/data", // persists memories across restarts
"-e", "LITELLM_PROXY_URL", // optional: pass-through from env below
"-e", "LITELLM_PROXY_KEY", // optional: pass-through from env below
"-e", "API_KEYS", // optional: pass-through from env below
"-e", "EMBEDDING_API_BASE", // optional: pass-through from env below
"-e", "EMBEDDING_API_KEY", // optional: pass-through from env below
"-e", "SYNC_ENABLED", // optional: pass-through from env below
"-e", "SYNC_INTERVAL", // optional: pass-through from env below
"n24q02m/mnemo-mcp:latest"
],
"env": {
// -- optional: LiteLLM Proxy (production, selfhosted gateway)
// "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
// "LITELLM_PROXY_KEY": "sk-your-virtual-key",
// -- optional: cloud embedding (Gemini > OpenAI > Cohere) for semantic search
// -- without this, uses built-in local Qwen3-Embedding-0.6B (ONNX, CPU)
"API_KEYS": "GOOGLE_API_KEY:AIza...",
// -- optional: custom embedding endpoint (e.g. modalcom-ai-workers on Modal.com)
// "EMBEDDING_API_BASE": "https://your-worker.modal.run",
// "EMBEDDING_API_KEY": "your-key",
// -- optional: sync memories across machines via rclone
"SYNC_ENABLED": "true", // optional, default: false
"SYNC_INTERVAL": "300" // optional, auto-sync every 5min (0 = manual only)
}
}
}
}
Pre-install (optional)
Pre-download dependencies before adding to your MCP client config. This avoids slow first-run startup:
# Pre-download embedding model (~570MB) and validate API keys
uvx mnemo-mcp warmup
# With cloud embedding (validates API key, skips local download if cloud works)
API_KEYS="GOOGLE_API_KEY:AIza..." uvx mnemo-mcp warmup
Sync setup
Sync is fully automatic. Just set SYNC_ENABLED=true and the server handles everything:
- First sync: rclone is auto-downloaded, a browser opens for OAuth authentication
- Token saved: OAuth token is stored locally at `~/.mnemo-mcp/tokens/` (600 permissions)
- Subsequent runs: Token is loaded automatically — no manual steps needed
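The "JSONL-based merge" mentioned in the features can be pictured as follows: each machine appends records, and on sync, records are merged by id with the newest `updated_at` winning. This is an illustrative sketch under those assumptions, not the actual mnemo-mcp sync code.

```python
import json

def merge_jsonl(local_text, remote_text):
    """Merge two JSONL exports by id; the newest updated_at wins."""
    merged = {}
    for text in (local_text, remote_text):
        for line in text.splitlines():
            if not line.strip():
                continue
            rec = json.loads(line)
            prev = merged.get(rec["id"])
            # Keep the record with the later update timestamp.
            if prev is None or rec["updated_at"] > prev["updated_at"]:
                merged[rec["id"]] = rec
    return "".join(json.dumps(r) + "\n" for r in merged.values())
```

Because the merge is per-record rather than whole-file, edits made on two machines between syncs don't clobber each other.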
For non-Google Drive providers, set SYNC_PROVIDER and SYNC_REMOTE:
{
"SYNC_ENABLED": "true",
"SYNC_PROVIDER": "dropbox", // rclone provider type
"SYNC_REMOTE": "dropbox" // rclone remote name
}
Advanced: You can also run `uvx mnemo-mcp setup-sync drive` to pre-authenticate before first use, but this is optional.
Configuration
| Variable | Default | Description |
|---|---|---|
| `DB_PATH` | `~/.mnemo-mcp/memories.db` | Database location |
| `LITELLM_PROXY_URL` | — | LiteLLM Proxy URL (e.g. `http://10.0.0.20:4000`). Enables proxy mode |
| `LITELLM_PROXY_KEY` | — | LiteLLM Proxy virtual key (e.g. `sk-...`) |
| `API_KEYS` | — | API keys (`ENV:key,ENV:key`). Optional: enables semantic search (SDK mode) |
| `EMBEDDING_API_BASE` | — | Custom embedding endpoint URL (optional, for SDK mode) |
| `EMBEDDING_API_KEY` | — | Custom embedding endpoint key (optional) |
| `EMBEDDING_BACKEND` | (auto-detect) | `litellm` (cloud API) or `local` (Qwen3). Auto: `API_KEYS` -> `litellm`, else `local` (always available) |
| `EMBEDDING_MODEL` | (auto-detect) | LiteLLM model name (optional) |
| `EMBEDDING_DIMS` | `0` (auto = 768) | Embedding dimensions (0 = auto-detect, default 768) |
| `SYNC_ENABLED` | `false` | Enable rclone sync |
| `SYNC_PROVIDER` | `drive` | rclone provider type (`drive`, `dropbox`, `s3`, etc.) |
| `SYNC_REMOTE` | `gdrive` | rclone remote name |
| `SYNC_FOLDER` | `mnemo-mcp` | Remote folder |
| `SYNC_INTERVAL` | `300` | Auto-sync interval in seconds (0 = manual only) |
| `LOG_LEVEL` | `INFO` | Log level |
Embedding (3-Mode Architecture)
Embedding is always available — a local model is built-in and requires no configuration.
Embedding access supports 3 modes, resolved by priority:
| Priority | Mode | Config | Use case |
|---|---|---|---|
| 1 | Proxy | `LITELLM_PROXY_URL` + `LITELLM_PROXY_KEY` | Production (OCI VM, selfhosted gateway) |
| 2 | SDK | `API_KEYS` or `EMBEDDING_API_BASE` | Dev/local with direct API access |
| 3 | Local | Nothing needed | Offline, always available as fallback |
No cross-mode fallback — if proxy is configured but unreachable, calls fail (no silent fallback to direct API).
- Local mode: Qwen3-Embedding-0.6B, always available with zero config.
- GPU auto-detection: If a GPU is available (CUDA/DirectML) and `llama-cpp-python` is installed, the GGUF model (~480MB) is used automatically instead of ONNX (~570MB) for better performance.
- All embeddings are stored at 768 dims (default). Switching providers never breaks the vector table.
- Override with `EMBEDDING_BACKEND=local` to force local even with API keys.
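The priority resolution described above can be sketched as a small function. The function name and structure here are hypothetical, chosen to mirror the table and the `EMBEDDING_BACKEND=local` override; the real implementation may differ.

```python
def resolve_embedding_mode(env):
    """Pick an embedding mode from an env-var dict, by documented priority."""
    if env.get("EMBEDDING_BACKEND") == "local":
        return "local"  # explicit override wins over any configured keys
    if env.get("LITELLM_PROXY_URL") and env.get("LITELLM_PROXY_KEY"):
        return "proxy"  # priority 1: selfhosted LiteLLM gateway
    if env.get("API_KEYS") or env.get("EMBEDDING_API_BASE"):
        return "sdk"    # priority 2: direct cloud API access
    return "local"      # priority 3: built-in Qwen3, always available
```

Note that per the no-cross-mode-fallback rule, once "proxy" is selected a proxy outage is an error rather than a reason to retry in "sdk" mode.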
API_KEYS supports multiple providers in a single string:
API_KEYS=GOOGLE_API_KEY:AIza...,OPENAI_API_KEY:sk-...,COHERE_API_KEY:co-...
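Parsing that comma-separated `ENV:key` string is straightforward; this sketch shows the expected shape of the result (illustrative only, mnemo-mcp's parser may differ):

```python
def parse_api_keys(value):
    """Split "ENV:key,ENV:key" into an env-var -> key mapping."""
    pairs = {}
    for item in value.split(","):
        # partition on the first ":" so keys containing ":" still work
        env_var, _, key = item.partition(":")
        if env_var and key:
            pairs[env_var.strip()] = key.strip()
    return pairs

keys = parse_api_keys("GOOGLE_API_KEY:AIza...,OPENAI_API_KEY:sk-...")
```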
Cloud embedding providers (auto-detected from API_KEYS, priority order):
| Priority | Env Var (LiteLLM) | Model | Native Dims | Stored |
|---|---|---|---|---|
| 1 | `GEMINI_API_KEY` | `gemini/gemini-embedding-001` | 3072 | 768 |
| 2 | `OPENAI_API_KEY` | `text-embedding-3-large` | 3072 | 768 |
| 3 | `COHERE_API_KEY` | `embed-multilingual-v3.0` | 1024 | 768 |
All embeddings are truncated to 768 dims (default) for storage. This ensures switching models never breaks the vector table. Override with EMBEDDING_DIMS if needed.
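Truncation to a fixed 768 dims is what keeps the vector table stable across providers. A minimal sketch of the idea, assuming the stored vector is re-normalized after truncation so cosine similarity stays meaningful (the actual storage code may handle this differently):

```python
import math

def truncate_embedding(vec, dims=768):
    """Keep the first `dims` components and re-normalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

# A native 3072-dim embedding (e.g. Gemini or OpenAI) stored at 768 dims.
stored = truncate_embedding([0.1] * 3072)
```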
API_KEYS format maps your env var to LiteLLM's expected var (e.g., GOOGLE_API_KEY:key auto-sets GEMINI_API_KEY). Set EMBEDDING_MODEL explicitly for other providers.
MCP Tools
memory — Core memory operations
| Action | Required | Optional |
|---|---|---|
| `add` | `content` | `category`, `tags` |
| `search` | `query` | `category`, `tags`, `limit` |
| `list` | — | `category`, `limit` |
| `update` | `memory_id` | `content`, `category`, `tags` |
| `delete` | `memory_id` | — |
| `export` | — | — |
| `import` | `data` (JSONL) | `mode` (merge/replace) |
| `stats` | — | — |
config — Server configuration
| Action | Required | Optional |
|---|---|---|
| `status` | — | — |
| `sync` | — | — |
| `set` | `key`, `value` | — |
help — Full documentation
help(topic="memory") # or "config"
MCP Resources
| URI | Description |
|---|---|
| `mnemo://stats` | Database statistics and server status |
| `mnemo://recent` | 10 most recently updated memories |
MCP Prompts
| Prompt | Parameters | Description |
|---|---|---|
| `save_summary` | `summary` | Generate a prompt to save a conversation summary as memory |
| `recall_context` | `topic` | Generate a prompt to recall relevant memories about a topic |
Architecture
MCP Client (Claude, Cursor, etc.)
|
FastMCP Server
/ | \
memory config help
| | |
MemoryDB Settings docs/
/ \
FTS5 sqlite-vec
|
EmbeddingBackend
/ \
LiteLLM Qwen3 ONNX
| (local CPU)
Gemini / OpenAI / Cohere
Sync: rclone (embedded) -> Google Drive / S3 / ...
Development
# Install
uv sync
# Run
uv run mnemo-mcp
# Lint
uv run ruff check src/
uv run ty check src/
# Test
uv run pytest
Compatible With
Works with any MCP client (Claude, Cursor, etc.).
Also by n24q02m
| Server | Description | Install |
|---|---|---|
| better-notion-mcp | Notion API for AI agents | npx -y @n24q02m/better-notion-mcp@latest |
| wet-mcp | Web search, content extraction, library docs | uvx --python 3.13 wet-mcp@latest |
| better-email-mcp | Email (IMAP/SMTP) for AI agents | npx -y @n24q02m/better-email-mcp@latest |
| better-godot-mcp | Godot Engine for AI agents | npx -y @n24q02m/better-godot-mcp@latest |
Related Projects
- modalcom-ai-workers — GPU-accelerated AI workers on Modal.com (embedding, reranking)
- qwen3-embed — Local embedding/reranking library used by mnemo-mcp
Contributing
See CONTRIBUTING.md
License
MIT - See LICENSE