Persistent AI memory with hybrid search (FTS5 + semantic) and cross-machine sync.
mnemo-mcp · v1.7.0
n24q02m
Mnemo MCP Server
Persistent AI memory with hybrid search and embedded sync. Open, free, unlimited.
Features
- Hybrid search: FTS5 full-text + sqlite-vec semantic + Qwen3-Embedding-0.6B (built-in)
- Zero config mode: Works out of the box — local embedding, no API keys needed
- Auto-detect embedding: Set `API_KEYS` for cloud embedding, auto-fallback to local
- Embedded sync: rclone auto-downloaded and managed as a subprocess
- Multi-machine: JSONL-based merge sync via rclone (Google Drive, S3, etc.)
- Proactive memory: Tool descriptions guide AI to save preferences, decisions, facts
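The hybrid search above merges two ranked lists: keyword hits from FTS5 and nearest neighbors from sqlite-vec. One common way to combine such lists is reciprocal rank fusion (RRF); this is an illustrative sketch of that idea, not mnemo-mcp's actual merge code.

```python
# Illustrative sketch: combine FTS5 (keyword) ranks with sqlite-vec
# (semantic) ranks via reciprocal rank fusion. A document ranked well
# in both lists accumulates the highest score.

def rrf_merge(fts_ids, vec_ids, k=60):
    """Combine two ranked id lists; returns ids best-first."""
    scores = {}
    for ranked in (fts_ids, vec_ids):
        for rank, doc_id in enumerate(ranked):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Memory 42 appears near the top of both lists, so it ranks first overall.
merged = rrf_merge([42, 7, 9], [3, 42, 7])
```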
Quick Start
The recommended way to run this server is via uvx:
uvx mnemo-mcp@latest
Alternatively, you can use `pipx run mnemo-mcp`.
Option 1: uvx (Recommended)
{
"mcpServers": {
"mnemo": {
"command": "uvx",
"args": ["mnemo-mcp@latest"],
"env": {
// -- optional: LiteLLM Proxy (production, selfhosted gateway)
// "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
// "LITELLM_PROXY_KEY": "sk-your-virtual-key",
// -- optional: cloud embedding (Gemini > OpenAI > Cohere) for semantic search
// -- without this, uses built-in local Qwen3-Embedding-0.6B (ONNX, CPU)
// -- first run downloads ~570MB model, cached for subsequent runs
"API_KEYS": "GOOGLE_API_KEY:AIza...",
// -- optional: custom embedding endpoint (e.g. modalcom-ai-workers on Modal.com)
// "EMBEDDING_API_BASE": "https://your-worker.modal.run",
// "EMBEDDING_API_KEY": "your-key",
// -- optional: sync memories across machines via rclone
// -- on first sync, a browser opens for OAuth (auto, no manual setup)
"SYNC_ENABLED": "true", // optional, default: false
"SYNC_INTERVAL": "300" // optional, auto-sync every 5min (0 = manual only)
// "SYNC_REMOTE": "gdrive", // optional, default: gdrive
// "SYNC_PROVIDER": "drive", // optional, default: drive (Google Drive)
}
}
}
}
Option 2: Docker
{
"mcpServers": {
"mnemo": {
"command": "docker",
"args": [
"run", "-i", "--rm",
"--name", "mcp-mnemo",
"-v", "mnemo-data:/data", // persists memories across restarts
"-e", "LITELLM_PROXY_URL", // optional: pass-through from env below
"-e", "LITELLM_PROXY_KEY", // optional: pass-through from env below
"-e", "API_KEYS", // optional: pass-through from env below
"-e", "EMBEDDING_API_BASE", // optional: pass-through from env below
"-e", "EMBEDDING_API_KEY", // optional: pass-through from env below
"-e", "SYNC_ENABLED", // optional: pass-through from env below
"-e", "SYNC_INTERVAL", // optional: pass-through from env below
"n24q02m/mnemo-mcp:latest"
],
"env": {
// -- optional: LiteLLM Proxy (production, selfhosted gateway)
// "LITELLM_PROXY_URL": "http://10.0.0.20:4000",
// "LITELLM_PROXY_KEY": "sk-your-virtual-key",
// -- optional: cloud embedding (Gemini > OpenAI > Cohere) for semantic search
// -- without this, uses built-in local Qwen3-Embedding-0.6B (ONNX, CPU)
"API_KEYS": "GOOGLE_API_KEY:AIza...",
// -- optional: custom embedding endpoint (e.g. modalcom-ai-workers on Modal.com)
// "EMBEDDING_API_BASE": "https://your-worker.modal.run",
// "EMBEDDING_API_KEY": "your-key",
// -- optional: sync memories across machines via rclone
"SYNC_ENABLED": "true", // optional, default: false
"SYNC_INTERVAL": "300" // optional, auto-sync every 5min (0 = manual only)
}
}
}
}
Pre-install (optional)
Pre-download dependencies before adding to your MCP client config. This avoids slow first-run startup:
# Pre-download embedding model (~570MB) and validate API keys
uvx mnemo-mcp warmup
# With cloud embedding (validates API key, skips local download if cloud works)
API_KEYS="GOOGLE_API_KEY:AIza..." uvx mnemo-mcp warmup
Sync setup
Sync is fully automatic. Just set SYNC_ENABLED=true and the server handles everything:
- First sync: rclone is auto-downloaded, a browser opens for OAuth authentication
- Token saved: OAuth token is stored locally at `~/.mnemo-mcp/tokens/` (600 permissions)
- Subsequent runs: Token is loaded automatically — no manual steps needed
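The "JSONL-based merge" mentioned in the features can be pictured as follows: each machine appends records, and on sync, records are merged by id with the newest `updated_at` winning. This is an illustrative sketch under those assumptions, not the actual mnemo-mcp sync code.

```python
import json

def merge_jsonl(local_text, remote_text):
    """Merge two JSONL exports by id; the newest updated_at wins."""
    merged = {}
    for text in (local_text, remote_text):
        for line in text.splitlines():
            if not line.strip():
                continue
            rec = json.loads(line)
            prev = merged.get(rec["id"])
            # Keep the record with the later update timestamp.
            if prev is None or rec["updated_at"] > prev["updated_at"]:
                merged[rec["id"]] = rec
    return "".join(json.dumps(r) + "\n" for r in merged.values())
```

Because the merge is per-record rather than whole-file, edits made on two machines between syncs don't clobber each other.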
For non-Google Drive providers, set SYNC_PROVIDER and SYNC_REMOTE:
{
"SYNC_ENABLED": "true",
"SYNC_PROVIDER": "dropbox", // rclone provider type
"SYNC_REMOTE": "dropbox" // rclone remote name
}
Advanced: You can also run `uvx mnemo-mcp setup-sync drive` to pre-authenticate before first use, but this is optional.
Configuration
| Variable | Default | Description |
|---|---|---|
| `DB_PATH` | `~/.mnemo-mcp/memories.db` | Database location |
| `LITELLM_PROXY_URL` | — | LiteLLM Proxy URL (e.g. `http://10.0.0.20:4000`). Enables proxy mode |
| `LITELLM_PROXY_KEY` | — | LiteLLM Proxy virtual key (e.g. `sk-...`) |
| `API_KEYS` | — | API keys (`ENV:key,ENV:key`). Optional: enables semantic search (SDK mode) |
| `EMBEDDING_API_BASE` | — | Custom embedding endpoint URL (optional, for SDK mode) |
| `EMBEDDING_API_KEY` | — | Custom embedding endpoint key (optional) |
| `EMBEDDING_BACKEND` | (auto-detect) | `litellm` (cloud API) or `local` (Qwen3). Auto: `API_KEYS` -> `litellm`, else `local` (always available) |
| `EMBEDDING_MODEL` | (auto-detect) | LiteLLM model name (optional) |
| `EMBEDDING_DIMS` | `0` (auto = 768) | Embedding dimensions (0 = auto-detect, default 768) |
| `SYNC_ENABLED` | `false` | Enable rclone sync |
| `SYNC_PROVIDER` | `drive` | rclone provider type (`drive`, `dropbox`, `s3`, etc.) |
| `SYNC_REMOTE` | `gdrive` | rclone remote name |
| `SYNC_FOLDER` | `mnemo-mcp` | Remote folder |
| `SYNC_INTERVAL` | `300` | Auto-sync interval in seconds (0 = manual only) |
| `LOG_LEVEL` | `INFO` | Log level |
Embedding (3-Mode Architecture)
Embedding is always available — a local model is built-in and requires no configuration.
Embedding access supports 3 modes, resolved by priority:
| Priority | Mode | Config | Use case |
|---|---|---|---|
| 1 | Proxy | `LITELLM_PROXY_URL` + `LITELLM_PROXY_KEY` | Production (OCI VM, selfhosted gateway) |
| 2 | SDK | `API_KEYS` or `EMBEDDING_API_BASE` | Dev/local with direct API access |
| 3 | Local | Nothing needed | Offline, always available as fallback |
No cross-mode fallback — if proxy is configured but unreachable, calls fail (no silent fallback to direct API).
- Local mode: Qwen3-Embedding-0.6B, always available with zero config.
- GPU auto-detection: If a GPU is available (CUDA/DirectML) and `llama-cpp-python` is installed, the GGUF model (~480MB) is used automatically instead of ONNX (~570MB) for better performance.
- All embeddings are stored at 768 dims (default). Switching providers never breaks the vector table.
- Override with `EMBEDDING_BACKEND=local` to force local even with API keys.
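The priority resolution described above can be sketched as a small function. The function name and structure here are hypothetical, chosen to mirror the table and the `EMBEDDING_BACKEND=local` override; the real implementation may differ.

```python
def resolve_embedding_mode(env):
    """Pick an embedding mode from an env-var dict, by documented priority."""
    if env.get("EMBEDDING_BACKEND") == "local":
        return "local"  # explicit override wins over any configured keys
    if env.get("LITELLM_PROXY_URL") and env.get("LITELLM_PROXY_KEY"):
        return "proxy"  # priority 1: selfhosted LiteLLM gateway
    if env.get("API_KEYS") or env.get("EMBEDDING_API_BASE"):
        return "sdk"    # priority 2: direct cloud API access
    return "local"      # priority 3: built-in Qwen3, always available
```

Note that per the no-cross-mode-fallback rule, once "proxy" is selected a proxy outage is an error rather than a reason to retry in "sdk" mode.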
API_KEYS supports multiple providers in a single string:
API_KEYS=GOOGLE_API_KEY:AIza...,OPENAI_API_KEY:sk-...,COHERE_API_KEY:co-...
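Parsing that comma-separated `ENV:key` string is straightforward; this sketch shows the expected shape of the result (illustrative only, mnemo-mcp's parser may differ):

```python
def parse_api_keys(value):
    """Split "ENV:key,ENV:key" into an env-var -> key mapping."""
    pairs = {}
    for item in value.split(","):
        # partition on the first ":" so keys containing ":" still work
        env_var, _, key = item.partition(":")
        if env_var and key:
            pairs[env_var.strip()] = key.strip()
    return pairs

keys = parse_api_keys("GOOGLE_API_KEY:AIza...,OPENAI_API_KEY:sk-...")
```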
Cloud embedding providers (auto-detected from API_KEYS, priority order):
| Priority | Env Var (LiteLLM) | Model | Native Dims | Stored |
|---|---|---|---|---|
| 1 | `GEMINI_API_KEY` | `gemini/gemini-embedding-001` | 3072 | 768 |
| 2 | `OPENAI_API_KEY` | `text-embedding-3-large` | 3072 | 768 |
| 3 | `COHERE_API_KEY` | `embed-multilingual-v3.0` | 1024 | 768 |
All embeddings are truncated to 768 dims (default) for storage. This ensures switching models never breaks the vector table. Override with EMBEDDING_DIMS if needed.
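Truncation to a fixed 768 dims is what keeps the vector table stable across providers. A minimal sketch of the idea, assuming the stored vector is re-normalized after truncation so cosine similarity stays meaningful (the actual storage code may handle this differently):

```python
import math

def truncate_embedding(vec, dims=768):
    """Keep the first `dims` components and re-normalize to unit length."""
    head = vec[:dims]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

# A native 3072-dim embedding (e.g. Gemini or OpenAI) stored at 768 dims.
stored = truncate_embedding([0.1] * 3072)
```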
API_KEYS format maps your env var to LiteLLM's expected var (e.g., GOOGLE_API_KEY:key auto-sets GEMINI_API_KEY). Set EMBEDDING_MODEL explicitly for other providers.
MCP Tools
memory — Core memory operations
| Action | Required | Optional |
|---|---|---|
| `add` | `content` | `category`, `tags` |
| `search` | `query` | `category`, `tags`, `limit` |
| `list` | — | `category`, `limit` |
| `update` | `memory_id` | `content`, `category`, `tags` |
| `delete` | `memory_id` | — |
| `export` | — | — |
| `import` | `data` (JSONL) | `mode` (merge/replace) |
| `stats` | — | — |
config — Server configuration
| Action | Required | Optional |
|---|---|---|
| `status` | — | — |
| `sync` | — | — |
| `set` | `key`, `value` | — |
help — Full documentation
help(topic="memory") # or "config"
MCP Resources
| URI | Description |
|---|---|
| `mnemo://stats` | Database statistics and server status |
| `mnemo://recent` | 10 most recently updated memories |
MCP Prompts
| Prompt | Parameters | Description |
|---|---|---|
| `save_summary` | `summary` | Generate a prompt to save a conversation summary as memory |
| `recall_context` | `topic` | Generate a prompt to recall relevant memories about a topic |
Architecture
MCP Client (Claude, Cursor, etc.)
|
FastMCP Server
/ | \
memory config help
| | |
MemoryDB Settings docs/
/ \
FTS5 sqlite-vec
|
EmbeddingBackend
/ \
LiteLLM Qwen3 ONNX
| (local CPU)
Gemini / OpenAI / Cohere
Sync: rclone (embedded) -> Google Drive / S3 / ...
Development
# Install
uv sync
# Run
uv run mnemo-mcp
# Lint
uv run ruff check src/
uv run ty check src/
# Test
uv run pytest
Compatible With
Works with any MCP client (Claude, Cursor, etc.).
Also by n24q02m
| Server | Description | Install |
|---|---|---|
| better-notion-mcp | Notion API for AI agents | npx -y @n24q02m/better-notion-mcp@latest |
| wet-mcp | Web search, content extraction, library docs | uvx --python 3.13 wet-mcp@latest |
| better-email-mcp | Email (IMAP/SMTP) for AI agents | npx -y @n24q02m/better-email-mcp@latest |
| better-godot-mcp | Godot Engine for AI agents | npx -y @n24q02m/better-godot-mcp@latest |
Related Projects
- modalcom-ai-workers — GPU-accelerated AI workers on Modal.com (embedding, reranking)
- qwen3-embed — Local embedding/reranking library used by mnemo-mcp
Contributing
See CONTRIBUTING.md
License
MIT - See LICENSE