imagine-mcp

by N24q02m

io.github.n24q02m/imagine-mcp

MCP server for image/video understanding & generation (Gemini/OpenAI/Grok)

mcp-name: io.github.n24q02m/imagine-mcp

Image and video understanding + generation for AI agents -- across Gemini, OpenAI, and Grok.

Sister projects from n24q02m

| Project | Tagline | Tag |
| --- | --- | --- |
| better-code-review-graph | Knowledge graph for token-efficient code reviews -- semantic search and call-... | MCP |
| better-email-mcp | IMAP/SMTP email for AI agents -- read, send, organize folders, and manage att... | MCP |
| better-godot-mcp | Composite MCP server for Godot Engine -- 17 composite tools for AI-assisted g... | MCP |
| better-notion-mcp | Markdown-first Notion for AI agents -- pages, databases, blocks, and comments... | MCP |
| better-telegram-mcp | Telegram for AI agents -- messages, chats, media, and contacts across both bo... | MCP |
| claude-plugins | Claude Code plugin marketplace for the n24q02m MCP servers -- install web sea... | Marketplace |
| imagine-mcp | Image and video understanding + generation for AI agents -- across Gemini, Op... | MCP |
| jules-task-archiver | Chrome Extension for bulk operations on Jules tasks via batchexecute API -- a... | Tooling |
| mcp-core | Shared foundation for building MCP servers -- Streamable HTTP transport, OAut... | MCP |
| mnemo-mcp | Persistent AI memory with hybrid search and embedded sync. Open, free, unlimi... | MCP |
| qwen3-embed | Lightweight Qwen3 text embedding and reranking via ONNX Runtime and GGUF | Library |
| skret | Secrets without the server. | CLI |
| tacet | TACET: a self-distilling neuro-symbolic cascade that amortises LLM cost in kn... | Tooling |
| web-core | Shared web infrastructure package for search, scraping, HTTP security, and st... | Library |
| wet-mcp | Open-source MCP server for AI agents: web search, content extraction, and lib... | MCP |

Features

  • Multimodal understanding -- Describe, classify, or reason over images and videos (Gemini handles mixed image + video in one call)
  • Image generation -- Text-to-image and image-to-image (edit / inpaint) across Gemini Imagen, OpenAI gpt-image, Grok Imagine
  • Video generation -- Text-to-video and image-to-video (Gemini Veo 3.1, Grok Imagine Video)
  • 3 providers x 2 tiers -- Same interface for gemini / openai / grok at poor (cheap/fast) or rich (high quality); swap via parameter
  • Leaderboard-ranked models -- Provider ordering auto-refreshed weekly from Artificial Analysis + LMArena leaderboards
  • Degraded mode -- Server starts with zero credentials and surfaces remaining providers as you add keys
  • Response cache -- Disk-based caching of understand responses with configurable TTL
  • Dual transport -- pure stdio with provider env vars (default) or HTTP multi-user with paste-token relay form
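
To illustrate the response-cache bullet above, here is a minimal sketch of a disk cache with a TTL. It is not the server's implementation: the `DiskCache` class, the choice to key entries on `media_urls` plus `prompt`, and the JSON file layout are all assumptions made for the example.

```python
import hashlib
import json
import tempfile
import time
from pathlib import Path


class DiskCache:
    """Hypothetical disk cache with a time-to-live (illustrative only)."""

    def __init__(self, directory: Path, ttl_seconds: float) -> None:
        self.directory = directory
        self.ttl_seconds = ttl_seconds
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, media_urls: list[str], prompt: str) -> Path:
        # Assumption: the cache key is a hash of the request inputs.
        raw = json.dumps({"media_urls": media_urls, "prompt": prompt}, sort_keys=True)
        digest = hashlib.sha256(raw.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.json"

    def get(self, media_urls, prompt, now=None):
        path = self._path(media_urls, prompt)
        if not path.exists():
            return None
        entry = json.loads(path.read_text(encoding="utf-8"))
        now = time.time() if now is None else now
        if now - entry["stored_at"] > self.ttl_seconds:
            path.unlink()  # expired: drop the entry and report a miss
            return None
        return entry["response"]

    def put(self, media_urls, prompt, response, now=None):
        now = time.time() if now is None else now
        entry = {"stored_at": now, "response": response}
        self._path(media_urls, prompt).write_text(json.dumps(entry), encoding="utf-8")


# Usage: cache one response in a throwaway directory and read it back.
cache = DiskCache(Path(tempfile.mkdtemp()), ttl_seconds=3600)
urls = ["https://example.com/cat.png"]
cache.put(urls, "Describe this image", "A cat on a sofa.")
print(cache.get(urls, "Describe this image"))
```

A real cache would probably also fold the provider and tier into the key, since different models can answer the same prompt differently.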

Status

2026-05-02 -- Architecture stabilization update

The past few months saw significant churn around credential handling and the daemon-bridge auto-spawn pattern. This caused multi-process races, browser tab spam, and inconsistent setup UX across plugins. The architecture is now stable: two clean modes (stdio and HTTP), no daemon-bridge layer, and no auto-spawn from stdio.

Apologies for the period of instability. If you ran into issues with earlier versions, please update to the latest release and follow the current Setup docs -- most earlier workarounds are no longer needed.

Related plugins from the same author share the same architecture, so the setup pattern you learn for one transfers to the rest.

Documentation

Full docs at mcp.n24q02m.com/servers/imagine-mcp/setup/:

  • Setup -- install methods for Claude Code, Codex, Gemini CLI, Cursor, Windsurf, mcp.json
  • Modes overview -- stdio / local-relay / remote-relay / remote-oauth
  • Multi-user setup -- per-JWT-sub credential model

Install with AI agent -- paste this into your AI coding agent:

Install MCP server imagine-mcp following the steps at
https://raw.githubusercontent.com/n24q02m/claude-plugins/main/plugins/imagine-mcp/setup-with-agent.md

Tools

| Tool | Actions | Description |
| --- | --- | --- |
| understand | -- | Describe or reason over one or more image/video URLs. media_urls: list[str], prompt: str, provider, tier, max_tokens. |
| generate | -- | Generate an image or video from a text prompt. media_type: image\|video, optional reference_image_url, optional job_id (video poll), aspect_ratio, duration_seconds. |
| config | open_relay, relay_status, relay_skip, relay_reset, relay_complete, warmup, status, set, cache_clear | Credential + runtime config: open relay form, check credential state, set runtime knobs (log level, default provider, TTL), clear response cache. |
| help | -- | Full Markdown documentation for understand, generate, or config topics. |
| config__open_relay | -- | Framework-injected helper (mcp-core) equivalent to config(action="open_relay"); opens the browser credential form. |
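
As a rough illustration of how a client might invoke these tools, the sketch below builds MCP `tools/call` requests as JSON-RPC 2.0 messages. The tool and parameter names come from the table above, except that `prompt` as the argument name for `generate` is an assumption. The helper `build_tool_call` is hypothetical, and the example URL and the `"16:9"` aspect-ratio format are also assumptions; check the `help` tool for the accepted values.

```python
import json


def build_tool_call(request_id: int, name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Ask the cheap/fast Gemini tier to describe one image.
understand_request = build_tool_call(1, "understand", {
    "media_urls": ["https://example.com/photo.jpg"],  # placeholder URL
    "prompt": "Describe this image in one sentence.",
    "provider": "gemini",
    "tier": "poor",
})

# Generate an image; the aspect_ratio format shown here is an assumption.
generate_request = build_tool_call(2, "generate", {
    "prompt": "A lighthouse at dusk",  # assumed argument name
    "media_type": "image",
    "aspect_ratio": "16:9",
})

print(understand_request)
print(generate_request)
```

In practice an MCP client library or the host agent sends these messages for you; the sketch only shows the shape of the arguments.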

Model IDs per provider x action x tier are leaderboard-ranked; see docs/models.md (auto-regenerated from src/imagine_mcp/models.py).

Comparison

How imagine-mcp stacks up against direct competitors in each pillar:

| Capability | imagine-mcp | EverArt MCP | fal.ai MCP | Replicate Flux MCP |
| --- | --- | --- | --- | --- |
| Image/video understanding | Yes (describe / classify / reason over image + video URLs) | No | No | No |
| Image generation | Yes (text-to-image + image-to-image via reference_image_url) | Yes (single generate_image) | Yes (text/image-to-image, edit, inpaint) | Yes (single generate_image) |
| Video generation | Yes (text-to-video + image-to-video, async job_id poll) | No | Yes (text/image-to-video) | No |
| Multi-provider backends | Yes (Gemini / OpenAI / Grok, auto-fallback) | No (EverArt only) | No (fal.ai only) | No (Replicate Flux only) |
| Quality/cost tiers | Yes (poor cheap-fast vs rich high-quality per provider) | No | No | No |
| Self-hostable / open source | Yes (MIT, stdio + HTTP self-host) | Yes (MIT, archived) | Yes (MIT) | Yes (MIT, archived) |

Security

  • SSRF + LFI prevention -- All media_urls and reference_image_url are validated at the dispatch boundary; only http:// and https:// schemes reach the providers. file://, ftp://, gopher://, and scheme-less URLs are rejected.
  • No credentials in errors -- Provider-side errors are sanitized before being returned.
  • Degraded start -- Missing credentials do not prevent the server from starting; affected actions surface actionable errors instead of crashing at boot.
  • Credential storage -- Credentials submitted through the browser credential form are stored encrypted via mcp-core (AES-GCM, machine-bound key) at ~/.imagine-mcp/config.json.
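
The scheme check described in the first bullet above could look roughly like the following. This is a sketch using only the standard library, not the project's actual validator: `validate_media_url` and `ALLOWED_SCHEMES` are hypothetical names, and the extra "must have a host" check is an assumption.

```python
from urllib.parse import urlsplit

ALLOWED_SCHEMES = {"http", "https"}


def validate_media_url(url: str) -> str:
    """Return the URL unchanged if it is http(s) with a host, else raise ValueError."""
    parts = urlsplit(url)
    if parts.scheme.lower() not in ALLOWED_SCHEMES:
        # Covers file://, ftp://, gopher://, and scheme-less input (empty scheme).
        raise ValueError(f"unsupported URL scheme: {parts.scheme or '(none)'}")
    if not parts.hostname:
        # Assumption: a URL without a host is never a valid media source.
        raise ValueError("URL has no host")
    return url


# Usage: validate every URL before it is handed to a provider.
for candidate in ["https://example.com/a.png", "file:///etc/passwd", "example.com/a.png"]:
    try:
        validate_media_url(candidate)
        print("accepted:", candidate)
    except ValueError as error:
        print("rejected:", candidate, "->", error)
```

Checking at the dispatch boundary means a single function guards both `media_urls` and `reference_image_url`, regardless of which provider handles the request.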

Build from Source

git clone https://github.com/n24q02m/imagine-mcp.git
cd imagine-mcp
mise run setup      # or: uv sync --group dev
mise run dev        # run http local relay daemon

Trust Model

This plugin implements TC-Local (machine-bound, single trust principal). See the mcp-core trust model for the full classification.

| Mode | Storage | Encryption | Who can read your data? |
| --- | --- | --- | --- |
| stdio (default) | ~/.imagine-mcp/config.json | AES-GCM, machine-bound key | Only your OS user (file perm 0600) |
| HTTP self-host | Same as stdio | Same | Only you (admin = user) |
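
The "file perm 0600" entry in the table can be illustrated with a short sketch of writing an owner-only file on a POSIX system. It covers only the permission side; the payload is assumed to be already encrypted, and `write_private_file` is a hypothetical helper, not part of mcp-core. The example writes to a temporary directory rather than the real config path.

```python
import os
import stat
import tempfile


def write_private_file(path: str, data: bytes) -> None:
    """Create or overwrite a file that only the owning user can read or write."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    fd = os.open(path, flags, 0o600)  # mode applies only when the file is created
    with os.fdopen(fd, "wb") as handle:
        os.chmod(path, 0o600)  # tighten permissions if the file already existed
        handle.write(data)


# Usage: write placeholder bytes and inspect the resulting permission bits.
path = os.path.join(tempfile.mkdtemp(), "config.json")
write_private_file(path, b"<already-encrypted bytes>")
print(oct(stat.S_IMODE(os.stat(path).st_mode)))
```

Setting the mode at creation time avoids a window in which the file exists with wider default permissions.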

Contributing

See CONTRIBUTING.md for the full development workflow, commit convention, and release process. Issues + Discussions welcome.

License

MIT -- see LICENSE.