MCP server for image/video understanding & generation (Gemini/OpenAI/Grok)
imagine-mcp · v1.3.0
by N24q02m
imagine-mcp
mcp-name: io.github.n24q02m/imagine-mcp
Image and video understanding + generation for AI agents -- across Gemini, OpenAI, and Grok.
Sister projects from n24q02m
| Project | Tagline | Tag |
|---|---|---|
| better-code-review-graph | Knowledge graph for token-efficient code reviews -- semantic search and call-... | MCP |
| better-email-mcp | IMAP/SMTP email for AI agents -- read, send, organize folders, and manage att... | MCP |
| better-godot-mcp | Composite MCP server for Godot Engine -- 17 composite tools for AI-assisted g... | MCP |
| better-notion-mcp | Markdown-first Notion for AI agents -- pages, databases, blocks, and comments... | MCP |
| better-telegram-mcp | Telegram for AI agents -- messages, chats, media, and contacts across both bo... | MCP |
| claude-plugins | Claude Code plugin marketplace for the n24q02m MCP servers -- install web sea... | Marketplace |
| imagine-mcp | Image and video understanding + generation for AI agents -- across Gemini, Op... | MCP |
| jules-task-archiver | Chrome Extension for bulk operations on Jules tasks via batchexecute API -- a... | Tooling |
| mcp-core | Shared foundation for building MCP servers -- Streamable HTTP transport, OAut... | MCP |
| mnemo-mcp | Persistent AI memory with hybrid search and embedded sync. Open, free, unlimi... | MCP |
| qwen3-embed | Lightweight Qwen3 text embedding and reranking via ONNX Runtime and GGUF | Library |
| skret | Secrets without the server. | CLI |
| tacet | TACET: a self-distilling neuro-symbolic cascade that amortises LLM cost in kn... | Tooling |
| web-core | Shared web infrastructure package for search, scraping, HTTP security, and st... | Library |
| wet-mcp | Open-source MCP server for AI agents: web search, content extraction, and lib... | MCP |
Table of contents
- Features
- Status
- Documentation
- Tools
- Comparison
- Security
- Build from Source
- Trust Model
- Contributing
- License
Features
- Multimodal understanding -- Describe, classify, or reason over images and videos (Gemini handles mixed image + video in one call)
- Image generation -- Text-to-image and image-to-image (edit / inpaint) across Gemini Imagen, OpenAI gpt-image, Grok Imagine
- Video generation -- Text-to-video and image-to-video (Gemini Veo 3.1, Grok Imagine Video)
- 3 providers x 2 tiers -- Same interface for `gemini`/`openai`/`grok` at `poor` (cheap/fast) or `rich` (high quality); swap via parameter
- Leaderboard-ranked models -- Provider ordering auto-refreshed weekly from Artificial Analysis + LMArena leaderboards
- Degraded mode -- Server starts with zero credentials and surfaces remaining providers as you add keys
- Response cache -- Disk-based caching of `understand` responses with configurable TTL
- Dual transport -- Pure stdio with provider env vars (default) or HTTP multi-user with a paste-token relay form
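The response cache is described only at the feature level. The sketch below shows one way a disk cache with a TTL could work, assuming one JSON file per request keyed by a SHA-256 of the provider, tier, prompt, and media URLs. The class name and file layout are hypothetical and are not imagine-mcp's actual code or on-disk format.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Optional, Sequence


class ResponseCache:
    """Hypothetical disk cache for `understand` results with a TTL.

    One JSON file per request under `root`, named by a SHA-256 of the
    request fields. Illustrative only; not imagine-mcp's real format.
    """

    def __init__(self, root: Path, ttl_seconds: float) -> None:
        self.root = Path(root)
        self.ttl_seconds = ttl_seconds
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, provider: str, tier: str, prompt: str, media_urls: Sequence[str]) -> Path:
        # Every field that can change the answer goes into the key;
        # sort_keys keeps the serialized form stable across runs.
        key = json.dumps(
            {"provider": provider, "tier": tier, "prompt": prompt, "media_urls": list(media_urls)},
            sort_keys=True,
        )
        return self.root / f"{hashlib.sha256(key.encode('utf-8')).hexdigest()}.json"

    def get(self, provider: str, tier: str, prompt: str, media_urls: Sequence[str],
            now: Optional[float] = None) -> Optional[str]:
        path = self._path(provider, tier, prompt, media_urls)
        if not path.exists():
            return None
        entry = json.loads(path.read_text(encoding="utf-8"))
        now = time.time() if now is None else now
        if now - entry["stored_at"] > self.ttl_seconds:
            path.unlink(missing_ok=True)  # expired: drop it so the next call refetches
            return None
        return entry["text"]

    def put(self, provider: str, tier: str, prompt: str, media_urls: Sequence[str],
            text: str, now: Optional[float] = None) -> None:
        path = self._path(provider, tier, prompt, media_urls)
        stored_at = time.time() if now is None else now
        path.write_text(json.dumps({"stored_at": stored_at, "text": text}), encoding="utf-8")

    def clear(self) -> int:
        """Delete every cached entry and return how many were removed."""
        removed = 0
        for path in self.root.glob("*.json"):
            path.unlink()
            removed += 1
        return removed
```

Passing `now` explicitly keeps the expiry logic testable without sleeping; in this sketch `clear()` plays the role that `config(action="cache_clear")` plays in the real server.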
Status
2026-05-02 -- Architecture stabilization update
The past few months saw significant churn around credential handling and the daemon-bridge auto-spawn pattern, which caused multi-process races, browser-tab spam, and inconsistent setup UX across plugins. The architecture is now stable: two clean modes (stdio + HTTP), no daemon-bridge layer, and no auto-spawn from stdio.
Apologies for the instability period. If you encountered issues with prior versions, please update to the latest release and follow the current Setup docs -- most prior workarounds are no longer needed.
Related plugins from the same author:
- wet-mcp -- Web search + content extraction
- mnemo-mcp -- Persistent AI memory
- better-notion-mcp -- Notion API
- better-email-mcp -- Email management
- better-telegram-mcp -- Telegram
- better-godot-mcp -- Godot Engine
- better-code-review-graph -- Code review knowledge graph
All plugins share the same architecture -- install one, and the pattern transfers to the rest.
Documentation
Full docs at mcp.n24q02m.com/servers/imagine-mcp/setup/:
- Setup -- install methods for Claude Code, Codex, Gemini CLI, Cursor, Windsurf, mcp.json
- Modes overview -- stdio / local-relay / remote-relay / remote-oauth
- Multi-user setup -- per-JWT-sub credential model
Install with AI agent -- paste this to your AI coding agent:
> Install MCP server `imagine-mcp` following the steps at https://raw.githubusercontent.com/n24q02m/claude-plugins/main/plugins/imagine-mcp/setup-with-agent.md
Tools
| Tool | Actions | Description |
|---|---|---|
| `understand` | -- | Describe or reason over one or more image/video URLs. `media_urls: list[str]`, `prompt: str`, `provider`, `tier`, `max_tokens`. |
| `generate` | -- | Generate an image or video from a text prompt. `media_type: image\|video`, optional `reference_image_url`, optional `job_id` (video poll), `aspect_ratio`, `duration_seconds`. |
| `config` | `open_relay`, `relay_status`, `relay_skip`, `relay_reset`, `relay_complete`, `warmup`, `status`, `set`, `cache_clear` | Credential + runtime config: open the relay form, check credential state, set runtime knobs (log level, default provider, TTL), clear the response cache. |
| `help` | -- | Full Markdown documentation for the `understand`, `generate`, or `config` topics. |
| `config__open_relay` | -- | Framework-injected helper (mcp-core) equivalent to `config(action="open_relay")`; opens the browser credential form. |
Model IDs per provider x action x tier are leaderboard-ranked; see docs/models.md (auto-regenerated from src/imagine_mcp/models.py).
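MCP clients build tool calls for you, but when debugging a raw transport it helps to see the wire format. The sketch below builds JSON-RPC 2.0 `tools/call` requests for these tools. The envelope follows the MCP specification and the argument names come from the Tools table. The remaining details are assumptions: the example values (URLs, `aspect_ratio="16:9"`, `duration_seconds=8`), the `prompt` argument name for `generate` (borrowed from `understand`), and the shape of the video poll call. Check the `help` tool for the exact contract.

```python
import itertools
import json
from typing import Any, Dict

# JSON-RPC request ids only need to be unique within a session.
_request_ids = itertools.count(1)


def tool_call(name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP `tools/call` request (JSON-RPC 2.0 envelope)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


# Ask the cheap Gemini tier to compare two images (URLs are placeholders).
describe = tool_call("understand", {
    "media_urls": ["https://example.com/before.png", "https://example.com/after.png"],
    "prompt": "What changed between these two images?",
    "provider": "gemini",
    "tier": "poor",
    "max_tokens": 512,
})

# Start a video job; the `prompt` name and the ratio/duration values are assumptions.
start_video = tool_call("generate", {
    "prompt": "A paper boat drifting down a rainy street",
    "media_type": "video",
    "aspect_ratio": "16:9",
    "duration_seconds": 8,
})


def poll_video(job_id: str) -> Dict[str, Any]:
    """Assumed poll shape: re-call `generate` with the returned job_id."""
    return tool_call("generate", {"media_type": "video", "job_id": job_id})


# Check which providers currently have credentials.
status = tool_call("config", {"action": "status"})

print(json.dumps(describe, indent=2))
```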
Comparison
How imagine-mcp stacks up against direct competitors in each pillar:
| Capability | imagine-mcp | EverArt MCP | fal.ai MCP | Replicate Flux MCP |
|---|---|---|---|---|
| Image/video understanding | Yes (describe / classify / reason over image + video URLs) | No | No | No |
| Image generation | Yes (text-to-image + image-to-image via `reference_image_url`) | Yes (single `generate_image`) | Yes (text/image-to-image, edit, inpaint) | Yes (single `generate_image`) |
| Video generation | Yes (text-to-video + image-to-video, async `job_id` poll) | No | Yes (text/image-to-video) | No |
| Multi-provider backends | Yes (Gemini / OpenAI / Grok, auto-fallback) | No (EverArt only) | No (fal.ai only) | No (Replicate Flux only) |
| Quality/cost tiers | Yes (`poor` cheap-fast vs `rich` high-quality per provider) | No | No | No |
| Self-hostable / open source | Yes (MIT, stdio + HTTP self-host) | Yes (MIT, archived) | Yes (MIT) | Yes (MIT, archived) |
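The auto-fallback and degraded-mode behavior can be pictured as a single selection step. An explicit `provider` is honored if it has credentials. Otherwise the best-ranked provider that has credentials wins. When none qualify, the step raises an actionable error instead of crashing. This is a hypothetical sketch: `pick_provider` and `ProviderUnavailable` are made-up names, and the ranking list in the usage is illustrative (the real ordering lives in `docs/models.md`). It models only credential availability, not retries after a provider-side failure.

```python
from typing import AbstractSet, Optional, Sequence


class ProviderUnavailable(RuntimeError):
    """Raised instead of crashing when no usable credentials exist (degraded mode)."""


def pick_provider(
    ranking: Sequence[str],
    configured: AbstractSet[str],
    requested: Optional[str] = None,
) -> str:
    """Return the provider to use for a call.

    ranking:    providers in leaderboard order, best first (illustrative).
    configured: providers that currently have credentials.
    requested:  explicit `provider` argument from the caller, if any.
    """
    if requested is not None:
        # An explicit choice is never silently swapped for another provider.
        if requested not in configured:
            raise ProviderUnavailable(
                f"{requested!r} has no credentials; add them via config(action='open_relay')"
            )
        return requested
    for provider in ranking:
        if provider in configured:
            return provider
    raise ProviderUnavailable(
        "no provider credentials configured; add keys via config(action='open_relay')"
    )
```

Refusing to substitute a provider the caller named explicitly is a design choice of this sketch; the real server may resolve that case differently.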
Security
- SSRF + LFI prevention -- All `media_urls` and `reference_image_url` values are validated at the dispatch boundary; only `http://` and `https://` schemes reach the providers. `file://`, `ftp://`, `gopher://`, and scheme-less URLs are rejected.
- No credentials in errors -- Provider-side errors are sanitized before being returned.
- Degraded start -- Missing credentials do not prevent the server from starting; affected actions surface actionable errors instead of crashing at boot.
- Credential storage -- Credentials submitted through the browser credential form are stored encrypted via `mcp-core` (AES-GCM, machine-bound key) at `~/.imagine-mcp/config.json`.
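The scheme rule amounts to an allowlist check before any URL leaves the server. The sketch below implements that check with `urllib.parse.urlsplit`. The function and exception names are hypothetical. It covers only the scheme rule stated above: it does not resolve hostnames or block private IP ranges.

```python
from typing import Iterable, Optional
from urllib.parse import urlsplit

# Only these schemes may be forwarded to a provider.
ALLOWED_SCHEMES = frozenset({"http", "https"})


class UnsafeURLError(ValueError):
    """Raised when a media URL must not be forwarded to a provider."""


def validate_media_url(url: str) -> str:
    """Return `url` unchanged if it is an absolute http(s) URL, else raise."""
    parts = urlsplit(url.strip())
    # urlsplit lowercases the scheme; scheme-less input yields "".
    if parts.scheme not in ALLOWED_SCHEMES:
        raise UnsafeURLError(f"scheme {parts.scheme or '(none)'!r} is not allowed: {url!r}")
    if not parts.hostname:
        raise UnsafeURLError(f"URL has no host: {url!r}")
    return url


def validate_request(media_urls: Iterable[str], reference_image_url: Optional[str] = None) -> None:
    """Check every URL in a request before dispatching it."""
    for url in media_urls:
        validate_media_url(url)
    if reference_image_url is not None:
        validate_media_url(reference_image_url)
```

Rejecting at this boundary means a `file://` path never reaches an HTTP client that might happily open it.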
Build from Source
```bash
git clone https://github.com/n24q02m/imagine-mcp.git
cd imagine-mcp
mise run setup   # or: uv sync --group dev
mise run dev     # run the HTTP local-relay daemon
```
Trust Model
This plugin implements TC-Local (machine-bound, single trust principal). See mcp-core trust model for full classification.
| Mode | Storage | Encryption | Who can read your data? |
|---|---|---|---|
| stdio (default) | `~/.imagine-mcp/config.json` | AES-GCM, machine-bound key | Only your OS user (file perm 0600) |
| HTTP self-host | Same as stdio | Same | Only you (admin = user) |
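You can confirm the "file perm 0600" property on your own machine with a short POSIX-only check of the permission bits. The helper names below are hypothetical. The check says nothing about the AES-GCM encryption itself, which `mcp-core` handles.

```python
import os
import stat
from pathlib import Path


def is_owner_only(path: Path) -> bool:
    """True if no group/other permission bits are set (0600 or stricter)."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def restrict_to_owner(path: Path) -> None:
    """Reset the file to owner read/write only (0600)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


if __name__ == "__main__":
    # Path taken from the Trust Model table above.
    config = Path.home() / ".imagine-mcp" / "config.json"
    if config.exists():
        print(f"{config}: owner-only = {is_owner_only(config)}")
```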
Contributing
See CONTRIBUTING.md for the full development workflow, commit convention, and release process. Issues + Discussions welcome.
License
MIT -- see LICENSE.