
Author here. I built this because Claude Code's built-in memory (CLAUDE.md, Auto Memory, Session Memory) works well for static rules and compressed summaries, but loses the detailed reasoning behind decisions between sessions.

The server uses mem0ai as a library, Qdrant for vector storage, and Ollama for local embeddings (bge-m3, 1024 dims). An optional Neo4j instance adds a knowledge graph layer. Everything, including the LLM fact extraction step, can run locally.
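As a rough sketch of how those pieces wire together, here is the shape of a mem0 configuration pairing Qdrant with an Ollama bge-m3 embedder. The provider names and config keys follow mem0's documented config format; the host/port values and the helper function are illustrative, not the server's actual code:

```python
# Sketch of the core mem0 config: Qdrant for vector storage, Ollama for
# local embeddings. Hosts/ports are illustrative defaults.
def build_memory_config(qdrant_host="localhost", qdrant_port=6333):
    return {
        "vector_store": {
            "provider": "qdrant",
            "config": {
                "host": qdrant_host,
                "port": qdrant_port,
                "embedding_model_dims": 1024,  # must match bge-m3's output size
            },
        },
        "embedder": {
            "provider": "ollama",
            "config": {"model": "bge-m3"},
        },
    }

# With Qdrant and Ollama running, you would hand this to mem0:
#   from mem0 import Memory
#   memory = Memory.from_config(build_memory_config())
```

The important constraint is that `embedding_model_dims` has to agree with whatever the embedding model emits, since Qdrant fixes the vector size per collection at creation time.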

A few engineering decisions worth mentioning:

- Zero-config auth: instead of requiring a separate API key, it reads your existing OAT token from ~/.claude/.credentials.json, with a 3-tier fallback chain (env var → credentials file → API key).

- Graph-layer LLM routing: if you enable the graph layer, each add_memory triggers 3 extra LLM calls. To avoid burning your Claude subscription quota on entity extraction, you can route those calls to a local Ollama model (Qwen3:14b at Q4_K_M scores 0.971 on tool-calling F1).
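The graph-layer routing amounts to giving the graph store its own LLM block. The sketch below follows mem0's config format, where a nested "llm" entry under "graph_store" overrides the model used for entity/relation extraction; exact keys may differ between mem0 versions, and the Neo4j credentials here are placeholders:

```python
# Sketch: route the graph layer's extra LLM calls (entity/relation
# extraction) to a local Ollama model instead of the Claude API.
# Connection details are placeholders; key names follow mem0's config
# format but may vary by version.
def graph_store_config(neo4j_url="bolt://localhost:7687",
                       user="neo4j", password="changeme"):
    return {
        "graph_store": {
            "provider": "neo4j",
            "config": {"url": neo4j_url, "username": user, "password": password},
            # Override the LLM used only for graph extraction, so these
            # 3-per-add_memory calls never touch your Claude quota.
            "llm": {
                "provider": "ollama",
                "config": {"model": "qwen3:14b"},  # Q4_K_M quant works well here
            },
        },
    }
```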
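For the auth fallback, the resolution order described above can be sketched as a simple chain. The env var name and the JSON field names are assumptions for illustration (the real server may use different identifiers), but the tier order matches the design: env var wins, then the credentials file Claude Code already maintains, then an explicit API key:

```python
import json
import os
from pathlib import Path

# Sketch of the 3-tier token resolution:
#   1. environment variable
#   2. ~/.claude/.credentials.json (reuse Claude Code's existing OAT token)
#   3. user-supplied API key
# Names like CLAUDE_CODE_OAUTH_TOKEN and the JSON structure are
# illustrative assumptions, not the server's exact identifiers.
def resolve_token(api_key=None, creds_path=None):
    # Tier 1: environment variable takes precedence.
    tok = os.environ.get("CLAUDE_CODE_OAUTH_TOKEN")
    if tok:
        return tok
    # Tier 2: reuse the token Claude Code already stores on disk.
    path = Path(creds_path or Path.home() / ".claude" / ".credentials.json")
    if path.is_file():
        try:
            creds = json.loads(path.read_text())
            tok = creds.get("claudeAiOauth", {}).get("accessToken")
            if tok:
                return tok
        except (json.JSONDecodeError, OSError):
            pass  # fall through to tier 3 on a malformed/unreadable file
    # Tier 3: explicit API key as the last resort.
    return api_key
```

The point of the ordering is that a user who has already logged into Claude Code gets working auth with zero configuration, while power users can still override per-environment.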

I patched mem0 upstream to support OAT token reuse (PR #4035). Their official MCP server is cloud-only, which is why I built this local version.

Happy to discuss the architecture or tradeoffs.

Full writeup with setup guide: https://dev.to/n3rdh4ck3r/how-to-give-claude-code-persistent...

