I built Claude Cognitive because Claude Code kept forgetting my codebase between sessions.
The problem: Claude Code is stateless. Every new instance rediscovers your architecture from scratch, hallucinates integrations that don't exist, repeats debugging you already tried, and burns tokens re-reading unchanged files.
At 1M+ lines of Python (3,400 modules across a distributed system), this was killing my productivity.
The solution is two systems:
1. Context Router – Attention-based file injection. Files get HOT/WARM/COLD scores based on recency and keyword activation. HOT files inject fully, WARM files inject headers only, COLD files are evicted. Scores decay over turns, and related files co-activate. Result: 64-95% token reduction. (There's a sketch of the tiering logic right after this list.)
2. Pool Coordinator – Multi-instance state sharing. I run 8 concurrent Claude Code instances, and they now share completions and blockers. No duplicate debugging, no stepping on each other.
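Here's a minimal sketch of the tiering logic in Python. The thresholds, decay factor, and co-activation weight below are made-up numbers for illustration; the real router's values and data structures are in the repo.

```python
# Minimal sketch of attention-tier scoring. Thresholds, decay factor,
# and co-activation weight are illustrative assumptions, not the
# project's actual constants.
from dataclasses import dataclass, field

HOT_THRESHOLD = 0.6    # assumed: full-content injection above this score
WARM_THRESHOLD = 0.3   # assumed: headers-only injection above this score
DECAY = 0.7            # assumed per-turn multiplicative decay
CO_ACTIVATION = 0.5    # assumed fraction of score passed to related files

@dataclass
class FileState:
    keywords: set[str]
    related: list[str] = field(default_factory=list)
    score: float = 0.0

def route_turn(files: dict[str, FileState], message: str) -> dict[str, str]:
    """Score every tracked file for one turn and assign a tier."""
    tokens = set(message.lower().split())
    # 1. Decay all scores so stale files fade toward COLD.
    for state in files.values():
        state.score *= DECAY
    # 2. Keyword activation: files whose keywords appear in the message heat up.
    for state in files.values():
        if state.keywords & tokens:
            state.score = 1.0
    # 3. Co-activation: hot files pull their related files toward WARM.
    for state in files.values():
        if state.score >= HOT_THRESHOLD:
            for rel in state.related:
                if rel in files:
                    files[rel].score = max(files[rel].score,
                                           state.score * CO_ACTIVATION)
    # 4. Tier assignment decides what gets injected this turn.
    tiers = {}
    for path, state in files.items():
        if state.score >= HOT_THRESHOLD:
            tiers[path] = "HOT"    # inject full contents
        elif state.score >= WARM_THRESHOLD:
            tiers[path] = "WARM"   # inject headers/signatures only
        else:
            tiers[path] = "COLD"   # evict from context
    return tiers
```

The important property: activation is driven by what you just said, decay is automatic, and co-activation pulls in the neighborhood of a hot file without you asking for it.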
Results after months of daily use:
- New instances productive on first message
- Zero hallucinated imports
- Token usage down 70-80% average
- Works across multi-day sessions
Open source (MIT). Works with Claude Code today via hooks.
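For the curious: Claude Code's UserPromptSubmit hook runs a command on every user message and adds that command's stdout to the context, which is what makes this kind of injection possible. A hypothetical wiring, where the `context_router` module and `build_injection` function are stand-ins rather than the project's real API:

```python
#!/usr/bin/env python3
# Hypothetical UserPromptSubmit hook: read the user's message from the
# hook's stdin JSON, run it through the router, and print the injection
# payload to stdout (Claude Code adds a UserPromptSubmit hook's stdout
# to the context). `context_router` / `build_injection` are stand-ins.
import json
import sys

from context_router import build_injection  # hypothetical module

def main() -> None:
    event = json.load(sys.stdin)          # hook payload includes the prompt
    prompt = event.get("prompt", "")
    payload = build_injection(prompt)     # HOT files full, WARM headers only
    if payload:
        print(payload)                    # injected into the next turn

if __name__ == "__main__":
    main()
```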
CLAUDE.md is static: same content every session (and a soft 40k character limit).
This is dynamic attention routing. Files get scored based on what you're actively discussing. Mention "auth" and auth-related docs go HOT (full injection), related files go WARM (headers only), unrelated files stay COLD (evicted).
Scores decay over turns. If you stop talking about auth, those files fade back to COLD automatically.
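Concretely, with the same assumed numbers as the sketch above (activation 1.0, 0.7 per-turn decay, HOT >= 0.6, WARM >= 0.3):

```python
# Assumed numbers, matching the earlier sketch: activation 1.0,
# decay 0.7/turn, HOT >= 0.6, WARM >= 0.3.
score = 1.0
for turn in range(1, 5):
    score *= 0.7
    tier = "HOT" if score >= 0.6 else "WARM" if score >= 0.3 else "COLD"
    print(f"turn {turn}: score={score:.2f} -> {tier}")
# turn 1: score=0.70 -> HOT
# turn 2: score=0.49 -> WARM
# turn 3: score=0.34 -> WARM
# turn 4: score=0.24 -> COLD   (fades out unless you mention it again)
```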
Plus multi-instance coordination: concurrent Claude sessions share completions and blockers so they don't duplicate work. A sketch of the idea follows.
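This is a sketch of the sharing mechanism only, assuming one shared JSON file with advisory locking; the real coordinator's path, schema, and transport are almost certainly different:

```python
# Sketch of multi-instance state sharing via one shared JSON file.
# The path, schema, and locking scheme are assumptions, not the actual
# Pool Coordinator protocol. POSIX-only (fcntl).
import fcntl
import json
from pathlib import Path

STATE = Path("/tmp/claude-pool/state.json")  # hypothetical shared location

def publish(instance: str, kind: str, item: str) -> None:
    """Record an entry under kind ("completions" or "blockers")."""
    STATE.parent.mkdir(parents=True, exist_ok=True)
    STATE.touch(exist_ok=True)
    with open(STATE, "r+") as f:
        fcntl.flock(f, fcntl.LOCK_EX)        # one writer at a time
        raw = f.read()
        state = json.loads(raw) if raw else {"completions": [], "blockers": []}
        state[kind].append({"by": instance, "item": item})
        f.seek(0)
        f.truncate()
        json.dump(state, f)
        fcntl.flock(f, fcntl.LOCK_UN)

def already_done(item: str) -> bool:
    """Check before starting work, so instances don't repeat debugging."""
    if not STATE.exists():
        return False
    state = json.loads(STATE.read_text() or "{}")
    return any(c["item"] == item for c in state.get("completions", []))
```

Each instance publishes what it finished or got stuck on (e.g. `publish("instance-3", "blockers", "flaky redis test")`) and checks the shared state before picking up work.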
There's a 25k character limit on injection, so you compact less and stay focused where it's needed. I've also seen it help a lot with the post-compaction context wobble.
GitHub: https://github.com/GMaN1911/claude-cognitive
Happy to answer questions about the architecture or implementation details.