Retrieval
Remind uses spreading activation for retrieval — a biologically-inspired algorithm that goes beyond keyword or vector matching. When you query Remind, it doesn't just find the closest embeddings. It activates a network of related concepts through the knowledge graph.
How spreading activation works
Query: "What authentication approach should we use?"
1. EMBED → Query is converted to a dense vector
2. MATCH → Concepts with similar embeddings activate (via native
vector index when available, or brute-force fallback)
"JWT auth middleware" (0.89)
"Auth module architecture" (0.82)
"Password hashing with bcrypt" (0.75)
2b. FUSE → Embedding score is blended with keyword overlap
score = (1 - keyword_weight) × embedding + keyword_weight × keyword
(configurable via hybrid_keyword_weight, default 0.3)
3. SPREAD → Activated concepts propagate through relations
"JWT auth middleware"
→ implies: "Need token refresh strategy" (0.71)
→ part_of: "Auth system architecture" (0.66)
"Auth module architecture"
→ implies: "Rate limiting on auth endpoints" (0.58)
4. DECAY → Activation reduces with each hop
Hop 1: activation × relation_strength × 0.5
Hop 2: activation × relation_strength × 0.25
5. RETURN → Highest-activation concepts are returnedThis is fundamentally different from RAG. You don't get "the 5 most similar documents." You get a network of related knowledge that spreads from the query through the concept graph.
Why spreading activation?
Consider how human memory works. You don't search for "that restaurant" by scanning every memory. You think about food, which activates that conversation about cooking, which activates that restaurant, which activates who you were with.
Spreading activation retrieves concepts you didn't directly query for but are meaningfully connected. Asking about "auth" might activate "rate limiting" (which is part of the auth system) even though "rate limiting" has no embedding similarity to "auth."
Hybrid keyword scoring
By default, retrieval fuses embedding similarity with keyword overlap (controlled by hybrid_keyword_weight, default 0.3). This helps surface concepts that contain exact query terms which embeddings alone might miss — for instance, specific tool names, config keys, or technical terms.
The formula is:
score = (1 - weight) × embedding_similarity + weight × keyword_overlapWhere keyword_overlap is the fraction of query tokens (2+ chars) found in the concept's title and summary. Set hybrid_keyword_weight to 0.0 for pure embedding search or 1.0 for pure keyword matching.
Configure via ~/.remind/remind.config.json:
{
"hybrid_keyword_weight": 0.3
}Or environment variable: REMIND_HYBRID_KEYWORD_WEIGHT=0.3
Reranking
Optionally, retrieval can apply a cross-encoder reranker after spreading activation. While embedding similarity and keyword overlap are fast, they can miss nuanced relevance — a cross-encoder reads the query and each candidate together, producing a more accurate relevance score.
When enabled, reranking is applied in two places:
- Concept retrieval — After spreading activation and entity matching produce a candidate set, the reranker scores each concept's title + summary against the query. The final activation is blended:
0.4 × activation + 0.6 × rerank_score. - Direct episode search — Episode results are similarly rescored using episode content.
This preserves graph structure signal (spreading activation, entity links) while letting the cross-encoder correct false positives and surface better matches.
Setup
Install the reranking extra:
pip install "remind-mcp[rerank]"Enable in config (~/.remind/remind.config.json):
{
"reranking_enabled": true,
"reranking_model": "cross-encoder/ms-marco-MiniLM-L-6-v2"
}Or via environment variables:
REMIND_RERANKING_ENABLED=true
REMIND_RERANKING_MODEL=cross-encoder/ms-marco-MiniLM-L-6-v2The default model (ms-marco-MiniLM-L-6-v2) is ~88MB on disk, typically loads in ~0.3-1.0s from cache, and scores 30 candidates in ~20-30ms on CPU. Hardware acceleration (CUDA, MPS) is auto-detected.
Runtime behavior: CLI vs MCP
- MCP server is long-lived, so reranker model load is amortized naturally across many recall calls.
- CLI (
remind recall) is normally one-shot, but whenreranking_enabled=trueit transparently routes requests through a persistent local recall worker so reranker load is reused across calls. - The CLI worker exits after
cli_recall_worker_idle_secondsof inactivity (default: 600s). - If the worker is unavailable, CLI falls back to one-shot recall so commands remain functional.
Tuning with recall_initial_candidates
The recall_initial_candidates setting controls how many concepts are fetched from the vector index before spreading activation and reranking. The default is 10 (which queries the DB for 10 × 2 = 20 raw candidates before filtering).
When reranking is enabled, increasing this value gives the reranker more candidates to evaluate, which can improve recall quality at the cost of slightly higher latency. A value of 15-20 is a good starting point:
{
"reranking_enabled": true,
"recall_initial_candidates": 15
}Without reranking, the default of 10 is generally sufficient since the spreading activation and entity matching steps already broaden the candidate pool.
Vector indexes
Remind uses native database vector indexes when available, replacing the default brute-force Python cosine similarity:
| Backend | Extension | How it works |
|---|---|---|
| SQLite | sqlite-vec | vec0 virtual tables with cosine distance KNN. The package is installed with Remind; see below for when it can actually load. |
| PostgreSQL | pgvector | vector(N) columns with HNSW indexes. Installed with pip install "remind-mcp[postgres]". |
| Fallback | — | NumPy brute-force cosine similarity (O(n) per query). |
Vector tables are created automatically on first embedding write. No manual setup is needed — Remind detects the backend at startup and chooses the best available path.
SQLite (sqlite-vec) requirements
sqlite-vec loads as a SQLite extension. Your Python sqlite3 connection must support enable_load_extension. Some builds (notably macOS interpreters linked against a minimal system SQLite) do not — then Remind skips sqlite-vec and uses brute-force similarity; recall still works.
To use native SQLite vector search, run Remind under a Python built with loadable SQLite extensions, typically by linking against Homebrew’s SQLite when installing Python (e.g. pyenv with PYTHON_CONFIGURE_OPTS=--enable-loadable-sqlite-extensions and LDFLAGS/CPPFLAGS pointing at brew --prefix sqlite). Verify with:
python -c "import sqlite3; c=sqlite3.connect(':memory:'); print(hasattr(c, 'enable_load_extension'))"Full step-by-step notes are in Configuration — Vector search.
Entity-based retrieval
Retrieval has two entity-based signals:
Entity embedding search
Entities are embedded on creation (using their type and display name). During retrieval, the query embedding is compared against entity embeddings in parallel with concept and episode searches. Concepts linked to high-similarity entities receive a boost — this catches concepts that may have low direct embedding similarity to the query but are linked to entities that match well.
For example, querying "session storage" might not match a concept about "Redis configuration" directly, but if tool:redis has high embedding similarity to the query, concepts linked to that entity get boosted.
Entity name matching
Words in your query are matched against entity names and IDs — if the query mentions "redis", concepts linked to the tool:redis entity are activated directly. This provides a fast, embedding-free signal that complements the semantic search.
Direct episode search
In addition to concept-based spreading activation, recall can search episodes directly by embedding similarity. When episode_k is set (default: 5), the query embedding is compared against episode embeddings to find fine-grained matches that may not yet be consolidated into concepts.
Direct episode results appear first in the recall output under a RELEVANT EPISODES heading, followed by the concept-based RELEVANT MEMORY section.
Use --episode-k 0 (CLI) or episode_k=0 (Python/MCP) to disable direct episode search and use only concept-level retrieval.
Topic-scoped retrieval
When a topic is provided to recall, initial concept matches are filtered to that topic (plus untopiced general concepts). Cross-topic concepts can still surface through spreading activation, but receive a 0.4x penalty — they need stronger relation evidence to appear in results.
This reduces noise when querying a specific knowledge area while still allowing relevant cross-domain insights to surface.
Contradiction and supersession display
Each concept in recall output shows inbound and outbound contradictions — concepts that have a contradicts relation to or from the current concept. This surfaces conflicting knowledge directly in the retrieval context, so the consumer can see tensions without needing to inspect relations manually.
Similarly, supersedes relations are surfaced explicitly:
→ supersedes [old_id]: <old summary>— this concept replaces an older one→ SUPERSEDED BY [new_id]: <new summary>— this concept has been replaced
This acts as a staleness signal: if a retrieved concept has been superseded, the consumer knows to prefer the newer version.
Parameters
The retrieval algorithm has a few key parameters:
- k — Maximum number of concepts to return (default: 3)
- episode_k — Number of episodes to retrieve via direct embedding search (default: 5). Set to 0 to disable.
- recall_initial_candidates — How many initial embedding candidates to fetch before spreading activation and reranking (default: 10). The database actually queries for
recall_initial_candidates × 2raw results, then filters by activation threshold. Increase when using reranking to give the cross-encoder more candidates to evaluate. - min_activation — Minimum activation score to include in results (default: 0.15). Concepts below this floor are dropped even if there's budget remaining. This prevents low-relevance noise from reaching the context.
- Initial activation threshold — Minimum embedding similarity to activate a concept
- Spread decay — How much activation reduces per hop (default: 0.5 per hop)
- Spread depth — Number of hops to propagate (default: 2)
Decay factor
Each concept has a decay_factor (0.0–1.0) that multiplies its activation score during retrieval. This implements memory decay — concepts that haven't been recalled recently rank lower. Frequently-recalled concepts maintain high decay factors.
Using recall
remind recall "authentication approach"
remind recall "auth" --entity module:auth # Entity-scoped
remind recall --entity module:auth # Entity-only (no query needed)
remind recall "performance" -k 10 # More results
remind recall "auth" --episode-k 10 # More direct episode matches
remind recall "auth" --episode-k 0 # Concepts only, no episodes
remind recall "database design" --topic architecture # Topic-scopedcontext = await memory.recall("authentication approach")
context = await memory.recall("auth", entity="module:auth")
context = await memory.recall(entity="module:auth") # Entity-only
context = await memory.recall("auth", episode_k=10) # More episode matches
context = await memory.recall("auth", episode_k=0) # Concepts only
context = await memory.recall("database design", topic="architecture") # Topic-scopedrecall(query="authentication approach")
recall(query="auth", entity="module:auth")
recall(entity="module:auth")
recall(query="auth", episode_k=10)
recall(query="auth", episode_k=0)
recall(query="database design", topic="architecture")