Context Relevance Pre-Parser
Overview
The Context Relevance Pre-Parser is a deterministic, rule-based service that optimizes context injection by selectively excluding irrelevant context nodes based on the current cognitive mode and conversation signals. This reduces unnecessary I/O, token usage, and latency without sacrificing response quality.
Motivation
Previously, every response generation retrieved and injected ALL context nodes (episodic memory, identity, user traits, facts, gists, focus, tools, skills, etc.) into every prompt — regardless of whether the mode-specific template even referenced them.
Example Waste
An ACKNOWLEDGE for “Hey!” would trigger:
- PostgreSQL vector search for episodic memory
- Redis reads for facts, gists, working memory
- Skill registry queries
None of which the ACKNOWLEDGE template even uses.
Expected Savings
| Mode | I/O Skipped | Token Savings |
|---|---|---|
| ACKNOWLEDGE | 5 Redis reads, 1 PG vector search, skill queries | ~1500-3000 |
| CLARIFY (warm) | 1 PG vector search, skill queries | ~500-1500 |
| RESPOND (greeting) | 1 PG vector search, focus queries | ~800-2000 |
| ACT | Identity/trait lookups | ~300-800 |
Pre-parser execution: < 0.5ms (pure dict lookups).
Architecture
Seven-Layer Pipeline
The service applies seven layers in order, each gating context node inclusion:
- Template Masks — Static per-mode. Excludes nodes the template doesn’t reference (hard exclusion).
- Signal Rules — Conditional. Excludes nodes that the template references but are irrelevant given current signals. Each rule specifies `strength: "hard"` or `"soft"`.
- Urgency Overrides — When `classification.urgency == 'high'`, force-include `working_memory`, `world_state`, `facts` for broader awareness.
- Soft Exclusion Recovery — Soft-excluded nodes get re-included if the token budget has headroom, in configurable priority order.
- Dependency Resolution — If a node is included, its dependencies are auto-included (e.g., including `episodic_memory` auto-includes `gists`).
- Safety Overrides — Force-includes nodes that must always be present under certain conditions (e.g., identity when returning from silence).
- Safeguard — `MAX_INCLUDED_NODES = 12`. Logs a warning if exceeded to protect prompt integrity.
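The seven layers above can be sketched as a single pass over the config. This is a simplified illustration, not the real `ContextRelevanceService`: the `matches` helper handles exact-match predicates only, and the `urgent` flag stands in for the documented `classification.urgency` check.

```python
MAX_INCLUDED_NODES = 12

def matches(when, signals):
    # Exact-match predicates only, for brevity; the real service also
    # supports comparison suffixes such as "_gte" and "_lt".
    return all(signals.get(k) == v for k, v in when.items())

def compute_inclusion_map(mode, signals, config, urgent=False,
                          returning_from_silence=False,
                          token_budget_remaining=0):
    # Layer 1: template masks (static per-mode hard exclusion)
    included = dict(config["template_masks"].get(mode, {}))
    soft_excluded = []

    # Layer 2: signal rules (conditional hard/soft exclusion)
    for node, rules in config.get("signal_rules", {}).items():
        for rule in rules:
            if included.get(node) and matches(rule["when"], signals):
                included[node] = False
                if rule["strength"] == "soft":
                    soft_excluded.append(node)

    # Layer 3: urgency overrides force critical nodes back in
    if urgent:
        for node in config.get("urgency_overrides", []):
            included[node] = True

    # Layer 4: soft recovery when the token budget has headroom
    if token_budget_remaining >= config.get("soft_recovery_budget", 1500):
        for node in config.get("soft_recovery_priority", []):
            if node in soft_excluded:
                included[node] = True

    # Layer 5: dependency resolution (parents of included nodes come along)
    for node, deps in config.get("dependencies", {}).items():
        if included.get(node):
            for dep in deps:
                included[dep] = True

    # Layer 6: safety overrides (e.g., identity after silence)
    if returning_from_silence:
        included["identity_context"] = True

    # Layer 7: safeguard on total node count
    if sum(included.values()) > MAX_INCLUDED_NODES:
        print("warning: MAX_INCLUDED_NODES exceeded")
    return included
```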
Context Nodes
All context nodes supported:
`identity_context`, `onboarding_nudge`, `user_traits`, `communication_style`, `active_lists`, `client_context`, `focus`, `working_memory`, `facts`, `gists`, `episodic_memory`, `act_history`, `available_skills`, `available_tools`, `world_state`, `warm_return_hint`, `identity_modulation`
Configuration
File Location
backend/configs/agents/context-relevance.json
Configuration Structure
Template Masks
Static per-mode inclusion decisions. Excludes nodes the template doesn’t even reference.
{
"template_masks": {
"RESPOND": {
"episodic_memory": true,
"working_memory": true,
"facts": true,
...
},
"ACKNOWLEDGE": {
"episodic_memory": false,
"working_memory": false,
...
}
}
}
Signal Rules
Conditional exclusion rules with strength levels:
{
"signal_rules": {
"episodic_memory": [
{
"when": {
"context_warmth_gte": 0.5,
"working_memory_turns_gte": 2
},
"strength": "soft"
},
{
"when": {
"greeting_pattern": true,
"prompt_token_count_lt": 6
},
"strength": "hard"
}
]
}
}
Predicates:
- Exact match: `"key": value`
- Comparisons: `"key_gte": threshold`, `"key_gt"`, `"key_lte"`, `"key_lt"`, `"key_eq"`
- Special: `"returning_from_silence": true/false`
Strengths:
- `"hard"` — Never recovered, even with budget headroom
- `"soft"` — Recoverable if the token budget has headroom
Dependencies
Dependency graph; if a child is included, parents auto-include:
{
"dependencies": {
"episodic_memory": ["gists"],
"available_tools": ["available_skills"],
"onboarding_nudge": ["identity_context"],
"warm_return_hint": ["identity_context"]
}
}
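Resolution iterates to a fixpoint so that chains of dependencies are pulled in transitively, and cycles are rejected up front (the real service raises `ConfigError` at config load; this sketch uses `ValueError` and invented function names):

```python
def resolve_dependencies(included, dependencies):
    """Return a copy of `included` with parents of included nodes forced on."""
    result = dict(included)
    changed = True
    while changed:  # iterate until fixpoint so chains resolve transitively
        changed = False
        for node, parents in dependencies.items():
            if result.get(node):
                for parent in parents:
                    if not result.get(parent):
                        result[parent] = True
                        changed = True
    return result

def check_acyclic(dependencies):
    """Raise ValueError on a circular dependency (DFS with a visiting stack)."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return
        if node in visiting:
            raise ValueError(f"circular dependency involving {node!r}")
        visiting.add(node)
        for parent in dependencies.get(node, []):
            visit(parent)
        visiting.remove(node)
        done.add(node)

    for node in dependencies:
        visit(node)
```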
Urgency Overrides
Force-include critical nodes when urgent:
{
"urgency_overrides": ["working_memory", "world_state", "facts"]
}
Safety Overrides
Force-include under specific conditions:
{
"safety_overrides": {
"identity_context": [
{ "when": { "returning_from_silence": true } },
{ "when": { "context_warmth_lt": 0.3 } }
],
"working_memory": [
{ "when": { "working_memory_turns_gte": 1 } }
]
}
}
Recovery Parameters
- `soft_recovery_budget` (default: 1500) — Token headroom threshold for re-including soft-excluded nodes
- `soft_recovery_priority` (default: listed order) — Priority order for soft recovery
{
"soft_recovery_budget": 1500,
"soft_recovery_priority": [
"episodic_memory", "working_memory", "world_state", "facts",
"active_lists", "focus", "gists"
]
}
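Recovery can be read as: once the remaining budget clears the threshold, walk the priority list and re-include what fits. The per-node token accounting below (`est_tokens`) is an assumption for illustration; the document only specifies the threshold and the priority order.

```python
def recover_soft_exclusions(soft_excluded, priority, budget_remaining,
                            recovery_budget=1500, est_tokens=None):
    """Re-include soft-excluded nodes in priority order while headroom lasts.

    est_tokens maps node -> rough token cost (assumed here for illustration;
    nodes without an estimate cost 0).
    """
    est_tokens = est_tokens or {}
    recovered = []
    if budget_remaining < recovery_budget:
        return recovered  # no headroom: soft exclusions stand
    headroom = budget_remaining - recovery_budget
    for node in priority:
        cost = est_tokens.get(node, 0)
        if node in soft_excluded and cost <= headroom:
            recovered.append(node)
            headroom -= cost
    return recovered
```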
Usage
In Digest Worker
The service is invoked in digest_worker.py before response generation:
from services.context_relevance_service import ContextRelevanceService
# Compute inclusion map
context_relevance_service = ContextRelevanceService()
inclusion_map = context_relevance_service.compute_inclusion_map(
mode='RESPOND', # cognitive mode
signals=signals, # routing signals
classification=classification, # topic classification
returning_from_silence=returning_from_silence,
token_budget_remaining=4000 # estimated tokens left
)
# Pass to cortex service
response_data = cortex_service.generate_response(
system_prompt_template=prompt,
original_prompt=text,
classification=classification,
chat_history=chat_history,
inclusion_map=inclusion_map, # ← KEY: gates context retrieval
...
)
In Frontal Cortex Service
The generate_response() and _inject_parameters() methods gate context retrieval based on inclusion_map:
def _inject_parameters(self, template, ..., inclusion_map=None):
    # Default to including everything when no map is supplied
    _include = lambda node: (inclusion_map or {}).get(node, True)

    # Only submit retrieval futures for included nodes
    if _include('gists'):
        futures[executor.submit(...)] = 'gists'
    if _include('episodic_memory'):
        futures[executor.submit(...)] = 'episodes'
    ...

    # Only inject template placeholders for included nodes
    # (placeholder names below are illustrative)
    result = result.replace('{EPISODIC_MEMORY}', episodic_context if _include('episodic_memory') else '')
    result = result.replace('{FACTS}', facts_context if _include('facts') else '')
    ...
Backward Compatibility: inclusion_map=None defaults to including everything (the previous behavior).
Observability
Structured Logging
Every context relevance computation logs a structured entry:
[CONTEXT RELEVANCE] mode=CLARIFY | excluded_hard=[focus, available_skills, available_tools, warm_return_hint] |
excluded_soft=[episodic_memory] | recovered_soft=[] | deps_added=[] |
overrides_applied=[urgency] | total_included=9 | est_tokens=2100
Fields:
- `mode` — Cognitive mode
- `excluded_hard` — Hard-excluded nodes (never recovered)
- `excluded_soft` — Soft-excluded nodes (recoverable)
- `recovered_soft` — Soft-excluded nodes that were recovered due to budget headroom
- `deps_added` — Dependencies auto-included
- `overrides_applied` — Overrides applied (urgency, safety)
- `total_included` — Total included nodes
- `est_tokens` — Estimated tokens for included nodes
Warnings
- MAX_INCLUDED_NODES exceeded: Logs a warning if total included nodes > 12
- Circular dependencies: Raises `ConfigError` at config load time
- Config load failure: Falls back to "include all" with a warning
Testing
Comprehensive unit tests cover:
- Template mask correctness per mode
- Signal rule triggers (warm clarify excludes episodic, greeting excludes episodic)
- Soft vs hard exclusion behavior
- Dependency graph resolution with circular detection
- Soft recovery priority ordering
- Urgency overrides (force-include when urgent)
- Safety overrides (returning_from_silence forces identity)
- MAX_INCLUDED_NODES safeguard
- Edge cases (unknown mode, missing signals, disabled config)
Run tests:
pytest backend/tests/test_context_relevance_service.py -v
Configuration Tuning
Mode-Specific Optimization
Adjust template masks per mode to match your mode-specific templates:
{
"template_masks": {
"CUSTOM_MODE": {
"episodic_memory": false,
"facts": true,
...
}
}
}
Signal-Driven Exclusion
Add new signal rules to exclude context for specific conversation patterns:
{
"signal_rules": {
"focus": [
{
"when": { "greeting_pattern": true },
"strength": "soft"
}
]
}
}
Budget-Aware Recovery
Adjust the soft recovery budget to match the model's token limits (e.g., increase headroom for lower-token models):
{
"soft_recovery_budget": 2000
}
Custom Dependencies
Define new dependency relationships:
{
"dependencies": {
"new_node": ["existing_node"]
}
}
Implementation Details
Service Class
- `backend/services/context_relevance_service.py`
- `ContextRelevanceService` — Main service class
- `compute_inclusion_map()` — Core method (returns `{node: True/False}`)
Config File
`backend/configs/agents/context-relevance.json` — Configuration
Integration Points
- `backend/workers/digest_worker.py` — Calls the service before `generate_for_mode()`
- `backend/services/frontal_cortex_service.py` — Uses `inclusion_map` in `_inject_parameters()`
Disabling the Feature
To disable context relevance pre-parsing entirely, set in config:
{
"enabled": false
}
All context nodes will be included (the original always-include behavior). Useful for debugging or when the optimization is not needed.
Future Enhancements
- Machine learning-based rules — Learn signal-to-exclusion mappings from interaction data
- Per-user config — Different rules per user based on communication patterns
- Dynamic token budget — Estimate remaining tokens from prompt + mode
- A/B testing framework — Compare responses with/without context relevance pre-parsing