June 11, 2026
Memory redesign + ToolResult contract sweep
A major memory subsystem overhaul landed in three coordinated pieces. The schema foundation (TKT-920) adds hierarchy levels, last_relevant_at, tombstoned_at, and facts_extracted_at to episodes, plus bi-temporal valid_from/valid_to on data_graph with a live-facts partial index, and a SchemaConvergenceService backfill step that is idempotent and NULL-guarded.
On top of that, episodic decay (TKT-921) replaces the compounding multiply-down, which had floored 98% of the corpus at 0.01, with an absolute exponential law rw = (salience/10) * exp(-dt/tau) anchored on last_relevant_at. The time constant tau is set per level (14d/90d/365d, with a 7d override for tombstones). The change also adds hard-deletion windows for tombstones older than 30d and for weak fossil leaves, a janitor that tombstones stranded non-user leaves older than 7d, and pure reads: the reconsolidation activation bump and the data_graph _touch_accessed ratchet are deleted.
Retrieval (TKT-923) drops the adaptive-radius pipeline entirely in favor of a full-KNN vector lane with min-max normalized per-lane scoring and a 50% relative floor. The scoring components are relevance = max(vector, fts), recency = exponential on last_relevant_at (~14d half-life), and importance = (salience/10) * retrieval_weight. The consolidation tree is collapsed at read time so leaves, supers, and eras compete in one pool; data_graph recall filters to valid_to IS NULL so invalidated facts never resurface; 8 radius constants and the drift apparatus are deleted; and recall telemetry now writes floor_cut_count, final_rrf_count, and top_distances.
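The decay law above can be sketched in a few lines. This is a minimal illustration, not the shipped code: the function name `decay_weight`, the level keys, and the assignment of 14d/90d/365d to leaves/supers/eras in that order are assumptions made for the example.

```python
import math

# Per-level time constants in days. The three values come from the entry;
# which level gets which value is an assumption for this sketch.
TAU_DAYS = {"leaf": 14.0, "super": 90.0, "era": 365.0}
TOMBSTONE_TAU_DAYS = 7.0  # 7d override for tombstoned episodes


def decay_weight(salience, days_since_relevant, level, tombstoned=False):
    """rw = (salience/10) * exp(-dt/tau), with dt measured from last_relevant_at.

    The result depends only on the current age of the episode, so running
    the decay pass repeatedly cannot compound the weight downward.
    """
    tau = TOMBSTONE_TAU_DAYS if tombstoned else TAU_DAYS[level]
    dt = max(0.0, days_since_relevant)  # clamp clock skew to "just now"
    return (salience / 10.0) * math.exp(-dt / tau)


if __name__ == "__main__":
    for level in ("leaf", "super", "era"):
        print(level, round(decay_weight(8, 30, level), 4))
```

Because the weight is a pure function of salience and age, recomputing it on every maintenance tick gives the same answer for the same inputs, which is the property the compounding multiply-down lacked.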
Missing WorldState persistence (TKT-922) was starving the subconscious maintenance gate (57 ticks vs 3029 skips) because last_user_message_at lived only in an in-memory dict and was wiped on every restart. It now dual-writes through a shared DurableTimestamp (MemoryStore fast path + data_graph system row) on every user message and rehydrates on construction, mirroring the worker's last_fired clock on the same class. Corrupt stored values trip a parse_utc sentinel and are logged loudly rather than decaying into gate comparisons; new integration tests pin restart survival and gate de-starvation.
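The dual-write and rehydrate pattern can be sketched as follows. The class and helper names echo the entry, but everything else is hypothetical: the constructor signature, the two plain dicts standing in for MemoryStore and the data_graph system row, and the use of None as the sentinel.

```python
import logging
from datetime import datetime, timezone

log = logging.getLogger("durable_timestamp")


def parse_utc(raw):
    """Return an aware UTC datetime, or None (the sentinel) if raw is unparseable."""
    try:
        parsed = datetime.fromisoformat(raw)
    except (TypeError, ValueError):
        return None
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


class DurableTimestamp:
    """Timestamp that survives restarts by writing to two stores (sketch)."""

    def __init__(self, key, fast_store, durable_store):
        self.key = key
        self._fast = fast_store        # stand-in for the MemoryStore fast path
        self._durable = durable_store  # stand-in for the data_graph system row
        self.value = self._rehydrate()

    def _rehydrate(self):
        raw = self._fast.get(self.key) or self._durable.get(self.key)
        if raw is None:
            return None
        parsed = parse_utc(raw)
        if parsed is None:
            # Log loudly instead of letting a bad value reach gate comparisons.
            log.error("corrupt stored timestamp for %s: %r", self.key, raw)
        return parsed

    def set(self, when):
        self.value = when.astimezone(timezone.utc)
        raw = self.value.isoformat()
        self._fast[self.key] = raw
        self._durable[self.key] = raw


if __name__ == "__main__":
    durable = {}
    ts = DurableTimestamp("last_user_message_at", {}, durable)
    ts.set(datetime(2026, 6, 11, 12, 0, tzinfo=timezone.utc))
    # Simulated restart: the fast store is empty, the durable row remains.
    print(DurableTimestamp("last_user_message_at", {}, durable).value)
```

Constructing a second instance with an empty fast store models a restart: the value comes back from the durable store instead of resetting to nothing.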
A ToolResult contract sweep tightened error vocabulary across the abilities layer. Vision (TKT-915) replaces code="error" with missing-params, not-found, no-file-on-disk, and vision-failed; the OCR fallback carries degraded=true meta. Weather (TKT-914) adds a missing-location short-circuit before any cache/fetch lookup, stamps stale=true meta on cached payloads returned when all live sources are down, and narrows both fetchers' broad except Exception to (RequestException, ValueError, KeyError, IndexError, TypeError) so programming bugs propagate. save_pattern (TKT-913) rejects four validation failures that used to masquerade as success and echoes counts on the happy path. code_eval (TKT-917) returns ok() with branchable {stdout, stderr, exit_code, duration_ms} for anything that actually ran (runtime exceptions become exit_code=1 with the traceback on stderr) and reserves error results for harness failures (missing-params, no-output, timeout, sandbox-crashed), which replace the now-deleted code="error" funnel. The review twins (TKT-918) merge onto a shared ReviewWindowAbility base with a 1-based iter ordinal, a parse_utc sentinel mapped to invalid-time, and a narrowed except sqlite3.Error mapped to query-failed. Compactors (TKT-919) surface rows_compacted and trail_chars honestly, freezing the no-op envelope byte-identical. Thinking (TKT-916) pins the verbatim prose envelope, with the NOTHING sentinel collapsing to an empty body line.
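As an illustration of the weather contract described above, the sketch below shows the missing-location short-circuit, the narrowed exception tuple, and the stale=true stamp. The `ToolResult` dataclass shape, the `get_weather` signature, and the `weather-unavailable` code are hypothetical. OSError stands in for requests' RequestException so the example has no third-party dependency.

```python
from dataclasses import dataclass, field


@dataclass
class ToolResult:
    """Hypothetical result envelope; the real ToolResult may differ."""
    ok: bool
    code: str = "ok"
    data: dict = field(default_factory=dict)
    meta: dict = field(default_factory=dict)


# Narrow failure set from the entry. OSError is a stand-in for
# requests.RequestException to keep the sketch dependency-free.
FETCH_ERRORS = (OSError, ValueError, KeyError, IndexError, TypeError)


def get_weather(location, fetchers, cache):
    # Short-circuit before touching the cache or any live source.
    if not location or not location.strip():
        return ToolResult(ok=False, code="missing-location")

    for fetch in fetchers:
        try:
            payload = fetch(location)
        except FETCH_ERRORS:
            continue  # expected source failure: try the next fetcher
        # Anything outside FETCH_ERRORS (e.g. AttributeError) propagates.
        cache[location] = payload
        return ToolResult(ok=True, data=payload)

    if location in cache:
        # All live sources are down: serve the cached payload, marked stale.
        return ToolResult(ok=True, data=cache[location], meta={"stale": True})
    return ToolResult(ok=False, code="weather-unavailable")  # hypothetical code


if __name__ == "__main__":
    def source_down(location):
        raise OSError("source unreachable")

    print(get_weather("Oslo", [source_down], {"Oslo": {"temp_c": 4}}))
```

A caller can branch on `meta.get("stale")` to decide whether to warn the user, while a genuine bug in a fetcher surfaces as an exception instead of being folded into a generic error result.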
Test infrastructure saw a big cleanup. A shared tests/_tool_result_harness.py now owns the byte-identical dispatch plumbing (MP, seed_transcript, allow_policy, head/body/parse_body) that 32 files had each copied privately; genuinely divergent helpers stayed local, and pre-existing dead helpers were removed where the migration orphaned them. The net change is -438 lines across 355 tests, with zero tests renamed or reordered. Five nightly endpoint/log scenarios were ported into repo feature tests (test_api_chat_endpoints.py is new, test_mode_gate.py is extended), with evidence for what was deliberately not ported (dead [MODE-GATE-PROMOTE] patterns and LLM-territory ACT-loop tests).
The Policies UI gets channel-wide bulk Allow/Ask/Deny buttons (TKT-913-adjacent) that fan out one PUT /api/policies per changed row with a single summary toast. The orphaned Subagent policy tab is removed because POLICY_CHANNEL defines only chat/subconscious/external_agent, and the old subagent tool was replaced by delegate tools that inherit the caller's policy_channel. The README release badge now points at tags instead of releases. Architecture docs accompany each contract migration with verbatim-prose and degraded-fallback sections.
- Memory redesign: hierarchy levels + last_relevant_at + bi-temporal valid_from/valid_to, with idempotent NULL-guarded backfill
- Absolute exponential decay replaces compounding multiply-down; hard-deletion windows for tombstones and weak fossils; pure-read recall (no more activation bumps)
- Collapsed-tree KNN with min-max per-lane scoring, 50% relative floor, and data_graph live-facts filter (valid_to IS NULL); 8 radius constants and drift apparatus deleted
- last_user_message_at dual-writes through a shared DurableTimestamp so the subconscious gate survives container restarts (57 ticks vs 3029 skips)
- Shared tests/_tool_result_harness.py replaces 32 private copies of dispatch plumbing: net -438 lines across 355 tests, zero renamed
- Kebab error codes across vision/weather/save_pattern/code_eval/review abilities; degraded=true meta on OCR fallback and stale weather cache
- Bulk Allow/Ask/Deny buttons in Policies header fan out per-row PUTs; orphaned Subagent tab removed (no backend channel, no seeded rows)
- Five nightly endpoint/log scenarios ported into repo feature tests with documented evidence for what was deliberately not ported
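For readers who want a concrete picture of the min-max per-lane scoring and the 50% relative floor from the retrieval change, here is a rough sketch. Several things are assumptions: lane inputs are treated as higher-is-better scores, the floor is applied to relevance relative to the best candidate, a lane with identical scores normalizes to 1.0, and the function and field names are invented for the example. How the three components are finally combined is not stated in this entry, so the sketch stops at computing them.

```python
def min_max(scores):
    """Normalize one lane's scores to [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {key: 1.0 for key in scores}  # assumption: flat lane maps to 1.0
    return {key: (value - lo) / (hi - lo) for key, value in scores.items()}


def score_candidates(vector_scores, fts_scores, rows, floor=0.5):
    """Return per-candidate components for everything above the relative floor."""
    vec = min_max(vector_scores)
    fts = min_max(fts_scores)
    # relevance = max(vector, fts); a candidate missing from a lane scores 0 there.
    relevance = {
        key: max(vec.get(key, 0.0), fts.get(key, 0.0))
        for key in set(vec) | set(fts)
    }
    if not relevance:
        return []

    cutoff = floor * max(relevance.values())  # 50% of the best relevance
    kept = []
    for key, rel in relevance.items():
        if rel < cutoff:
            continue  # cut by the relative floor
        row = rows[key]
        kept.append({
            "id": key,
            "relevance": rel,
            # Exponential recency with a 14-day half-life.
            "recency": 0.5 ** (row["days_since_relevant"] / 14.0),
            "importance": (row["salience"] / 10.0) * row["retrieval_weight"],
        })
    return sorted(kept, key=lambda item: item["relevance"], reverse=True)


if __name__ == "__main__":
    rows = {
        key: {"days_since_relevant": 14, "salience": 5, "retrieval_weight": 1.0}
        for key in "abcd"
    }
    vector = {"a": 4.0, "b": 3.0, "c": 0.0}
    fts = {"c": 2.0, "d": 1.0}
    for item in score_candidates(vector, fts, rows):
        print(item)
```

In this toy input, candidate "c" is last in the vector lane but first in the full-text lane, so max(vector, fts) keeps it in the pool, while "d" falls below the floor and is dropped.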