Compaction rewritten compact-first; thinking becomes an internal ability

The dominant thread is a full redesign of the compaction system. The trim-then-compact pivot (799c697) and the single _compact() loop (4d2b9fa) were soon superseded by the canonical COMPACT-FIRST design (68dd5b7): every ACT iteration measures the FULL request, and if it would exceed the window cap, compaction fires BEFORE sending, advances the watermark, and rebuilds the collapsed request. providers.send(force=False) returns OVER_CAP for irreducible requests; force=True bypasses that check so the provider is the source of truth and fails loudly instead of hanging.

Several follow-ups tighten the model. Trail-progress reporting is now honest, so the OVER_CAP loop can no longer spin (1151ec0). COMPACTION_ROW_WINDOW=50 becomes the single source of truth for the compaction window, after a missing migration reset watermarks to 0 and overflowed the window on a live DB (2651e5f). Compaction summaries moved to durable transcript rows with role='compaction' whose id IS the watermark; role='compaction' is now filtered out of the chat view, episode extraction, and observability stats (8cf8f07, 80ecf60), the summary is extracted before persisting rather than leaking the raw scratchpad (1fccaf0), and a watermark-bounded SQL LIMIT bug that silently truncated history to 20 rows is removed (8963833). Compaction now dispatches as two internal, never-discoverable abilities (chat_history_compactor, tool_chain_compactor) through the normal ToolDispatcher chokepoint, restoring act-trail WS events, the Brain Compacted-Summary panel, and correct response routing (6860443). TrailHandoverConfig carries the act-trail handover (7250d09); the auto-memory-recall seed is excluded from trail compaction; and tests and architecture docs are reconciled across Tasks 6.1/6.2 and TKT-832.
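The COMPACT-FIRST loop can be sketched in a few functions. This is a minimal sketch, not the project's code: it assumes a word-count token estimate, an in-memory list of rows with an index watermark, and a fixed two-row compaction batch. OVER_CAP, provider_call, ContextWindowExceeded, and the summary text are hypothetical stand-ins.

```python
from dataclasses import dataclass

OVER_CAP = object()  # hypothetical sentinel mirroring providers.send's OVER_CAP result


class ContextWindowExceeded(Exception):
    """Hypothetical error a provider raises when a request overflows its window."""


@dataclass
class Conversation:
    rows: list
    watermark: int = 0  # rows[:watermark] are already folded into `summary`
    summary: str = ""


def count_tokens(parts):
    # Assumed token estimate: one token per whitespace-separated word.
    return sum(len(p.split()) for p in parts)


def build_request(conv):
    # The FULL request: the collapsed summary plus every row past the watermark.
    head = [conv.summary] if conv.summary else []
    return head + conv.rows[conv.watermark:]


def compact(conv, batch=2):
    # Fold the oldest uncompacted rows into the summary and advance the watermark.
    # Honest progress report: False means nothing was left to fold.
    if conv.watermark >= len(conv.rows):
        return False
    conv.watermark = min(conv.watermark + batch, len(conv.rows))
    conv.summary = f"summary of {conv.watermark} rows"  # stand-in for a real LLM summary
    return True


def provider_call(request, window):
    # Stand-in provider: the source of truth, and it fails loudly when over its window.
    n = count_tokens(request)
    if n > window:
        raise ContextWindowExceeded(f"{n} tokens > {window}")
    return f"ok: {n} tokens"


def send(request, cap, force=False):
    # force=False refuses an over-cap request with OVER_CAP; force=True skips the check.
    if not force and count_tokens(request) > cap:
        return OVER_CAP
    return provider_call(request, window=cap)


def act_iteration(conv, cap):
    while True:
        request = build_request(conv)
        # Compact BEFORE sending, then rebuild the collapsed request and re-measure.
        if count_tokens(request) > cap and compact(conv):
            continue
        result = send(request, cap)
        if result is OVER_CAP:
            # Irreducible: force it through so the provider raises instead of the loop spinning.
            result = send(request, cap, force=True)
        return result
```

Because compact() reports False once there is nothing left to fold, the loop always terminates: either the request fits, or the forced send surfaces the provider's own error.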

Thinking is rewritten as a first-class internal, never-discoverable ability (abilities/thinking.py, eaaaf29), dispatched at turn 0 through ToolDispatcher when the high-thinking gate fires. A latent dead wire is fixed: _run_thinking_gate wrote self._thinking_level while _seed_turn_zero and Providers.send read self.thinking_level, so high-thinking dispatch never fired (bbd7371). The ThreadPool exploration hack, the _thinking_exploration attributes, and the ## Chain of Thought block in user.py are all deleted; abilities.sqlite is rebuilt with 35 indexed abilities (thinking excluded), and the SHA sidecar is corrected from 36 to 35 entries (bf8d583).
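The dead wire is easy to reproduce in miniature. In this sketch only the method and attribute names come from the changelog; the `Processor` and `BuggyProcessor` classes and their bodies are hypothetical illustrations.

```python
class Processor:
    """Hypothetical processor: the gate and turn-0 seeding share one attribute."""

    def __init__(self) -> None:
        self.thinking_level = "medium"  # the attribute every reader consults

    def _run_thinking_gate(self, wants_high: bool) -> None:
        # Fixed version: write the same name the readers use.
        if wants_high:
            self.thinking_level = "high"

    def _seed_turn_zero(self) -> list:
        # Dispatch the internal thinking ability at turn 0 only when the gate fired.
        return ["thinking"] if self.thinking_level == "high" else []


class BuggyProcessor(Processor):
    def _run_thinking_gate(self, wants_high: bool) -> None:
        # Pre-fix behaviour: writes a private name that no reader ever looks at.
        if wants_high:
            self._thinking_level = "high"
```

With the buggy gate, `_seed_turn_zero()` keeps seeing "medium" and returns an empty list even after the gate decides on high thinking.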

find_tools gets the v2 split (fe3469f, TKT-830): select performs exact, case-insensitive matching against the effective allow-list with no result cap; query runs a hybrid vector + FTS search fused with RRF, drops results below a MIN_RRF_SCORE=0.075 floor, and returns at most the top 5; both paths emit the same uniform JSON array of {name, input_schema}. Per-channel discovery is also rewired (10af6a6, TKT-835): FindToolsAbility sources its allow/block lists from the invoking processor's config (discoverable minus blocked), so delegate-exclusive tools such as browser/search are no longer discoverable on user or DMN channels. Dead code removed: store_batch and TestStoreBatch (285e7b8).
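The two find_tools paths and the per-channel allow-list can be sketched as below. This is a sketch under stated assumptions: the vector and FTS rankings arrive precomputed as ordered name lists, and the helper names are hypothetical. The RRF constant is also an assumption: with the common k=60, a name ranked first in both lists scores only 2/61 ≈ 0.033, below the 0.075 floor, so this sketch uses k=10; the real constant is not stated.

```python
import json

MIN_RRF_SCORE = 0.075  # floor from the changelog
QUERY_TOP_K = 5        # query returns at most the top 5
RRF_K = 10             # ASSUMPTION: the real smoothing constant is not stated


def effective_allow_list(discoverable, blocked):
    # Per-channel discovery: the processor's discoverable tools minus its blocked ones.
    return set(discoverable) - set(blocked)


def entry(name, schemas):
    return {"name": name, "input_schema": schemas[name]}


def select(names, allowed, schemas):
    # Exact, case-insensitive matches against the allow-list; no result cap.
    by_lower = {n.lower(): n for n in allowed}
    hits = dict.fromkeys(by_lower[n.lower()] for n in names if n.lower() in by_lower)
    return [entry(h, schemas) for h in hits]


def rrf(rankings, k=RRF_K):
    # Reciprocal rank fusion: each list contributes 1 / (k + rank) per name.
    scores = {}
    for ranking in rankings:
        for rank, name in enumerate(ranking, start=1):
            scores[name] = scores.get(name, 0.0) + 1.0 / (k + rank)
    return scores


def query(vec_ranking, fts_ranking, allowed, schemas):
    # Fuse the two rankings, drop disallowed or below-floor names, keep the top 5.
    scores = rrf([vec_ranking, fts_ranking])
    kept = [n for n, s in scores.items() if n in allowed and s >= MIN_RRF_SCORE]
    kept.sort(key=lambda n: (-scores[n], n))
    return [entry(n, schemas) for n in kept[:QUERY_TOP_K]]


def as_json(entries):
    # Both paths emit the same uniform JSON array of {name, input_schema}.
    return json.dumps(entries)
```

For example, with browser and search blocked, a browser hit ranked first by the vector search is still dropped, while a name that only one list ranks fourth (1/14 ≈ 0.071) falls under the floor.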

The composer gains two controls under the input dock (4fd9923): a thinking-level override (auto/medium/high), persisted via SettingsService with precedence config hard-pin > user override > gate level; and a context-size indicator fed by a new GET /system/context-usage endpoint, refreshed with trailing coalescing on every inbound WS message. The context readout gets a ‘Context’ label mirroring the ‘Thinking’ caption (28ba3e6). MCP policy rows in the Policy Manager are now grouped by server title and humanized via McpClientService.label_mcp_permissions (fe242af).
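The thinking-level precedence reduces to a short resolution function. A minimal sketch: the function name is hypothetical, and treating the "auto" override as "no override, defer to the gate" is an assumption drawn from the auto/medium/high options.

```python
from typing import Optional


def resolve_thinking_level(config_pin: Optional[str], user_override: str, gate_level: str) -> str:
    # 1. A hard pin in config always wins.
    if config_pin is not None:
        return config_pin
    # 2. An explicit user choice wins next; "auto" is assumed to mean "no override".
    if user_override != "auto":
        return user_override
    # 3. Otherwise fall back to whatever level the thinking gate chose.
    return gate_level
```

So a user who picks "medium" suppresses a gate decision of "high", but neither can override a level pinned in config.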

Smaller cleanups: the Providers gateway drops the redundant mp/_job params from _resolve and _log_after_call now that it owns self.mp, deleting a dead count_tokens and the spurious self.mp-is-None branch (8c4a4de, net -21 SLOC). OllamaService drops the stale 64k :cloud context clamp after verifying against grck.lan:30068 that the proxy accepts the full window, so the backfill moves Ollama max_tokens from 65536 to 200000 (2a72aca). docs/superpowers/ is untracked from both trees, and internal ticket/spec references are stripped from public docs (64244cc).

Compaction rewritten COMPACT-FIRST: every ACT iteration measures the FULL request, compaction fires before send when over cap, the OVER_CAP sentinel drives the loop, and force=True bypasses the check so the provider fails loudly on irreducible requests
Compaction summaries now live as durable transcript role='compaction' rows whose id is the watermark; filtered out of chat view, episode extraction, and observability stats; only the extracted summary persists, not the raw scratchpad
Compaction dispatches as two internal never-discoverable abilities (chat_history_compactor, tool_chain_compactor) via ToolDispatcher, restoring act-trail WS events, Brain observability, and correct response routing
Thinking is an internal, never-discoverable ability dispatched at turn 0; fixed dead wire where _run_thinking_gate wrote _thinking_level but readers read thinking_level; abilities.sqlite rebuilt with 35 indexed abilities (thinking excluded)
find_tools v2 splits into select (exact, no cap) and query (hybrid RRF with 0.075 floor, top 5) with a uniform JSON array result; per-channel discovery now reads config.discoverable minus config.blocked so delegate-exclusive tools are blocked on user/DMN
Composer gains a persisted thinking-level override (config hard-pin > user override > gate) and a context-size indicator fed by GET /system/context-usage trailing-coalesced on WS messages; Ollama 64k :cloud context clamp dropped after live verification
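The compaction-row transcript model (summary rows whose id is the watermark, hidden from the chat view, and a history query bounded only by that watermark rather than a row-count LIMIT) can be sketched against an in-memory SQLite table. The `transcript` table, its columns, and the helper names are assumptions for illustration; the project's real schema and queries are not shown here.

```python
import sqlite3

# Hypothetical schema for illustration only.
SCHEMA = "CREATE TABLE transcript (id INTEGER PRIMARY KEY, role TEXT, content TEXT)"


def latest_compaction(conn):
    # The newest role='compaction' row; its id IS the watermark (0 if none yet).
    row = conn.execute(
        "SELECT id, content FROM transcript WHERE role = 'compaction' "
        "ORDER BY id DESC LIMIT 1"
    ).fetchone()
    return row if row else (0, None)


def model_history(conn):
    watermark, summary = latest_compaction(conn)
    # Bounded only by the watermark: no row-count LIMIT that could silently truncate.
    live = conn.execute(
        "SELECT role, content FROM transcript WHERE id > ? AND role != 'compaction' ORDER BY id",
        (watermark,),
    ).fetchall()
    head = [("compaction", summary)] if summary is not None else []
    return head + live


def chat_view(conn):
    # Compaction rows are internal and never rendered in the chat view.
    return conn.execute(
        "SELECT role, content FROM transcript WHERE role != 'compaction' ORDER BY id"
    ).fetchall()
```

Keeping the summary in the transcript itself means the watermark can never drift from the summary it describes: advancing one is inserting the other.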