Research Pipeline
The Problem
Development projects benefit from external knowledge -- a library that solves an open problem, an article explaining a pattern you are about to implement, a tool that replaces a planned build. But finding these resources requires active search at the right moment, and developers rarely have time to scan the horizon while building. The knowledge exists; nobody is connecting it to the work.
Sidespace automates this with Umbra, a research agent that continuously scans external sources for project-relevant resources and surfaces them through a quality gate before they reach the knowledge base.
What's Built
Umbra: The Research Agent
Umbra runs on Claude Haiku -- a fast, cheap model suited for broad scanning. For each active project, Umbra searches two sources:
- MollyMemo -- an external knowledge base where the user saves links, articles, and resources via a Telegram bot. MollyMemo is a fully independent product; Umbra reads from its database but does not modify it.
- Web search -- when MollyMemo does not have enough relevant matches, Umbra searches the web to fill gaps.
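The two-source fallback can be sketched as below. The source lookups here are stubs with hypothetical names; the real implementations query MollyMemo's database (read-only) and a web search API.

```typescript
interface Candidate {
  url: string;
  source: "mollymemo" | "web";
}

// Stubbed lookups for illustration only -- real code hits MollyMemo's
// database and an external search provider.
const searchMollyMemo = (query: string): Candidate[] =>
  query.includes("rust")
    ? [{ url: "https://example.com/rust-guide", source: "mollymemo" }]
    : [];

const searchWeb = (query: string, needed: number): Candidate[] =>
  Array.from({ length: needed }, (_, i) => ({
    url: `https://example.com/web-${i}`,
    source: "web",
  }));

// MollyMemo first; web search only fills the remaining gap.
function gatherCandidates(query: string, minResults: number): Candidate[] {
  const fromMemo = searchMollyMemo(query);
  if (fromMemo.length >= minResults) return fromMemo;
  return [...fromMemo, ...searchWeb(query, minResults - fromMemo.length)];
}
```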
Umbra evaluates each candidate against the project's current state (tech stack, open tasks, recent activity) and assigns a confidence score with written reasoning. The confidence threshold is set to 0.3 -- intentionally wide to cast a broad net.
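As a minimal sketch of the gate (the field names are illustrative, not the actual schema; the scoring itself happens inside Umbra's evaluation pass):

```typescript
// Hypothetical shape of a candidate Umbra has already scored.
interface ScoredCandidate {
  url: string;
  confidence: number; // 0..1, assigned against the project's current state
  reasoning: string;  // written justification from the model
}

// Intentionally wide net: anything at or above 0.3 is kept for staging.
const CONFIDENCE_THRESHOLD = 0.3;

function passesGate(c: ScoredCandidate): boolean {
  return c.confidence >= CONFIDENCE_THRESHOLD;
}

const candidates: ScoredCandidate[] = [
  { url: "https://example.com/a", confidence: 0.82, reasoning: "matches stack" },
  { url: "https://example.com/b", confidence: 0.15, reasoning: "tangential" },
  { url: "https://example.com/c", confidence: 0.3, reasoning: "borderline" },
];

const staged = candidates.filter(passesGate);
```

The low threshold means borderline items survive this step; the later review stage, not this filter, is where quality is enforced.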
The Staging Pattern
Nothing Umbra finds goes directly into the knowledge base. All findings land in a staging table (umbra_findings) with a status of pending. This is a deliberate design choice: automated discovery is noisy, and injecting low-quality results into the memory system would pollute every future agent session.
Instead, Hoshi reviews the findings as a second quality gate.
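A sketch of the staging write. The table name umbra_findings is real; the column set shown here is an assumption for illustration.

```typescript
// Status lifecycle for a staged finding.
type FindingStatus = "pending" | "promoted" | "rejected" | "escalated";

// Illustrative row shape for the umbra_findings staging table.
interface UmbraFinding {
  project_id: string;
  source: "mollymemo" | "web";
  url: string;
  confidence: number;
  reasoning: string;
  status: FindingStatus;
}

// Everything lands as pending; only the review stage changes status.
function stageFinding(partial: Omit<UmbraFinding, "status">): UmbraFinding {
  return { ...partial, status: "pending" };
}

const row = stageFinding({
  project_id: "proj-123",
  source: "web",
  url: "https://example.com/article",
  confidence: 0.61,
  reasoning: "explains a pattern on the project's open task list",
});
```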
The Review Chain
The pipeline runs as a sequence of Supabase edge functions:
umbra-turn (Haiku)
  Scans sources, writes findings to umbra_findings
        |
        v
chain_hoshi_review (pg_net RPC, fire-and-forget)
        |
        v
hoshi-review (Gemini 3.1 Pro or Sonnet fallback)
  Evaluates pending findings, decides outcome

hoshi-review uses the same Hoshi brain adapter as the main chat experience -- Gemini 3.1 Pro as primary engine with Legacy Rust+Sonnet as fallback. For each finding, it chooses one of three outcomes:
| Decision | What Happens |
|---|---|
| Promote | Marked promoted in staging. A relevance tag is written back to MollyMemo so users see the project association in its webapp. |
| Reject | Marked rejected. Not surfaced to the user. |
| Escalate | Flagged for human review. Appears in the Hub feed with interactive approve/reject controls. |
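The three outcomes above can be sketched as a routing function. The side effects are reduced to flags here; in the real pipeline they are Supabase status updates, a MollyMemo tag write, and a Hub feed entry.

```typescript
type ReviewDecision = "promote" | "reject" | "escalate";

interface ReviewResult {
  status: "promoted" | "rejected" | "escalated";
  tagMollyMemo: boolean; // promote writes a relevance tag back to MollyMemo
  surfaceInHub: boolean; // escalate shows approve/reject controls in the Hub
}

// Map a review decision onto its downstream effects.
function routeDecision(decision: ReviewDecision): ReviewResult {
  switch (decision) {
    case "promote":
      return { status: "promoted", tagMollyMemo: true, surfaceInHub: false };
    case "reject":
      return { status: "rejected", tagMollyMemo: false, surfaceInHub: false };
    case "escalate":
      return { status: "escalated", tagMollyMemo: false, surfaceInHub: true };
  }
}
```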
Each pipeline run produces a single consolidated research_digest event in the Hub feed, showing counts and expandable detail for each category.
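One way to build that consolidated event (the shapes are illustrative; the real research_digest format is internal to the Hub feed):

```typescript
type Outcome = "promoted" | "rejected" | "escalated";

interface ResearchDigest {
  type: "research_digest";
  counts: Record<Outcome, number>;
}

// Collapse one pipeline run's outcomes into a single Hub feed event.
function buildDigest(outcomes: Outcome[]): ResearchDigest {
  const counts: Record<Outcome, number> = {
    promoted: 0,
    rejected: 0,
    escalated: 0,
  };
  for (const o of outcomes) counts[o] += 1;
  return { type: "research_digest", counts };
}

const digest = buildDigest(["promoted", "promoted", "rejected", "escalated"]);
```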
Cost Efficiency
The pipeline uses tiered model selection to keep costs low:
| Step | Model | Cost per Run |
|---|---|---|
| Discovery and evaluation | Haiku | ~$0.02 |
| Quality review | Gemini 3.1 Pro or Sonnet | ~$0.08 |
| Total per pipeline run | | ~$0.10 |
Research frequency adapts to project activity. Active projects (commits in the last 7 days) get nightly runs. Idle projects run weekly. Dormant projects run monthly. A global daily cap of $5 prevents runaway costs.
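The scheduling rules can be sketched as follows. The 7-day active window, the $5 daily cap, and the ~$0.10 run cost come from the source; the 30-day idle boundary between weekly and monthly is an assumption.

```typescript
type Cadence = "nightly" | "weekly" | "monthly";

// Activity-based cadence: commits within 7 days => nightly runs,
// idle up to 30 days => weekly (30 is an assumed boundary), else monthly.
function cadenceFor(daysSinceLastCommit: number): Cadence {
  if (daysSinceLastCommit <= 7) return "nightly";
  if (daysSinceLastCommit <= 30) return "weekly";
  return "monthly";
}

// Global daily budget guard: skip the run once spend would exceed the cap.
const DAILY_CAP_USD = 5;
const COST_PER_RUN_USD = 0.1;

function canRun(spentTodayUsd: number): boolean {
  return spentTodayUsd + COST_PER_RUN_USD <= DAILY_CAP_USD;
}
```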
MollyMemo Integration
MollyMemo and Sidespace are fully decoupled systems. Umbra reads from MollyMemo's database as one source among others. If MollyMemo is down, Umbra still works with fewer sources. The integration is bidirectional but lightweight: Sidespace pushes project anchors to MollyMemo and writes relevance tags back after promotion. No code is shared. No databases are merged.
Where It's Heading
Umbra findings accumulate over time. If a project is idle for weeks, findings build up into a backlog. When the user returns to that project, the reviewing agent has weeks of accumulated intelligence to evaluate -- not just the last run. Future work includes Umbra learning from its promotion/rejection ratios to adjust search strategy per project.