Add investigation quality instrumentation #74

Closed
opened 2026-04-12 20:10:01 -06:00 by claude-code · 0 comments
Collaborator

Add lightweight quality metrics so we can measure whether changes to the investigation pipeline (starting with Phase 3) improve or degrade output.

Three pieces:

  1. **Turn utilization logging:** track turns used vs turns allocated per directory. Emitted to stderr during the run and recorded in cache metadata.

  2. **`completeness` field on dir-scope `submit_report`:** the agent self-rates how thoroughly it investigated the directory (0.0-1.0). Added to the `submit_report` tool schema for the dir loop scope.

  3. **`plan_evaluation.json`:** emitted at the end of investigation. Compares the plan's predictions (priority/shallow/skip, suggested turns) against what actually happened (turns used, confidence achieved, files examined). This is the planning pass's report card.
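
To make the three pieces concrete, here is a minimal sketch of how they could fit together. Nothing in it is taken from the luminos codebase: `DirOutcome`, its field names, the stderr log format, the JSON-Schema-style `COMPLETENESS_PROPERTY` fragment, and the layout of the evaluation dict are all hypothetical, chosen only to illustrate the idea.

```python
import sys
from dataclasses import dataclass, asdict

# Hypothetical JSON-Schema-style property for the dir-scope submit_report tool.
COMPLETENESS_PROPERTY = {
    "type": "number",
    "minimum": 0.0,
    "maximum": 1.0,
    "description": "Self-rated thoroughness of the directory investigation.",
}


@dataclass
class DirOutcome:
    """Plan prediction vs. actual result for one directory (hypothetical fields)."""
    path: str
    planned_depth: str     # "priority", "shallow", or "skip"
    suggested_turns: int   # turns the plan suggested
    turns_allocated: int   # turns the loop was actually given
    turns_used: int        # turns the loop actually consumed
    completeness: float    # agent self-rating, 0.0-1.0
    confidence: float      # confidence achieved
    files_examined: int


def validate_completeness(value: float) -> float:
    """Reject self-ratings outside the 0.0-1.0 range."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"completeness must be in [0.0, 1.0], got {value}")
    return value


def turn_utilization(used: int, allocated: int) -> float:
    """Fraction of allocated turns that were used (0.0 if none were allocated)."""
    return used / allocated if allocated > 0 else 0.0


def log_turn_utilization(outcome: DirOutcome, stream=sys.stderr) -> dict:
    """Print one utilization line and return a record suitable for cache metadata."""
    util = turn_utilization(outcome.turns_used, outcome.turns_allocated)
    print(
        f"[turns] {outcome.path}: "
        f"{outcome.turns_used}/{outcome.turns_allocated} ({util:.0%})",
        file=stream,
    )
    return {
        "turns_used": outcome.turns_used,
        "turns_allocated": outcome.turns_allocated,
        "utilization": round(util, 3),
    }


def build_plan_evaluation(outcomes: list[DirOutcome]) -> dict:
    """Assemble the dict that would be serialized as plan_evaluation.json."""
    entries = []
    for o in outcomes:
        validate_completeness(o.completeness)
        entry = asdict(o)
        entry["turn_utilization"] = round(
            turn_utilization(o.turns_used, o.turns_allocated), 3
        )
        # Positive means the directory needed more turns than the plan suggested.
        entry["turns_over_suggestion"] = o.turns_used - o.suggested_turns
        entries.append(entry)
    return {"directories": entries}
```

Under these assumptions, writing the report card is a single `json.dump(build_plan_evaluation(outcomes), f, indent=2)` at the end of the run. Keeping `turns_over_suggestion` per directory makes it easy to spot where the planning pass under- or over-estimated.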

These give us a feedback loop for tuning without building a full evaluation framework: run luminos on known repos before and after Phase 3, then compare the metrics.
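
A before/after comparison could be as small as the sketch below. It assumes, purely for illustration, that each `plan_evaluation.json` holds a `directories` list whose entries carry `turn_utilization` and `completeness` keys. The issue does not specify the file layout, so these key names and the helper functions are hypothetical.

```python
import json
from statistics import mean

# Hypothetical per-directory keys to aggregate.
METRICS = ("turn_utilization", "completeness")


def load_evaluation(path: str) -> dict:
    """Read one plan_evaluation.json file from disk."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(evaluation: dict) -> dict:
    """Average each metric across all directories in one run."""
    dirs = evaluation.get("directories", [])
    if not dirs:
        return {m: 0.0 for m in METRICS}
    return {m: mean(d[m] for d in dirs) for m in METRICS}


def compare(before: dict, after: dict) -> dict:
    """Return after-minus-before deltas; positive means the metric went up."""
    b, a = summarize(before), summarize(after)
    return {m: a[m] - b[m] for m in METRICS}
```

For example, `compare(load_evaluation("before.json"), load_evaluation("after.json"))` would give one delta per metric for the same repo (the file names are placeholders). Self-rated `completeness` is subjective, so a rise there is more convincing when turn utilization or confidence moves with it.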

claude-code added this to the Phase 3: Investigation Planning milestone 2026-04-12 20:10:01 -06:00
Reference: archeious/luminos#74