Commit graph

71 commits

Author SHA1 Message Date
Jeff Smith
fccbca0ce7 chore: update CLAUDE.md for session 6
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:48:44 -06:00
Jeff Smith
fc57e33d1f merge: chore/extract-workflow-to-global 2026-04-07 13:47:41 -06:00
Jeff Smith
b2ead84531 chore: extract workflow sections to global ~/.claude/CLAUDE.md
Move Development Workflow, Branching Discipline, Documentation Workflow,
ADHD Session Protocols, and Session Protocols out of the project CLAUDE.md
and into the global one so all projects share them. Move docs/externalize.md
and docs/wrap-up.md to ~/.claude/protocols/ (lightly generalized). Project
CLAUDE.md keeps only luminos-specific state, module map, constraints,
naming, test command, and session log.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 13:47:41 -06:00
Jeff Smith
f63875b448 merge: chore/issue-followups-session5 2026-04-06 23:26:43 -06:00
Jeff Smith
a3b5f6397e docs(plan): insert session 5 follow-ups #54, #55, #56, #57 into implementation order 2026-04-06 23:26:38 -06:00
Jeff Smith
159ab5207a chore: update CLAUDE.md for session 5 2026-04-06 23:23:23 -06:00
Jeff Smith
8c0e29b6d8 merge: docs/issue-53-onboarding-internals (#53) 2026-04-06 23:21:44 -06:00
Jeff Smith
1892784d35 docs: add status snapshot to PLAN.md, fix domain.py file-map (#53) 2026-04-06 23:21:41 -06:00
Jeff Smith
74477d8c2a chore(workflow): manually close issues after merge, do not rely on auto-close 2026-04-06 22:58:01 -06:00
Jeff Smith
88ecdb9761 chore: update CLAUDE.md for session 4 2026-04-06 22:52:24 -06:00
Jeff Smith
40af515fb2 merge: feat/issue-44-context-budget (#44) 2026-04-06 22:49:44 -06:00
Jeff Smith
036c3a934a fix(ai): correct context budget metric — track per-call, not sum (#44)
The dir loop was exiting early on small targets (a 13-file Python lib
hit the budget at 92k–139k cumulative tokens) because _TokenTracker
compared the SUM of input_tokens across all turns to the context
window size. input_tokens from each API response is the size of the
full prompt sent on that turn (system + every prior message + new
tool results), so summing across turns multi-counts everything. The
real per-call context size never approached the limit.

Verified empirically: on luminos_lib pre-fix, the loop bailed when
the most recent call's input_tokens was 20,535 (~10% of Sonnet's
200k window) but the cumulative sum was 134,983.

Changes:
- _TokenTracker now tracks last_input (the most recent call's
  input_tokens), separate from the cumulative loop_input/total_input
  used for cost reporting.
- budget_exceeded() returns last_input > CONTEXT_BUDGET, not the
  cumulative sum.
- MAX_CONTEXT bumped from 180_000 to 200_000 (Sonnet 4's real
  context window). CONTEXT_BUDGET stays at 70% = 140,000.
- Early-exit message now shows context size, threshold, AND
  cumulative spend separately so future debugging is unambiguous.

Smoke test on luminos_lib: investigation completes without early
exit (~$0.37). 6 unit tests added covering the new semantics,
including the key regression: a sequence of small calls whose sum
exceeds the budget must NOT trip the check.

Wiki Architecture page updated.

#51 filed for the separate message-history-growth issue.
2026-04-06 22:49:25 -06:00
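The difference between per-call context size and cumulative spend described in 036c3a934a can be illustrated with a minimal sketch. Only `last_input`, `total_input`, `budget_exceeded()`, `MAX_CONTEXT`, and `CONTEXT_BUDGET` are named in the commit message; the class name `TokenTrackerSketch` and the `record()` method are assumptions made for illustration, not the project's actual code.

```python
MAX_CONTEXT = 200_000
CONTEXT_BUDGET = MAX_CONTEXT * 70 // 100  # 70% of the window = 140,000


class TokenTrackerSketch:
    """Illustrative stand-in for _TokenTracker (hypothetical)."""

    def __init__(self):
        self.last_input = 0   # input_tokens of the most recent call only
        self.total_input = 0  # cumulative sum, kept for cost reporting

    def record(self, input_tokens):
        # Hypothetical method: called once per API response.
        self.last_input = input_tokens
        self.total_input += input_tokens

    def budget_exceeded(self):
        # Compare the latest prompt size, not the sum across turns.
        return self.last_input > CONTEXT_BUDGET


# Four small calls whose sum exceeds the budget must not trip the check.
tracker = TokenTrackerSketch()
for tokens in (20_000, 40_000, 60_000, 80_000):
    tracker.record(tokens)
```

After the loop the cumulative sum is above `CONTEXT_BUDGET`, yet the check stays false because the latest call is well under it, which mirrors the regression case the commit calls out.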
Jeff Smith
157ac3f606 merge: feat/issue-42-classifier-bias (#42) 2026-04-06 22:36:26 -06:00
Jeff Smith
f3abbce7d4 feat(filetypes): expose raw signals to survey, remove classifier bias (#42)
The survey pass no longer receives the bucketed file_categories
histogram, which was biased toward source-code targets and would
mislabel mail, notebooks, ledgers, and other non-code domains as
"source" via the file --brief "text" pattern fallback.

Adds filetypes.survey_signals(), which assembles raw signals from
the same `classified` data the bucketer already processes — no new
walks, no new dependencies:
  total_files       — total count
  extension_histogram — top 20 extensions, raw, no taxonomy
  file_descriptions   — top 20 `file --brief` outputs, by count
  filename_samples    — 20 names, evenly drawn (not first-20)

`file --brief` descriptions are truncated at 80 chars before
counting so prefixes group correctly without exploding key cardinality.

The Band-Aid in _SURVEY_SYSTEM_PROMPT (warning the LLM that the
histogram was biased toward source code) is removed and replaced
with neutral guidance on how to read the raw signals together.
The {file_type_distribution} placeholder is renamed to
{survey_signals} to reflect the broader content.

luminos.py base scan computes survey_signals once and stores it on
report["survey_signals"]; AI consumers read from there.

summarize_categories() and report["file_categories"] are unchanged
— the terminal report still uses the bucketed view (#49 tracks
that follow-up).

Smoke tested on two targets:
- luminos_lib: identical-quality survey ("Python library package",
  confidence 0.85), unchanged behavior on code targets.
- A synthetic Maildir of 8 messages with `:2,S` flag suffixes:
  survey now correctly identifies it as "A Maildir-format mailbox
  containing 8 email messages" with confidence 0.90, names the
  Maildir naming convention in domain_notes, and correctly marks
  parse_structure as a skip tool. Before #42 this would have been
  "8 source files."

Adds 8 unit tests for survey_signals covering empty input, extension
histogram, description aggregation/truncation, top-N cap, and
even-stride filename sampling.

#48 tracks the unit-of-analysis limitation (file is the wrong unit
for mbox, SQLite, archives, notebooks) — explicitly out of scope
for #42 and documented in survey_signals' docstring.
2026-04-06 22:36:14 -06:00
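A rough sketch of how the raw signals in f3abbce7d4 might be assembled follows. The four output keys and the 80-character truncation come from the commit message. The function name `survey_signals_sketch`, the shape of `classified` (a list of dicts with `name` and `description` keys), and the `"(none)"` label for extensionless files are assumptions.

```python
import os
from collections import Counter


def survey_signals_sketch(classified, top_n=20, max_desc_len=80):
    """Hypothetical re-creation; the input shape is an assumption."""
    ext_counts = Counter()
    desc_counts = Counter()
    names = []
    for entry in classified:
        name = entry["name"]
        names.append(name)
        ext = os.path.splitext(name)[1].lower() or "(none)"
        ext_counts[ext] += 1
        # Truncate before counting so long descriptions share one key.
        desc_counts[entry["description"][:max_desc_len]] += 1

    # Even-stride sampling: spread picks across the list, not first-N.
    if len(names) <= top_n:
        samples = list(names)
    else:
        samples = [names[i * len(names) // top_n] for i in range(top_n)]

    return {
        "total_files": len(names),
        "extension_histogram": dict(ext_counts.most_common(top_n)),
        "file_descriptions": dict(desc_counts.most_common(top_n)),
        "filename_samples": samples,
    }
```

Truncating before counting means two descriptions that differ only after the 80th character land in the same bucket.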
Jeff Smith
55da7fa8dc docs(plan): add Phase 4.5 (#48) and end-of-project #49
#48 captures the unit-of-analysis problem: "file" is the wrong unit
for containers (mbox, SQLite, zip, notebooks) and dense directories
(Maildir, .git, node_modules). Sequenced after Phase 4 as its own
phase since it requires format detection and container handlers.

#49 captures the smaller follow-up that the terminal report still
shows the biased bucketed view. Deferred to end-of-project tuning.
2026-04-06 22:31:41 -06:00
Jeff Smith
6cda1cc521 docs(plan): defer #46 to end-of-project tuning section 2026-04-06 22:20:54 -06:00
Jeff Smith
896dac686d merge: feat/issue-7-survey-min-size (#7) 2026-04-06 22:19:35 -06:00
Jeff Smith
8fb2f90678 feat(ai): skip survey pass for tiny targets (#7)
Adds a gate in _run_investigation that skips the survey API call when
a target has both fewer than _SURVEY_MIN_FILES (5) files AND fewer
than _SURVEY_MIN_DIRS (2) directories. AND semantics handle the
deep-narrow edge case correctly: a target with 4 files spread across
50 directories still gets a survey because dir count amortizes the
cost across 50 dir loops.

When skipped, _default_survey() supplies a synthetic dict with
confidence=0.0 — chosen specifically so _filter_dir_tools() never
enforces skip_tools from a synthetic value. The dir loop receives
a generic "small target, read everything" framing in its prompt and
keeps its full toolbox.

Reorders _discover_directories() to run before the survey gate so
total_dirs is available without a second walk.

#46 tracks revisiting the threshold values with empirical data after
Phase 2 ships and we've run --ai on a variety of real targets.

Smoke tested on a 2-file target: gate triggers, default survey
substituted, dir loop completes normally. Adds 4 unit tests for
_default_survey() covering schema, confidence guard, filter
interaction, and empty skip_tools.
2026-04-06 22:19:25 -06:00
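The AND semantics of the gate in 8fb2f90678 can be sketched as below. The thresholds and the `confidence=0.0` guard come from the commit message. The helper names `should_skip_survey` and `default_survey_sketch` are hypothetical, and the placeholder values in the synthetic dict are assumptions (the field names are the ones listed for the survey output in #5).

```python
_SURVEY_MIN_FILES = 5
_SURVEY_MIN_DIRS = 2


def should_skip_survey(total_files, total_dirs):
    # Hypothetical helper: skip only when the target is small on BOTH axes.
    return total_files < _SURVEY_MIN_FILES and total_dirs < _SURVEY_MIN_DIRS


def default_survey_sketch():
    # Synthetic survey; confidence 0.0 keeps skip_tools from being enforced.
    return {
        "description": "",
        "approach": "",
        "relevant_tools": [],
        "skip_tools": [],
        "domain_notes": "",
        "confidence": 0.0,
    }
```

With these definitions, a 2-file, 1-directory target skips the survey, while 4 files spread across 50 directories does not.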
Jeff Smith
b2d00dd301 merge: feat/issue-6-wire-survey (#6) 2026-04-06 22:07:22 -06:00
Jeff Smith
2e3d21f774 feat(ai): wire survey output into dir loop (#6)
The survey pass now actually steers dir loop behavior, in two ways:

1. Prompt injection: a new {survey_context} placeholder in
   _DIR_SYSTEM_PROMPT receives the survey description, approach,
   domain_notes, relevant_tools, and skip_tools so the dir-loop agent
   has investigation context before its first turn.

2. Tool schema filtering: _filter_dir_tools() removes any tool listed
   in skip_tools from the schema passed to the API, gated on
   survey confidence >= 0.5. Control-flow tools (submit_report) are
   always preserved. This is hard enforcement — the agent literally
   cannot call a filtered tool, which the smoke test for #5 showed
   was necessary (prompt-only guidance was ignored).

Smoke test on luminos_lib: zero run_command invocations (vs 2 before),
context budget no longer exhausted (87k vs 133k), cost ~$0.34 (vs
$0.46), investigation completes instead of early-exiting.

Adds tests/test_ai_filter.py with 14 tests covering _filter_dir_tools
and _format_survey_block — both pure helpers, no live API needed.
2026-04-06 22:07:12 -06:00
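The tool-schema filtering in 2e3d21f774 could look roughly like the sketch below. The 0.5 confidence gate and the rule that `submit_report` is always preserved come from the commit message. The function name, the constant names, and the assumption that each tool is a dict with a `name` key are illustrative only.

```python
_CONTROL_FLOW_TOOLS = {"submit_report"}
_MIN_SURVEY_CONFIDENCE = 0.5


def filter_dir_tools_sketch(tools, survey):
    """Drop skip_tools from the schema when the survey is trusted.

    Assumes each tool is a dict with a 'name' key (not confirmed by
    the source).
    """
    if not survey or survey.get("confidence", 0.0) < _MIN_SURVEY_CONFIDENCE:
        return list(tools)  # low confidence: keep the full toolbox
    skip = set(survey.get("skip_tools", [])) - _CONTROL_FLOW_TOOLS
    return [tool for tool in tools if tool["name"] not in skip]
```

Because the filtered tools never reach the API call, the agent cannot invoke them, which is the hard enforcement the commit contrasts with prompt-only guidance.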
Jeff Smith
e942ecc34a docs(plan): add Phase 2.5 context budget reliability (#44)
#5 smoke test showed the dir loop exhausts the 126k context budget on
a 13-file Python lib. Sequencing #44 between Phase 2 and Phase 3 so
the foundation is solid before planning + external tools add more
prompt and tool weight.
2026-04-06 21:59:01 -06:00
Jeff Smith
ffd9d9e929 merge: feat/issue-5-run-survey (#5) 2026-04-06 21:50:08 -06:00
Jeff Smith
fecb24d6e1 feat(ai): add _run_survey() and submit_survey tool (#5)
Adds the reconnaissance survey pass: a fast, ≤3-turn LLM call that
characterizes the target before any directory investigation begins.
The survey receives the file-type distribution (from the base scan),
a top-2-level tree preview, and the list of available dir-loop tools,
and returns description / approach / relevant_tools / skip_tools /
domain_notes / confidence via a single submit_survey tool call.

Wired into _run_investigation() before the directory loop. Output is
logged but not yet consumed — that wiring is #6. Survey failure is
non-fatal: if the call errors or runs out of turns, the investigation
proceeds without survey context.

Also adds a Band-Aid to _SURVEY_SYSTEM_PROMPT warning the LLM that
the file-type histogram is biased toward source code (the underlying
classifier has no concept of mail, notebooks, ledgers, etc.) and to
trust the tree preview when they conflict. The proper fix is #42.
2026-04-06 21:49:59 -06:00
Jeff Smith
05fcaac755 docs(plan): note classifier rebuild (#42) in Phase 2
The filetype classifier is biased toward source code and would mislead
the survey pass on non-code targets (mail, notebooks, ledgers). #5
ships with a prompt-level Band-Aid; #42 captures the real fix and is
sequenced after the survey pass is observable end-to-end and before
Phase 3 depends on survey output.
2026-04-06 21:47:49 -06:00
Jeff Smith
2afef76a67 merge: feat/issue-4-survey-prompt (#4) 2026-04-06 21:35:29 -06:00
Jeff Smith
987f41ec2e feat(prompts): add _SURVEY_SYSTEM_PROMPT for survey pass (#4)
Adds the system prompt for the survey reconnaissance pass. The survey
agent answers three questions (what is this, what approach, which tools
matter) from cheap signals — file type distribution and a top-2-level
tree — without reading files. Tool triage is tri-state: relevant, skip,
or unlisted (default), so skip is reserved for tools whose use would be
actively wrong rather than merely unnecessary.

Wiring of _run_survey() and the submit_survey tool follows in #5.
2026-04-06 21:35:17 -06:00
Jeff Smith
0a9afc96c9 chore: update CLAUDE.md for session 3 2026-04-06 21:15:27 -06:00
Jeff Smith
09e5686bea merge: feat/issue-3-low-confidence-entries (#3) 2026-04-06 21:13:58 -06:00
Jeff Smith
1d681c8bc1 feat(cache): add low_confidence_entries() query to CacheManager (#3)
Returns all file and dir cache entries with confidence below a given
threshold (default 0.7). Entries missing a confidence field are
included as unrated/untrusted. Results sorted ascending by confidence
so least-confident entries come first.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 21:13:58 -06:00
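The query semantics in 1d681c8bc1 can be sketched as a pure function over a list of entries. The default threshold of 0.7, the inclusion of entries without a confidence field, and the ascending sort come from the commit message. The function name, the flat list-of-dicts input, and the choice to sort unrated entries first are assumptions; the real method lives on `CacheManager`.

```python
def low_confidence_entries_sketch(entries, threshold=0.7):
    """Return entries below the threshold, least confident first."""

    def sort_key(entry):
        value = entry.get("confidence")
        # Assumption: unrated entries sort ahead of every rated entry.
        return -1.0 if value is None else value

    selected = [
        entry for entry in entries
        if entry.get("confidence") is None or entry["confidence"] < threshold
    ]
    return sorted(selected, key=sort_key)
```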
Jeff Smith
a67e4789b2 merge: feat/issue-2-confidence-prompt (#2) 2026-04-06 20:46:08 -06:00
Jeff Smith
80f8f883c1 feat(prompts): instruct agent to set confidence on cache writes (#2)
Add confidence and confidence_reason to both cache schemas in the dir
loop prompt. Add a Confidence section with categorical guidance
(high ≥ 0.8, medium 0.5–0.8, low < 0.5) and the rule to include
confidence_reason when confidence is below 0.7.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 20:46:05 -06:00
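The categorical guidance in 80f8f883c1 is given to the agent as prompt text, not code, but the same thresholds can be written down as a small sketch. Both function names are hypothetical.

```python
def confidence_band(confidence):
    # Bands from the prompt guidance: high >= 0.8, medium 0.5 to 0.8, low < 0.5.
    if confidence >= 0.8:
        return "high"
    if confidence >= 0.5:
        return "medium"
    return "low"


def needs_confidence_reason(confidence):
    # confidence_reason is expected whenever confidence is below 0.7.
    return confidence < 0.7
```

Note that a medium-confidence entry may or may not need a reason: 0.6 does, 0.75 does not.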
Jeff Smith
4338587360 merge: feat/issue-37-unit-tests (#37) 2026-04-06 16:57:30 -06:00
Jeff Smith
6875cf5ed1 feat(tests): add unit test coverage for all testable modules (#37)
129 tests across cache, filetypes, code, disk, recency, tree, report,
and capabilities. Uses stdlib unittest only — no new dependencies.
Also updates CLAUDE.md development workflow to require test coverage
for all future changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 16:57:26 -06:00
Jeff Smith
0d9812490a merge: feat/issue-1-confidence-fields (#1) 2026-04-06 16:51:58 -06:00
Jeff Smith
b158809c19 feat(cache): add confidence fields to file and dir cache schemas (#1)
Add optional confidence (float 0.0–1.0) and confidence_reason (str) fields
to both file and dir cache entries. Validation rejects out-of-range values
and wrong types. Fields are not yet required — pure schema instrumentation
for Phase 1.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-06 16:50:48 -06:00
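A minimal sketch of the validation described in b158809c19 follows. The field names, the 0.0 to 1.0 range, and the fact that both fields are optional come from the commit message. The function name, the error strings, and the decision to accept plain ints such as 0 or 1 are assumptions.

```python
def validate_confidence_fields(entry):
    """Return a list of error strings; an empty list means valid."""
    errors = []
    if "confidence" in entry:
        value = entry["confidence"]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append("confidence must be a number")
        elif not 0.0 <= value <= 1.0:
            errors.append("confidence must be between 0.0 and 1.0")
    if "confidence_reason" in entry:
        if not isinstance(entry["confidence_reason"], str):
            errors.append("confidence_reason must be a string")
    return errors
```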
Jeff Smith
c7bacb674f chore: update CLAUDE.md for session 2 2026-04-06 16:43:25 -06:00
Jeff Smith
309e3464bc chore: add dev workflow, branching discipline, ADHD protocols, session docs 2026-04-06 16:20:51 -06:00
Jeff Smith
a2b015a965 chore: ignore docs/wiki/ — separate git repo 2026-04-06 16:13:31 -06:00
Jeff Smith
77d4be0221 chore: rewrite CLAUDE.md — thin format, wiki references, dev practices 2026-04-06 16:13:20 -06:00
Jeff Smith
d323190866 merge: add -x/--exclude flag for directory exclusion 2026-04-06 14:32:17 -06:00
Jeff Smith
78f9a396dd feat: add -x/--exclude flag to exclude directories from scan and AI analysis 2026-04-06 14:32:12 -06:00
Jeff Smith
78f80c31ed merge: in-place per-file progress for scan steps 2026-04-06 14:26:40 -06:00
Jeff Smith
206d2d34f6 feat: in-place per-file progress for classify, count, and large-file steps 2026-04-06 14:26:37 -06:00
Jeff Smith
bbaf387cb7 merge: add progress output to base scan steps 2026-04-06 14:21:19 -06:00
Jeff Smith
ebc6b852f1 feat: add progress output to base scan steps 2026-04-06 14:21:17 -06:00
Jeff Smith
33df555a8c merge: extract system prompts module 2026-03-30 14:44:57 -06:00
Jeff Smith
ea8c07a692 refactor: extract system prompts into luminos_lib/prompts.py
Moves _DIR_SYSTEM_PROMPT and _SYNTHESIS_SYSTEM_PROMPT from ai.py into
a dedicated prompts module. Both are pure template strings with .format()
placeholders — no runtime imports needed in prompts.py. Prompt content
is byte-for-byte identical to the original.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 14:44:45 -06:00
Jeff Smith
5c6124a715 merge: extract AST parser module 2026-03-30 14:34:06 -06:00
Jeff Smith
0c49da23ab refactor: extract AST parsing into luminos_lib/ast_parser.py
Moves all tree-sitter parsing logic from ai.py into a dedicated module.
Replaces the if/elif language chain with a _LANGUAGE_HANDLERS registry
mapping language names to handler functions.

Extracted: _tool_parse_structure body, _get_ts_parser, _child_by_type,
_text, and all per-language helpers (_py_func_sig, _py_class, etc.).
ai.py retains a thin wrapper for path validation.

Public API: parse_structure(path) -> JSON string

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 14:34:02 -06:00
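The registry pattern that 0c49da23ab substitutes for the if/elif chain can be sketched without tree-sitter. Only the `_LANGUAGE_HANDLERS` name and the JSON-string return type come from the commit message. The handler below is a line-based stand-in, and `parse_structure_sketch(language, source)` deliberately differs from the real `parse_structure(path)` signature so that the example stays self-contained.

```python
import json


def _python_outline(source):
    # Stand-in handler: the real handlers walk a tree-sitter parse tree.
    return [
        line.strip()
        for line in source.splitlines()
        if line.lstrip().startswith(("def ", "class "))
    ]


# Registry mapping language names to handler functions.
_LANGUAGE_HANDLERS = {
    "python": _python_outline,
}


def parse_structure_sketch(language, source):
    handler = _LANGUAGE_HANDLERS.get(language)
    if handler is None:
        return json.dumps({"error": f"unsupported language: {language}"})
    return json.dumps({"language": language, "definitions": handler(source)})
```

Adding a language then means registering one more entry in the dict, with no change to the dispatch function.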
Jeff Smith
8aa6c713db merge: post-cache-extraction cleanup 2026-03-30 13:52:43 -06:00