The system prompt already instructs the agent to set confidence/
confidence_reason on every write_cache call, but the tool's data
schema description listed only the legacy fields. Add the confidence
fields and a one-line calibration pointer so the model sees them
when binding the tool, not just in the system prompt.
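A hypothetical sketch of the updated tool schema shape (only confidence/confidence_reason come from this change; every other field name here is illustrative, not the tool's actual legacy schema):

```python
# Illustrative write_cache tool definition with the new fields added.
# "path"/"summary" stand in for the real legacy fields.
WRITE_CACHE_SCHEMA = {
    "name": "write_cache",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "File or dir being cached."},
            "summary": {"type": "string", "description": "Legacy summary field."},
            "confidence": {
                "type": "number",
                "description": "0.0-1.0; see calibration guidance in the system prompt.",
            },
            "confidence_reason": {
                "type": "string",
                "description": "Why confidence is low; include when confidence < 0.7.",
            },
        },
        "required": ["path", "summary"],
    },
}
```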
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Move Development Workflow, Branching Discipline, Documentation Workflow,
ADHD Session Protocols, and Session Protocols out of the project CLAUDE.md
and into the global one so all projects share them. Move docs/externalize.md
and docs/wrap-up.md to ~/.claude/protocols/ (lightly generalized). Project
CLAUDE.md keeps only luminos-specific state, module map, constraints,
naming, test command, and session log.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The dir loop was exiting early on small targets (a 13-file Python lib
hit the budget at 92k–139k cumulative tokens) because _TokenTracker
compared the SUM of input_tokens across all turns to the context
window size. input_tokens from each API response is the size of the
full prompt sent on that turn (system + every prior message + new
tool results), so summing across turns counts every earlier message
multiple times. The real per-call context size never approached the limit.
Verified empirically: on luminos_lib pre-fix, the loop bailed when
the most recent call's input_tokens was 20,535 (~10% of Sonnet's
200k window) but the cumulative sum was 134,983.
Changes:
- _TokenTracker now tracks last_input (the most recent call's
input_tokens), separate from the cumulative loop_input/total_input
used for cost reporting.
- budget_exceeded() returns last_input > CONTEXT_BUDGET, not the
cumulative sum.
- MAX_CONTEXT bumped from 180_000 to 200_000 (Sonnet 4's real
context window). CONTEXT_BUDGET stays at 70% = 140,000.
- Early-exit message now shows context size, threshold, AND
cumulative spend separately so future debugging is unambiguous.
Smoke test on luminos_lib: investigation completes without early
exit (~$0.37). 6 unit tests added covering the new semantics,
including the key regression: a sequence of small calls whose sum
exceeds the budget must NOT trip the check.
Wiki Architecture page updated.
#51 filed for the separate message-history-growth issue.
The survey pass no longer receives the bucketed file_categories
histogram, which was biased toward source-code targets and would
mislabel mail, notebooks, ledgers, and other non-code domains as
"source" via the file --brief "text" pattern fallback.
Adds filetypes.survey_signals(), which assembles raw signals from
the same `classified` data the bucketer already processes — no new
walks, no new dependencies:
- total_files — total count
- extension_histogram — top 20 extensions, raw, no taxonomy
- file_descriptions — top 20 `file --brief` outputs, by count
- filename_samples — 20 names, evenly drawn (not first-20)
`file --brief` descriptions are truncated at 80 chars before
counting so prefixes group correctly without exploding key cardinality.
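The four signals might be assembled along these lines (a sketch only; the real `classified` structure is assumed here to be (path, description) pairs from the base scan):

```python
from collections import Counter

def survey_signals(classified, top_n=20):
    """Assemble raw survey signals from already-classified files.

    Sketch under assumptions: `classified` is a list of
    (path, file_brief_description) pairs; no new filesystem walks.
    """
    ext_hist = Counter()
    desc_hist = Counter()
    names = []
    for path, description in classified:
        ext = path.rsplit(".", 1)[-1].lower() if "." in path else "(none)"
        ext_hist[ext] += 1
        # Truncate BEFORE counting so shared prefixes group together
        # without exploding key cardinality.
        desc_hist[description[:80]] += 1
        names.append(path.rsplit("/", 1)[-1])
    # Evenly drawn sample rather than first-20: fixed stride over the list.
    stride = max(1, len(names) // 20)
    samples = names[::stride][:20]
    return {
        "total_files": len(classified),
        "extension_histogram": dict(ext_hist.most_common(top_n)),
        "file_descriptions": dict(desc_hist.most_common(top_n)),
        "filename_samples": samples,
    }
```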
The Band-Aid in _SURVEY_SYSTEM_PROMPT (warning the LLM that the
histogram was biased toward source code) is removed and replaced
with neutral guidance on how to read the raw signals together.
The {file_type_distribution} placeholder is renamed to
{survey_signals} to reflect the broader content.
luminos.py base scan computes survey_signals once and stores it on
report["survey_signals"]; AI consumers read from there.
summarize_categories() and report["file_categories"] are unchanged
— the terminal report still uses the bucketed view (#49 tracks
fixing that follow-up).
Smoke tested on two targets:
- luminos_lib: identical-quality survey ("Python library package",
confidence 0.85), unchanged behavior on code targets.
- A synthetic Maildir of 8 messages with `:2,S` flag suffixes:
survey now correctly identifies it as "A Maildir-format mailbox
containing 8 email messages" with confidence 0.90, names the
Maildir naming convention in domain_notes, and correctly marks
parse_structure as a skip tool. Before #42 this would have been
"8 source files."
Adds 8 unit tests for survey_signals covering empty input, extension
histogram, description aggregation/truncation, top-N cap, and
even-stride filename sampling.
#48 tracks the unit-of-analysis limitation (file is the wrong unit
for mbox, SQLite, archives, notebooks) — explicitly out of scope
for #42 and documented in survey_signals' docstring.
#48 captures the unit-of-analysis problem: "file" is the wrong unit
for containers (mbox, SQLite, zip, notebooks) and dense directories
(Maildir, .git, node_modules). Sequenced after Phase 4 as its own
phase since it requires format detection and container handlers.
#49 captures the smaller follow-up that the terminal report still
shows the biased bucketed view. Deferred to end-of-project tuning.
Adds a gate in _run_investigation that skips the survey API call when
a target has both fewer than _SURVEY_MIN_FILES (5) files AND fewer
than _SURVEY_MIN_DIRS (2) directories. AND semantics handle the
deep-narrow edge case correctly: a target with 4 files spread across
50 directories still gets a survey because dir count amortizes the
cost across 50 dir loops.
When skipped, _default_survey() supplies a synthetic dict with
confidence=0.0 — chosen specifically so _filter_dir_tools() never
enforces skip_tools from a synthetic value. The dir loop receives
a generic "small target, read everything" framing in its prompt and
keeps its full toolbox.
Reorders _discover_directories() to run before the survey gate so
total_dirs is available without a second walk.
#46 tracks revisiting the threshold values with empirical data after
Phase 2 ships and we've run --ai on a variety of real targets.
Smoke tested on a 2-file target: gate triggers, default survey
substituted, dir loop completes normally. Adds 4 unit tests for
_default_survey() covering schema, confidence guard, filter
interaction, and empty skip_tools.
The survey pass now actually steers dir loop behavior, in two ways:
1. Prompt injection: a new {survey_context} placeholder in
_DIR_SYSTEM_PROMPT receives the survey description, approach,
domain_notes, relevant_tools, and skip_tools so the dir-loop agent
has investigation context before its first turn.
2. Tool schema filtering: _filter_dir_tools() removes any tool listed
in skip_tools from the schema passed to the API, gated on
survey confidence >= 0.5. Control-flow tools (submit_report) are
always preserved. This is hard enforcement — the agent literally
cannot call a filtered tool, which the smoke test for #5 showed
was necessary (prompt-only guidance was ignored).
Smoke test on luminos_lib: zero run_command invocations (vs 2 before),
context budget no longer exhausted (87k vs 133k), cost ~$0.34 (vs
$0.46), investigation completes instead of early-exiting.
Adds tests/test_ai_filter.py with 14 tests covering _filter_dir_tools
and _format_survey_block — both pure helpers, no live API needed.
#5 smoke test showed the dir loop exhausts the 126k context budget on
a 13-file Python lib. Sequencing #44 between Phase 2 and Phase 3 so
the foundation is solid before planning + external tools add more
prompt and tool weight.
Adds the reconnaissance survey pass: a fast, ≤3-turn LLM call that
characterizes the target before any directory investigation begins.
The survey receives the file-type distribution (from the base scan),
a top-2-level tree preview, and the list of available dir-loop tools,
and returns description / approach / relevant_tools / skip_tools /
domain_notes / confidence via a single submit_survey tool call.
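A hypothetical shape for the submit_survey tool — the six property names come from this commit; types and the required set are illustrative assumptions:

```python
SUBMIT_SURVEY_TOOL = {
    "name": "submit_survey",
    "description": "Submit the reconnaissance survey in a single call.",
    "input_schema": {
        "type": "object",
        "properties": {
            "description": {"type": "string"},
            "approach": {"type": "string"},
            "relevant_tools": {"type": "array", "items": {"type": "string"}},
            "skip_tools": {"type": "array", "items": {"type": "string"}},
            "domain_notes": {"type": "string"},
            "confidence": {"type": "number"},
        },
        "required": ["description", "approach", "confidence"],
    },
}
```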
Wired into _run_investigation() before the directory loop. Output is
logged but not yet consumed — that wiring is #6. Survey failure is
non-fatal: if the call errors or runs out of turns, the investigation
proceeds without survey context.
Also adds a Band-Aid to _SURVEY_SYSTEM_PROMPT warning the LLM that
the file-type histogram is biased toward source code (the underlying
classifier has no concept of mail, notebooks, ledgers, etc.) and to
trust the tree preview when they conflict. The proper fix is #42.
The filetype classifier is biased toward source code and would mislead
the survey pass on non-code targets (mail, notebooks, ledgers). #5
ships with a prompt-level Band-Aid; #42 captures the real fix and is
sequenced after the survey pass is observable end-to-end and before
Phase 3 depends on survey output.
Adds the system prompt for the survey reconnaissance pass. The survey
agent answers three questions (what is this, what approach, which tools
matter) from cheap signals — file type distribution and a top-2-level
tree — without reading files. Tool triage is tri-state: relevant, skip,
or unlisted (default), so skip is reserved for tools whose use would be
actively wrong rather than merely unnecessary.
Wiring of _run_survey() and the submit_survey tool follows in #5.
Returns all file and dir cache entries with confidence below a given
threshold (default 0.7). Entries missing a confidence field are
included as unrated/untrusted. Results are sorted ascending by
confidence so the least-confident entries come first.
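A sketch of those semantics, assuming the cache is a dict of path -> entry dict (the real cache structure may differ):

```python
def low_confidence_entries(cache, threshold=0.7):
    """Return (path, entry) pairs below `threshold`, least-confident first.

    Entries missing a confidence field are unrated: included, and
    treated as 0.0 here so they sort to the front.
    """
    hits = [
        (entry.get("confidence", 0.0), path, entry)
        for path, entry in cache.items()
        if entry.get("confidence") is None or entry["confidence"] < threshold
    ]
    hits.sort(key=lambda item: item[0])  # ascending by confidence
    return [(path, entry) for _, path, entry in hits]
```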
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add confidence and confidence_reason to both cache schemas in the dir
loop prompt. Add a Confidence section with categorical guidance
(high ≥ 0.8, medium 0.5–0.8, low < 0.5) and the rule to include
confidence_reason when confidence is below 0.7.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
129 tests across cache, filetypes, code, disk, recency, tree, report,
and capabilities. Uses stdlib unittest only — no new dependencies.
Also updates CLAUDE.md development workflow to require test coverage
for all future changes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add optional confidence (float 0.0–1.0) and confidence_reason (str) fields
to both file and dir cache entries. Validation rejects out-of-range values
and wrong types. Fields are not yet required — pure schema instrumentation
for Phase 1.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Moves _DIR_SYSTEM_PROMPT and _SYNTHESIS_SYSTEM_PROMPT from ai.py into
a dedicated prompts module. Both are pure template strings with .format()
placeholders — no runtime imports needed in prompts.py. Prompt content
is byte-for-byte identical to the original.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>