luminos

Author	SHA1	Message	Date
claude-code	bd5830304b	Merge pull request 'chore: update CLAUDE.md for session 9 wrap-up' (#73) from chore/session-9-wrapup into main	2026-04-11 17:04:01 +00:00
Jeff Smith	36ae95b21b	chore: update CLAUDE.md for session 9 wrap-up	2026-04-11 11:02:49 -06:00
claude-code	ef34a83f70	Merge pull request 'test(ai): cover _TokenTracker, _synthesize_from_cache, _discover_directories (#70)' (#71) from test/issue-70-tracker-synth-discover into main	2026-04-11 16:41:50 +00:00
Jeff Smith	efaa2024d7	test(ai): cover _TokenTracker, _synthesize_from_cache, _discover_directories (#70) Second wave of pre-Phase-3 test coverage. The #55 round picked off the easy decision-logic helpers; this round covers the three highest-impact helpers that escaped the first sweep. Three new test classes appended to tests/test_ai_pure.py: - TestTokenTracker (11 tests) Pins the load-bearing #44 fix: budget_exceeded() must use last_input (the most recent call's context size) NOT cumulative input, because each turn's input_tokens already includes the full message history. Tests assert: cumulative-input far above budget does NOT trip the gate when last_input stays small; reset_loop() preserves grand totals; the boundary is strict > not >=. - TestSynthesizeFromCache (5 tests) The synthesis fallback fires only when _run_synthesis exhausts its max_turns, which almost never happens in normal runs — exactly the kind of code that silently rots. Tests assert: empty cache returns the incomplete-message brief and empty detailed; single dir entry produces a markdown line; multi-entry detailed contains all entries; empty-summary entries are skipped; file entries alone do not satisfy (the function reads dir entries only). - TestDiscoverDirectories (9 tests) The leaves-first walk drives the entire dir-loop iteration order and is the foundation of the cache reuse story. Tests assert: empty target returns target only; nested trees come back leaves-first; .git / __pycache__ / node_modules / *.egg-info excluded; custom --exclude honored; hidden dirs excluded by default; show_hidden=True includes them but does not override the skip list. PLAN.md: added Phase 2.7 (#56 ✅) and Phase 2.8 (#55 ✅, #70) entries to the implementation order, and removed the now-stale Phase 3.4 (#56) and Background chore (#55) sections that were displaced by the pre-Phase-3 cleanup pattern. Verification: 234 tests pass (209 prior + 25 new).	2026-04-11 10:41:16 -06:00
claude-code	e9b40e00e0	Merge pull request 'test(ai): add unit coverage for pure helpers in ai.py (#55)' (#69) from test/issue-55-ai-pure-helpers into main	2026-04-11 16:25:15 +00:00
Jeff Smith	a6333858ee	test(ai): add unit coverage for pure helpers in ai.py (#55 ) ai.py was documented as fully exempt from unit testing because the dir loop and synthesis pass require a live Anthropic API. But several helpers in the module are pure functions with no API dependency, and they're the kind of thing that breaks silently. The #57 refactor added two more (_build_dir_loop_context, _flush_partial_dir_entry) that are also naturally testable. New tests/test_ai_pure.py — 45 tests across 8 helpers: - _should_skip_dir: exact-match, *.egg-info glob, no-match cases - _path_is_safe: inside, nested, equals, outside, traversal, sibling-with-target-prefix (the easy-to-miss security case) - _default_survey: shape, zero confidence guarantees no filtering, passes through _filter_dir_tools unchanged - _format_survey_block: None, empty, minimal, with relevant_tools, with skip_tools, with domain_notes, empty-list omission - _filter_dir_tools: None, empty, low confidence, high confidence filters, protected tools never removed, unknown skip silently ignored, garbage/None confidence treated as zero, threshold boundary inclusive - _format_survey_signals: None, empty, zero total_files, full, partial (only extensions) - _block_to_dict: text, tool_use, unknown type - _flush_partial_dir_entry (#57): idempotent when entry exists, no-file-entries stub path, with-file-entries summary synthesis, notable_files collection Uses the same _make_manager() pattern as test_cache.py to construct a _CacheManager rooted in a tempdir, sidestepping CACHE_ROOT entirely. Doc updates: - CLAUDE.md, README.md, docs/wiki/DevelopmentGuide.md: ai.py is no longer fully exempt — only the API-dependent loops are. Pure helpers are covered by test_ai_pure.py. Verification: 209 tests pass (164 prior + 45 new).	2026-04-11 10:24:47 -06:00
claude-code	f72dc7a0fd	Merge pull request 'refactor(ai): single-source tool registration via register_tool() (#56)' (#68) from refactor/issue-56-tool-registry into main	2026-04-11 16:19:06 +00:00
Jeff Smith	a1b17300e8	refactor(ai): single-source tool registration via register_tool() (#56 ) Adding a tool used to require updating two parallel structures in ai.py: a name->handler entry in _TOOL_DISPATCH and a schema dict in _DIR_TOOLS (or _SYNTHESIS_TOOLS or _SURVEY_TOOLS). Forgetting one half was silent. Internals.md §9.1 documented this as a 5-step process. Replaced both with a single register_tool() call per (tool, scope): register_tool( name="read_file", description="...", schema={...}, scopes=["dir"], handler=_tool_read_file, ) The function appends the schema to one or more scope lists (_DIR_TOOLS / _SYNTHESIS_TOOLS / _SURVEY_TOOLS) and lands the handler in _TOOL_DISPATCH. Tools intercepted by the loop body (submit_report, submit_survey) register schema only with handler=None. Tools whose schema differs by scope (submit_report has different shapes in dir vs synthesis loops) get one register_tool() call per scope. flag is also registered twice because it appears in dir + synthesis at different positions in each list — the order is preserved with two calls rather than reordered for fewer calls. Verification: - _DIR_TOOLS, _SYNTHESIS_TOOLS, _SURVEY_TOOLS contain the same names in the same order as before. - _TOOL_DISPATCH contains the same 10 handlers as before. - 164 tests pass. No behavior change. Phase 3.5 (#39) MCP backend will eventually replace this with dynamic discovery from the connected MCP server, at which point register_tool() collapses to a one-line forward.	2026-04-11 10:18:40 -06:00
claude-code	a5a19fba55	Merge pull request 'refactor(ai): extract _run_dir_loop into three focused helpers (#57)' (#67) from refactor/issue-57-dir-loop-helpers into main	2026-04-11 16:02:45 +00:00
Jeff Smith	427f66b488	refactor(ai): extract _run_dir_loop into three focused helpers (#57 ) _run_dir_loop was ~160 lines holding four conceptual layers in one function: pre-loop setup, budget check + partial-flush, API call + response printing, and tool dispatch + done detection. Phase 3 dynamic turn allocation will inject more state into the same code path, so this debt is paid before that lands. Three new helpers above _run_dir_loop: - _build_dir_loop_context(): pure setup. Builds the dir context, child summaries, survey block, filtered tool list, system prompt, and seed user message. Returns a _DirLoopContext namedtuple. - _flush_partial_dir_entry(): idempotent partial-cache writer for the budget-exceeded path. Returns the partial summary string. Idempotent via cache.has_entry() guard, so callers can call it without checking. - _handle_turn_response(): per-turn response processing. Prints text blocks and tool decisions, appends the assistant message, dispatches tools (or nudges the agent to call submit_report), appends tool_results. Returns (done, summary). _run_dir_loop is now a ~25-line coordinator: build context, then for-loop calls budget check, API, and turn handler in sequence. No behavior change. 164 tests pass. Internals.md §4 updated for the new structure and the file:line refs that drifted.	2026-04-11 10:02:21 -06:00
claude-code	68f327243c	Merge pull request 'chore: update CLAUDE.md for session 9' (#66) from chore/session-9-claude-md into main	2026-04-11 15:48:51 +00:00
Jeff Smith	171a48f9e6	chore: update CLAUDE.md for session 9	2026-04-11 09:48:35 -06:00
claude-code	5c5c4dbb1a	Merge pull request 'feat: AI investigation is the product, drop zero-dep constraint (#64)' (#65) from feat/issue-64-ai-first-scope into main	2026-04-11 15:46:46 +00:00
Jeff Smith	c93c748ea3	feat: AI investigation is the product, drop zero-dep constraint (#64 ) Two original design constraints are dropped: 1. Zero-dependency Python CLI is no longer a goal. Luminos installs from requirements.txt like a normal Python project. 2. AI investigation is the headline. The base scan becomes the agent's first input pass, not a standalone product. There is no --ai flag and no --no-ai mode. AI runs unconditionally on every invocation. Watch mode is deleted as part of the same change because a non-AI filesystem-churn monitor conflicts with the new philosophy. If a live update mode is wanted later, it gets rebuilt as incremental AI re-investigation. Code: - Delete luminos_lib/watch.py - Delete luminos_lib/capabilities.py and tests/test_capabilities.py - Move clear_cache() into luminos_lib/cache.py - luminos.py: remove --watch, --ai, --install-extras flags. AI runs unconditionally after the base scan. If ANTHROPIC_API_KEY is unset, exit 0 with a one-line hint before running the base scan. - ai.py: drop the check_ai_dependencies() call and import. - New requirements.txt: anthropic, tree-sitter + grammars, python-magic. - setup_env.sh installs from requirements.txt. Docs: - README.md rewritten to lead with AI investigation, drops the two-modes framing and the watch feature line. - CLAUDE.md (project): rewrites Key Constraints, updates module map and Running Luminos commands. - PLAN.md: strips zero-dep philosophy from the file map and reframes the watch+incremental note as a future live-mode feature. Tests: 164 pass (down from 168 with the 4 removed capabilities tests).	2026-04-11 09:43:47 -06:00
claude-code	54713f09a6	Merge pull request 'Add README and Apache 2.0 LICENSE' (#62) from feat/readme-and-apache-license into main	2026-04-09 23:44:52 +00:00
Jeff Smith	700698cba3	Add README and Apache 2.0 LICENSE Preparing luminos for a public GitHub mirror of the Forgejo source of truth. README covers what Luminos is, why, features, installation for base mode and AI mode, usage examples, how the AI investigation works, and a link back to the canonical Forgejo repo. LICENSE is the standard Apache 2.0 text.	2026-04-09 17:36:38 -06:00
Jeff Smith	f589bedf08	chore: update CLAUDE.md for session 8	2026-04-07 14:24:21 -06:00
Jeff Smith	4ef97c5626	merge: fix/issue-54-write-cache-tool-desc	2026-04-07 14:22:12 -06:00
Jeff Smith	c03f4f7c60	fix(ai): document confidence fields in write_cache tool schema (#54) The system prompt already instructs the agent to set confidence/confidence_reason on every write_cache call, but the tool's data schema description listed only the legacy fields. Add the confidence fields and a one-line calibration pointer so the model sees them when binding the tool, not just in the system prompt. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 14:21:57 -06:00
Jeff Smith	4a847d20aa	chore: update CLAUDE.md for session 7	2026-04-07 14:20:53 -06:00
Jeff Smith	fccbca0ce7	chore: update CLAUDE.md for session 6 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 13:48:44 -06:00
Jeff Smith	fc57e33d1f	merge: chore/extract-workflow-to-global	2026-04-07 13:47:41 -06:00
Jeff Smith	b2ead84531	chore: extract workflow sections to global ~/.claude/CLAUDE.md Move Development Workflow, Branching Discipline, Documentation Workflow, ADHD Session Protocols, and Session Protocols out of the project CLAUDE.md and into the global one so all projects share them. Move docs/externalize.md and docs/wrap-up.md to ~/.claude/protocols/ (lightly generalized). Project CLAUDE.md keeps only luminos-specific state, module map, constraints, naming, test command, and session log. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-07 13:47:41 -06:00
Jeff Smith	f63875b448	merge: chore/issue-followups-session5	2026-04-06 23:26:43 -06:00
Jeff Smith	a3b5f6397e	docs(plan): insert session 5 follow-ups #54, #55, #56, #57 into implementation order	2026-04-06 23:26:38 -06:00
Jeff Smith	159ab5207a	chore: update CLAUDE.md for session 5	2026-04-06 23:23:23 -06:00
Jeff Smith	8c0e29b6d8	merge: docs/issue-53-onboarding-internals (#53)	2026-04-06 23:21:44 -06:00
Jeff Smith	1892784d35	docs: add status snapshot to PLAN.md, fix domain.py file-map (#53)	2026-04-06 23:21:41 -06:00
Jeff Smith	74477d8c2a	chore(workflow): manually close issues after merge, do not rely on auto-close	2026-04-06 22:58:01 -06:00
Jeff Smith	88ecdb9761	chore: update CLAUDE.md for session 4	2026-04-06 22:52:24 -06:00
Jeff Smith	40af515fb2	merge: feat/issue-44-context-budget (#44)	2026-04-06 22:49:44 -06:00
Jeff Smith	036c3a934a	fix(ai): correct context budget metric — track per-call, not sum (#44 ) The dir loop was exiting early on small targets (a 13-file Python lib hit the budget at 92k–139k cumulative tokens) because _TokenTracker compared the SUM of input_tokens across all turns to the context window size. input_tokens from each API response is the size of the full prompt sent on that turn (system + every prior message + new tool results), so summing across turns multi-counts everything. The real per-call context size never approached the limit. Verified empirically: on luminos_lib pre-fix, the loop bailed when the most recent call's input_tokens was 20,535 (~10% of Sonnet's 200k window) but the cumulative sum was 134,983. Changes: - _TokenTracker now tracks last_input (the most recent call's input_tokens), separate from the cumulative loop_input/total_input used for cost reporting. - budget_exceeded() returns last_input > CONTEXT_BUDGET, not the cumulative sum. - MAX_CONTEXT bumped from 180_000 to 200_000 (Sonnet 4's real context window). CONTEXT_BUDGET stays at 70% = 140,000. - Early-exit message now shows context size, threshold, AND cumulative spend separately so future debugging is unambiguous. Smoke test on luminos_lib: investigation completes without early exit (~$0.37). 6 unit tests added covering the new semantics, including the key regression: a sequence of small calls whose sum exceeds the budget must NOT trip the check. Wiki Architecture page updated. #51 filed for the separate message-history-growth issue.	2026-04-06 22:49:25 -06:00
Jeff Smith	157ac3f606	merge: feat/issue-42-classifier-bias (#42)	2026-04-06 22:36:26 -06:00
Jeff Smith	f3abbce7d4	feat(filetypes): expose raw signals to survey, remove classifier bias (#42) The survey pass no longer receives the bucketed file_categories histogram, which was biased toward source-code targets and would mislabel mail, notebooks, ledgers, and other non-code domains as "source" via the file --brief "text" pattern fallback. Adds filetypes.survey_signals(), which assembles raw signals from the same `classified` data the bucketer already processes — no new walks, no new dependencies: total_files — total count extension_histogram — top 20 extensions, raw, no taxonomy file_descriptions — top 20 `file --brief` outputs, by count filename_samples — 20 names, evenly drawn (not first-20) `survey --brief` descriptions are truncated at 80 chars before counting so prefixes group correctly without exploding key cardinality. The Band-Aid in _SURVEY_SYSTEM_PROMPT (warning the LLM that the histogram was biased toward source code) is removed and replaced with neutral guidance on how to read the raw signals together. The {file_type_distribution} placeholder is renamed to {survey_signals} to reflect the broader content. luminos.py base scan computes survey_signals once and stores it on report["survey_signals"]; AI consumers read from there. summarize_categories() and report["file_categories"] are unchanged — the terminal report still uses the bucketed view (#49 tracks fixing that follow-up). Smoke tested on two targets: - luminos_lib: identical-quality survey ("Python library package", confidence 0.85), unchanged behavior on code targets. - A synthetic Maildir of 8 messages with `:2,S` flag suffixes: survey now correctly identifies it as "A Maildir-format mailbox containing 8 email messages" with confidence 0.90, names the Maildir naming convention in domain_notes, and correctly marks parse_structure as a skip tool. Before #42 this would have been "8 source files." Adds 8 unit tests for survey_signals covering empty input, extension histogram, description aggregation/truncation, top-N cap, and even-stride filename sampling. #48 tracks the unit-of-analysis limitation (file is the wrong unit for mbox, SQLite, archives, notebooks) — explicitly out of scope for #42 and documented in survey_signals' docstring.	2026-04-06 22:36:14 -06:00
Jeff Smith	55da7fa8dc	docs(plan): add Phase 4.5 (#48 ) and end-of-project #49 #48 captures the unit-of-analysis problem: "file" is the wrong unit for containers (mbox, SQLite, zip, notebooks) and dense directories (Maildir, .git, node_modules). Sequenced after Phase 4 as its own phase since it requires format detection and container handlers. #49 captures the smaller follow-up that the terminal report still shows the biased bucketed view. Deferred to end-of-project tuning.	2026-04-06 22:31:41 -06:00
Jeff Smith	6cda1cc521	docs(plan): defer #46 to end-of-project tuning section	2026-04-06 22:20:54 -06:00
Jeff Smith	896dac686d	merge: feat/issue-7-survey-min-size (#7)	2026-04-06 22:19:35 -06:00
Jeff Smith	8fb2f90678	feat(ai): skip survey pass for tiny targets (#7 ) Adds a gate in _run_investigation that skips the survey API call when a target has both fewer than _SURVEY_MIN_FILES (5) files AND fewer than _SURVEY_MIN_DIRS (2) directories. AND semantics handle the deep-narrow edge case correctly: a target with 4 files spread across 50 directories still gets a survey because dir count amortizes the cost across 50 dir loops. When skipped, _default_survey() supplies a synthetic dict with confidence=0.0 — chosen specifically so _filter_dir_tools() never enforces skip_tools from a synthetic value. The dir loop receives a generic "small target, read everything" framing in its prompt and keeps its full toolbox. Reorders _discover_directories() to run before the survey gate so total_dirs is available without a second walk. #46 tracks revisiting the threshold values with empirical data after Phase 2 ships and we've run --ai on a variety of real targets. Smoke tested on a 2-file target: gate triggers, default survey substituted, dir loop completes normally. Adds 4 unit tests for _default_survey() covering schema, confidence guard, filter interaction, and empty skip_tools.	2026-04-06 22:19:25 -06:00
Jeff Smith	b2d00dd301	merge: feat/issue-6-wire-survey (#6)	2026-04-06 22:07:22 -06:00
Jeff Smith	2e3d21f774	feat(ai): wire survey output into dir loop (#6 ) The survey pass now actually steers dir loop behavior, in two ways: 1. Prompt injection: a new {survey_context} placeholder in _DIR_SYSTEM_PROMPT receives the survey description, approach, domain_notes, relevant_tools, and skip_tools so the dir-loop agent has investigation context before its first turn. 2. Tool schema filtering: _filter_dir_tools() removes any tool listed in skip_tools from the schema passed to the API, gated on survey confidence >= 0.5. Control-flow tools (submit_report) are always preserved. This is hard enforcement — the agent literally cannot call a filtered tool, which the smoke test for #5 showed was necessary (prompt-only guidance was ignored). Smoke test on luminos_lib: zero run_command invocations (vs 2 before), context budget no longer exhausted (87k vs 133k), cost ~$0.34 (vs $0.46), investigation completes instead of early-exiting. Adds tests/test_ai_filter.py with 14 tests covering _filter_dir_tools and _format_survey_block — both pure helpers, no live API needed.	2026-04-06 22:07:12 -06:00
Jeff Smith	e942ecc34a	docs(plan): add Phase 2.5 context budget reliability (#44 ) #5 smoke test showed the dir loop exhausts the 126k context budget on a 13-file Python lib. Sequencing #44 between Phase 2 and Phase 3 so the foundation is solid before planning + external tools add more prompt and tool weight.	2026-04-06 21:59:01 -06:00
Jeff Smith	ffd9d9e929	merge: feat/issue-5-run-survey (#5)	2026-04-06 21:50:08 -06:00
Jeff Smith	fecb24d6e1	feat(ai): add _run_survey() and submit_survey tool (#5 ) Adds the reconnaissance survey pass: a fast, ≤3-turn LLM call that characterizes the target before any directory investigation begins. The survey receives the file-type distribution (from the base scan), a top-2-level tree preview, and the list of available dir-loop tools, and returns description / approach / relevant_tools / skip_tools / domain_notes / confidence via a single submit_survey tool call. Wired into _run_investigation() before the directory loop. Output is logged but not yet consumed — that wiring is #6. Survey failure is non-fatal: if the call errors or runs out of turns, the investigation proceeds without survey context. Also adds a Band-Aid to _SURVEY_SYSTEM_PROMPT warning the LLM that the file-type histogram is biased toward source code (the underlying classifier has no concept of mail, notebooks, ledgers, etc.) and to trust the tree preview when they conflict. The proper fix is #42.	2026-04-06 21:49:59 -06:00
Jeff Smith	05fcaac755	docs(plan): note classifier rebuild (#42 ) in Phase 2 The filetype classifier is biased toward source code and would mislead the survey pass on non-code targets (mail, notebooks, ledgers). #5 ships with a prompt-level Band-Aid; #42 captures the real fix and is sequenced after the survey pass is observable end-to-end and before Phase 3 depends on survey output.	2026-04-06 21:47:49 -06:00
Jeff Smith	2afef76a67	merge: feat/issue-4-survey-prompt (#4)	2026-04-06 21:35:29 -06:00
Jeff Smith	987f41ec2e	feat(prompts): add _SURVEY_SYSTEM_PROMPT for survey pass (#4 ) Adds the system prompt for the survey reconnaissance pass. The survey agent answers three questions (what is this, what approach, which tools matter) from cheap signals — file type distribution and a top-2-level tree — without reading files. Tool triage is tri-state: relevant, skip, or unlisted (default), so skip is reserved for tools whose use would be actively wrong rather than merely unnecessary. Wiring of _run_survey() and the submit_survey tool follows in #5.	2026-04-06 21:35:17 -06:00
Jeff Smith	0a9afc96c9	chore: update CLAUDE.md for session 3	2026-04-06 21:15:27 -06:00
Jeff Smith	09e5686bea	merge: feat/issue-3-low-confidence-entries (#3)	2026-04-06 21:13:58 -06:00
Jeff Smith	1d681c8bc1	feat(cache): add low_confidence_entries() query to CacheManager (#3 ) Returns all file and dir cache entries with confidence below a given threshold (default 0.7). Entries missing a confidence field are included as unrated/untrusted. Results sorted ascending by confidence so least-confident entries come first. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-06 21:13:58 -06:00
Jeff Smith	a67e4789b2	merge: feat/issue-2-confidence-prompt (#2)	2026-04-06 20:46:08 -06:00
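
Note on 036c3a934a (#44): the per-call versus cumulative distinction is easier to see in code. The following is a minimal sketch under stated assumptions, not the actual _TokenTracker in ai.py. The class name TokenTrackerSketch and the record() method are hypothetical; only last_input, loop_input, total_input, budget_exceeded(), reset_loop(), and the 200,000 / 70% figures come from the commit messages above. Whether the real reset_loop() also clears last_input is an assumption.

```python
MAX_CONTEXT = 200_000                     # value stated in the #44 commit
CONTEXT_BUDGET = MAX_CONTEXT * 70 // 100  # 70% of the window = 140,000


class TokenTrackerSketch:
    """Hypothetical stand-in for _TokenTracker (illustration only)."""

    def __init__(self):
        self.last_input = 0   # input_tokens of the most recent call
        self.loop_input = 0   # cumulative input for the current loop
        self.total_input = 0  # grand total, used for cost reporting

    def record(self, input_tokens):
        # Hypothetical method name. Each response's input_tokens already
        # covers the whole prompt (system + history + new tool results),
        # so it is the current context size on its own.
        self.last_input = input_tokens
        self.loop_input += input_tokens
        self.total_input += input_tokens

    def reset_loop(self):
        # Per-loop counters restart; the grand total is preserved.
        # Clearing last_input here is an assumption of this sketch.
        self.last_input = 0
        self.loop_input = 0

    def budget_exceeded(self):
        # Strict '>' on the latest call only, never on the running sum.
        return self.last_input > CONTEXT_BUDGET
```

With this shape, eight calls of 20,000 input tokens each sum to 160,000 (above the budget) without tripping the gate, which matches the regression described in the commit.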


91 commits
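
Note on 2e3d21f774 (#6) and a6333858ee (#55): the tool-filtering rules described in those commits can be sketched as below. This is an illustration under assumptions, not the code in ai.py. The function name filter_dir_tools_sketch, the constant names, and the {"name": ...} tool shape are hypothetical; the 0.5 threshold, the inclusive boundary, the protected submit_report tool, and the treatment of garbage or missing confidence as zero are taken from the commit messages.

```python
PROTECTED_TOOLS = {"submit_report"}  # control-flow tool named in #6
CONFIDENCE_THRESHOLD = 0.5           # gate stated in #6, inclusive per #55


def filter_dir_tools_sketch(tools, survey):
    """Return the tool schemas that remain after applying skip_tools.

    `tools` is assumed to be a list of dicts with a "name" key;
    `survey` is assumed to be a dict or None.
    """
    if not survey:
        return list(tools)  # None or empty survey: nothing to enforce

    try:
        confidence = float(survey.get("confidence") or 0.0)
    except (TypeError, ValueError):
        confidence = 0.0    # garbage confidence is treated as zero

    if confidence < CONFIDENCE_THRESHOLD:
        return list(tools)  # low confidence: keep the full toolbox

    # Protected tools are never removed; unknown names in skip_tools
    # simply match nothing and are ignored.
    skip = set(survey.get("skip_tools") or []) - PROTECTED_TOOLS
    return [tool for tool in tools if tool["name"] not in skip]
```

Filtering the schema list, rather than only mentioning skip_tools in the prompt, reflects the hard-enforcement point made in the #6 commit: a tool that is absent from the schema cannot be called at all.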