test(ai): cover _TokenTracker, _synthesize_from_cache, _discover_directories (#70)
Second wave of pre-Phase-3 test coverage. The #55 round picked off the easy decision-logic helpers; this round covers the three highest-impact helpers that escaped the first sweep. Three new test classes appended to tests/test_ai_pure.py: - TestTokenTracker (11 tests) Pins the load-bearing #44 fix: budget_exceeded() must use last_input (the most recent call's context size) NOT cumulative input, because each turn's input_tokens already includes the full message history. Tests assert: cumulative-input far above budget does NOT trip the gate when last_input stays small; reset_loop() preserves grand totals; the boundary is strict > not >=. - TestSynthesizeFromCache (5 tests) The synthesis fallback fires only when _run_synthesis exhausts its max_turns, which almost never happens in normal runs — exactly the kind of code that silently rots. Tests assert: empty cache returns the incomplete-message brief and empty detailed; single dir entry produces a markdown line; multi-entry detailed contains all entries; empty-summary entries are skipped; file entries alone do not satisfy (the function reads dir entries only). - TestDiscoverDirectories (9 tests) The leaves-first walk drives the entire dir-loop iteration order and is the foundation of the cache reuse story. Tests assert: empty target returns target only; nested trees come back leaves- first; .git / __pycache__ / node_modules / *.egg-info excluded; custom --exclude honored; hidden dirs excluded by default; show_ hidden=True includes them but does not override the skip list. PLAN.md: added Phase 2.7 (#56 ✅) and Phase 2.8 (#55 ✅, #70) entries to the implementation order, and removed the now-stale Phase 3.4 (#56) and Background chore (#55) sections that were displaced by the pre-Phase-3 cleanup pattern. Verification: 234 tests pass (209 prior + 25 new).
This commit is contained in:
parent
e9b40e00e0
commit
efaa2024d7
2 changed files with 334 additions and 32 deletions
69
PLAN.md
69
PLAN.md
|
|
@ -576,21 +576,45 @@ without 9 phases of rework.
|
|||
output size, redundant reads) before picking a fix.
|
||||
- Add token-usage instrumentation so regressions are visible.
|
||||
|
||||
### Phase 2.6 — Pre-Phase-3 cleanup (#54, #57)
|
||||
### Phase 2.6 — Pre-Phase-3 cleanup (#54, #57) ✅ shipped
|
||||
Two debts surfaced during the Session 5 documentation deep dive that
|
||||
should be paid before Phase 3 adds more state to the same code paths:
|
||||
were paid before Phase 3 adds more state to the same code paths:
|
||||
|
||||
- **#54 — Phase 1 confidence-write path is dormant.** The cache schema
|
||||
accepts `confidence` and `low_confidence_entries()` works, but no
|
||||
prompt instructs the agent to set the field. Phase 1 is half-shipped.
|
||||
Wire the prompt before Phase 8 (refinement) tries to consume the
|
||||
signal — preferably now while Phase 1 context is still fresh.
|
||||
- **#57 — Refactor `_run_dir_loop` before #10 lands.** The function is
|
||||
at its readability ceiling (~160 lines, four conceptual layers).
|
||||
Phase 3 #10 (dynamic turn allocation) will inject per-dir `max_turns`
|
||||
and possibly mid-loop renegotiation. Extract `_build_dir_loop_context`,
|
||||
`_check_budget_and_flush_partial`, and `_handle_turn_response` into
|
||||
focused helpers first; refactoring after #10 will be harder.
|
||||
- **#54** — Phase 1 confidence-write path was dormant. Cache schema
|
||||
accepted `confidence` and `low_confidence_entries()` worked, but no
|
||||
prompt instructed the agent to set the field. Wired in Session 8.
|
||||
- **#57** — `_run_dir_loop` was ~160 lines holding four conceptual
|
||||
layers. Refactored in Session 9 into three focused helpers
|
||||
(`_build_dir_loop_context`, `_flush_partial_dir_entry`,
|
||||
`_handle_turn_response`) so Phase 3 dynamic turn allocation has a
|
||||
thin coordinator to inject into.
|
||||
|
||||
### Phase 2.7 — Tool registration cleanup (#56) ✅ shipped
|
||||
Adding a tool used to require updating `_TOOL_DISPATCH` and `_DIR_TOOLS`
|
||||
in two separate places. Forgetting one half was silent. Replaced both
|
||||
with a single `register_tool()` call per (tool, scope) in Session 9.
|
||||
Phase 3.5 MCP backend will eventually replace this with dynamic
|
||||
discovery, at which point `register_tool()` collapses to a one-line
|
||||
forward.
|
||||
|
||||
### Phase 2.8 — Pre-Phase-3 test coverage (#55, #70)
|
||||
Safety nets for the helpers Phase 3 will pile state on top of. Two
|
||||
waves:
|
||||
|
||||
- **#55** ✅ shipped — `tests/test_ai_pure.py` covers the easy
|
||||
decision-logic helpers: `_filter_dir_tools`, `_format_survey_block`,
|
||||
`_format_survey_signals`, `_default_survey`, `_should_skip_dir`,
|
||||
`_path_is_safe`, `_block_to_dict`, plus `_flush_partial_dir_entry`
|
||||
from #57. 45 tests added in Session 9.
|
||||
- **#70** — second wave covering the highest-impact remaining helpers
|
||||
that escaped the first sweep:
|
||||
- `_TokenTracker` — pins the load-bearing #44 fix
|
||||
(`last_input` vs cumulative for budget decisions)
|
||||
- `_synthesize_from_cache` — last-resort fallback that fires almost
|
||||
never in normal runs and is therefore the kind of code that silently
|
||||
rots
|
||||
- `_discover_directories` — leaves-first walk and skip-dir filter,
|
||||
foundation of the cache reuse story
|
||||
|
||||
### Phase 3 — Investigation planning
|
||||
- Planning pass after survey, before dir loops
|
||||
|
|
@ -598,17 +622,6 @@ should be paid before Phase 3 adds more state to the same code paths:
|
|||
- Dynamic turn allocation based on plan
|
||||
- Dir loop orchestrator updated to follow plan
|
||||
|
||||
### Phase 3.4 — Tool registration cleanup (#56)
|
||||
Adding a tool currently requires updating `_TOOL_DISPATCH` and
|
||||
`_DIR_TOOLS` in two separate places in `ai.py`. Forgetting one half is
|
||||
silent. A small `@dir_tool` decorator collapses this into one
|
||||
declaration. **Decision point:** Phase 3.5 (MCP) will replace
|
||||
`_TOOL_DISPATCH` with dynamic discovery from an MCP server, which makes
|
||||
this issue partially moot. Either fix now (easier to migrate one
|
||||
well-structured registry to MCP than two parallel structures) or defer
|
||||
and let it die with the Python dispatch table. Decide before starting
|
||||
3.5.
|
||||
|
||||
### Phase 3.5 — MCP backend abstraction (pivot point)
|
||||
See Part 10 for full design. This phase happens *after* Phase 3 is
|
||||
working and *before* Phase 4. The goal is to migrate the filesystem
|
||||
|
|
@ -671,14 +684,6 @@ architecture. The migration pain is intentional and instructive.
|
|||
- Report formatter renders populated fields only
|
||||
- Domain-appropriate section headers
|
||||
|
||||
### Background chore — Unit tests for pure helpers in ai.py (#55)
|
||||
`ai.py` is documented as exempt from unit testing because the dir loop
|
||||
needs a live API. But several pure helpers (`_filter_dir_tools`,
|
||||
`_format_survey_block`, `_format_survey_signals`, `_default_survey`,
|
||||
`_should_skip_dir`, `_path_is_safe`, `_block_to_dict`) have no API
|
||||
dependency and can be tested directly. Low priority, not phase-blocking,
|
||||
fold into any session that touches these helpers.
|
||||
|
||||
### End-of-project tuning
|
||||
- **Honest terminal report file-type view (#49)** — the report still
|
||||
shows the bucketed `summarize_categories()` view, which collapses
|
||||
|
|
|
|||
|
|
@ -13,17 +13,21 @@ import unittest
|
|||
from types import SimpleNamespace
|
||||
|
||||
from luminos_lib.ai import (
|
||||
CONTEXT_BUDGET,
|
||||
_DIR_TOOLS,
|
||||
_PROTECTED_DIR_TOOLS,
|
||||
_SURVEY_CONFIDENCE_THRESHOLD,
|
||||
_TokenTracker,
|
||||
_block_to_dict,
|
||||
_default_survey,
|
||||
_discover_directories,
|
||||
_filter_dir_tools,
|
||||
_flush_partial_dir_entry,
|
||||
_format_survey_block,
|
||||
_format_survey_signals,
|
||||
_path_is_safe,
|
||||
_should_skip_dir,
|
||||
_synthesize_from_cache,
|
||||
)
|
||||
from luminos_lib.cache import _CacheManager
|
||||
|
||||
|
|
@ -420,5 +424,298 @@ class TestFlushPartialDirEntry(unittest.TestCase):
|
|||
self.assertIn("subdir/important.py", entry["notable_files"])
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# _TokenTracker (added by #70)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
def _usage(input_tokens=0, output_tokens=0):
|
||||
"""Build a fake Anthropic SDK usage object for the tracker tests."""
|
||||
return SimpleNamespace(
|
||||
input_tokens=input_tokens, output_tokens=output_tokens,
|
||||
)
|
||||
|
||||
|
||||
class TestTokenTracker(unittest.TestCase):
|
||||
def test_initial_state_is_zero(self):
|
||||
t = _TokenTracker()
|
||||
self.assertEqual(t.total_input, 0)
|
||||
self.assertEqual(t.total_output, 0)
|
||||
self.assertEqual(t.loop_input, 0)
|
||||
self.assertEqual(t.loop_output, 0)
|
||||
self.assertEqual(t.last_input, 0)
|
||||
self.assertEqual(t.loop_total, 0)
|
||||
self.assertFalse(t.budget_exceeded())
|
||||
|
||||
def test_record_updates_all_counters(self):
|
||||
t = _TokenTracker()
|
||||
t.record(_usage(input_tokens=1000, output_tokens=200))
|
||||
self.assertEqual(t.total_input, 1000)
|
||||
self.assertEqual(t.total_output, 200)
|
||||
self.assertEqual(t.loop_input, 1000)
|
||||
self.assertEqual(t.loop_output, 200)
|
||||
self.assertEqual(t.last_input, 1000)
|
||||
|
||||
def test_loop_total_property(self):
|
||||
t = _TokenTracker()
|
||||
t.record(_usage(input_tokens=300, output_tokens=70))
|
||||
self.assertEqual(t.loop_total, 370)
|
||||
|
||||
def test_record_with_missing_attrs_defaults_to_zero(self):
|
||||
t = _TokenTracker()
|
||||
t.record(SimpleNamespace())
|
||||
self.assertEqual(t.total_input, 0)
|
||||
self.assertEqual(t.total_output, 0)
|
||||
self.assertEqual(t.last_input, 0)
|
||||
|
||||
def test_multiple_records_accumulate(self):
|
||||
t = _TokenTracker()
|
||||
t.record(_usage(input_tokens=500, output_tokens=100))
|
||||
t.record(_usage(input_tokens=700, output_tokens=200))
|
||||
self.assertEqual(t.total_input, 1200)
|
||||
self.assertEqual(t.total_output, 300)
|
||||
# last_input is the most recent call, NOT the cumulative sum.
|
||||
self.assertEqual(t.last_input, 700)
|
||||
|
||||
def test_reset_loop_zeros_loop_counters_preserves_totals(self):
|
||||
t = _TokenTracker()
|
||||
t.record(_usage(input_tokens=500, output_tokens=100))
|
||||
t.record(_usage(input_tokens=300, output_tokens=50))
|
||||
t.reset_loop()
|
||||
# Loop counters cleared.
|
||||
self.assertEqual(t.loop_input, 0)
|
||||
self.assertEqual(t.loop_output, 0)
|
||||
self.assertEqual(t.last_input, 0)
|
||||
# Cumulative totals preserved.
|
||||
self.assertEqual(t.total_input, 800)
|
||||
self.assertEqual(t.total_output, 150)
|
||||
|
||||
def test_grand_totals_accumulate_across_loops(self):
|
||||
t = _TokenTracker()
|
||||
t.record(_usage(input_tokens=500, output_tokens=100))
|
||||
t.reset_loop()
|
||||
t.record(_usage(input_tokens=400, output_tokens=80))
|
||||
t.reset_loop()
|
||||
t.record(_usage(input_tokens=200, output_tokens=40))
|
||||
self.assertEqual(t.total_input, 1100)
|
||||
self.assertEqual(t.total_output, 220)
|
||||
|
||||
def test_budget_exceeded_uses_last_input_not_cumulative(self):
|
||||
# The load-bearing #44 fix: cumulative is meaningless because
|
||||
# each turn's input_tokens already includes prior history.
|
||||
t = _TokenTracker()
|
||||
# Record many small calls whose CUMULATIVE input would exceed
|
||||
# the budget but whose individual last_input stays small.
|
||||
for _ in range(10):
|
||||
t.record(_usage(input_tokens=CONTEXT_BUDGET // 5, output_tokens=0))
|
||||
self.assertGreater(t.total_input, CONTEXT_BUDGET)
|
||||
# last_input is just the most recent call — well under budget.
|
||||
self.assertEqual(t.last_input, CONTEXT_BUDGET // 5)
|
||||
self.assertFalse(t.budget_exceeded())
|
||||
|
||||
def test_budget_exceeded_strict_greater_than(self):
|
||||
# The gate is `>`, not `>=`. last_input == CONTEXT_BUDGET
|
||||
# should NOT trip the budget.
|
||||
t = _TokenTracker()
|
||||
t.record(_usage(input_tokens=CONTEXT_BUDGET, output_tokens=0))
|
||||
self.assertFalse(t.budget_exceeded())
|
||||
|
||||
def test_budget_exceeded_one_over_trips(self):
|
||||
t = _TokenTracker()
|
||||
t.record(_usage(input_tokens=CONTEXT_BUDGET + 1, output_tokens=0))
|
||||
self.assertTrue(t.budget_exceeded())
|
||||
|
||||
def test_summary_returns_nonempty_string(self):
|
||||
t = _TokenTracker()
|
||||
t.record(_usage(input_tokens=1000, output_tokens=500))
|
||||
out = t.summary()
|
||||
self.assertIsInstance(out, str)
|
||||
self.assertIn("1,000", out)
|
||||
self.assertIn("500", out)
|
||||
self.assertIn("$", out)
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# _synthesize_from_cache (added by #70)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestSynthesizeFromCache(unittest.TestCase):
|
||||
def setUp(self):
|
||||
self.target = tempfile.mkdtemp(prefix="luminos-test-target-")
|
||||
self.cache = _make_manager(self.target)
|
||||
|
||||
def tearDown(self):
|
||||
shutil.rmtree(self.target, ignore_errors=True)
|
||||
|
||||
def test_empty_cache_returns_incomplete_message(self):
|
||||
brief, detailed = _synthesize_from_cache(self.cache)
|
||||
self.assertIn("incomplete", brief)
|
||||
self.assertEqual(detailed, "")
|
||||
|
||||
def test_single_dir_entry(self):
|
||||
self.cache.write_entry("dir", "/x/auth", {
|
||||
"path": "/x/auth",
|
||||
"relative_path": "auth",
|
||||
"child_count": 3,
|
||||
"summary": "Authentication module.",
|
||||
"dominant_category": "code",
|
||||
"cached_at": "2026-04-11T00:00:00+00:00",
|
||||
})
|
||||
brief, detailed = _synthesize_from_cache(self.cache)
|
||||
self.assertEqual(brief, "Authentication module.")
|
||||
self.assertIn("**auth/**", detailed)
|
||||
self.assertIn("Authentication module.", detailed)
|
||||
|
||||
def test_multiple_dir_entries_brief_is_first(self):
|
||||
for rel, summary in [
|
||||
("auth", "Auth code."),
|
||||
("db", "Database layer."),
|
||||
("api", "HTTP API."),
|
||||
]:
|
||||
self.cache.write_entry("dir", f"/x/{rel}", {
|
||||
"path": f"/x/{rel}",
|
||||
"relative_path": rel,
|
||||
"child_count": 1,
|
||||
"summary": summary,
|
||||
"dominant_category": "code",
|
||||
"cached_at": "2026-04-11T00:00:00+00:00",
|
||||
})
|
||||
brief, detailed = _synthesize_from_cache(self.cache)
|
||||
# Brief is the first dir entry's summary.
|
||||
self.assertIn(brief, {"Auth code.", "Database layer.", "HTTP API."})
|
||||
# Detailed includes all three with markdown formatting.
|
||||
self.assertIn("Auth code.", detailed)
|
||||
self.assertIn("Database layer.", detailed)
|
||||
self.assertIn("HTTP API.", detailed)
|
||||
self.assertIn("**auth/**", detailed)
|
||||
self.assertIn("**db/**", detailed)
|
||||
self.assertIn("**api/**", detailed)
|
||||
|
||||
def test_dir_entries_with_empty_summary_are_skipped(self):
|
||||
self.cache.write_entry("dir", "/x/empty", {
|
||||
"path": "/x/empty",
|
||||
"relative_path": "empty",
|
||||
"child_count": 0,
|
||||
"summary": "",
|
||||
"dominant_category": "unknown",
|
||||
"cached_at": "2026-04-11T00:00:00+00:00",
|
||||
})
|
||||
self.cache.write_entry("dir", "/x/real", {
|
||||
"path": "/x/real",
|
||||
"relative_path": "real",
|
||||
"child_count": 1,
|
||||
"summary": "Real content.",
|
||||
"dominant_category": "code",
|
||||
"cached_at": "2026-04-11T00:00:00+00:00",
|
||||
})
|
||||
brief, detailed = _synthesize_from_cache(self.cache)
|
||||
self.assertEqual(brief, "Real content.")
|
||||
self.assertNotIn("**empty/**", detailed)
|
||||
self.assertIn("**real/**", detailed)
|
||||
|
||||
def test_file_entries_alone_do_not_satisfy(self):
|
||||
# _synthesize_from_cache reads dir entries only. File entries
|
||||
# alone should produce the "incomplete" fallback.
|
||||
self.cache.write_entry("file", "/x/foo.py", {
|
||||
"path": "/x/foo.py",
|
||||
"relative_path": "foo.py",
|
||||
"size_bytes": 10,
|
||||
"category": "code",
|
||||
"summary": "A file.",
|
||||
"cached_at": "2026-04-11T00:00:00+00:00",
|
||||
})
|
||||
brief, detailed = _synthesize_from_cache(self.cache)
|
||||
self.assertIn("incomplete", brief)
|
||||
self.assertEqual(detailed, "")
|
||||
|
||||
|
||||
# ---------------------------------------------------------------------------
|
||||
# _discover_directories (added by #70)
|
||||
# ---------------------------------------------------------------------------
|
||||
|
||||
class TestDiscoverDirectories(unittest.TestCase):
|
||||
def setUp(self):
|
||||
self.target = tempfile.mkdtemp(prefix="luminos-test-target-")
|
||||
|
||||
def tearDown(self):
|
||||
shutil.rmtree(self.target, ignore_errors=True)
|
||||
|
||||
def _mkdirs(self, *rels):
|
||||
for rel in rels:
|
||||
os.makedirs(os.path.join(self.target, rel), exist_ok=True)
|
||||
|
||||
def test_empty_target_returns_target_only(self):
|
||||
result = _discover_directories(self.target)
|
||||
self.assertEqual(len(result), 1)
|
||||
self.assertEqual(os.path.realpath(result[0]),
|
||||
os.path.realpath(self.target))
|
||||
|
||||
def test_single_subdir(self):
|
||||
self._mkdirs("sub")
|
||||
result = _discover_directories(self.target)
|
||||
self.assertEqual(len(result), 2)
|
||||
# Leaf-first: "sub" (deeper) comes before target.
|
||||
self.assertTrue(result[0].endswith("sub"))
|
||||
|
||||
def test_leaves_first_ordering(self):
|
||||
self._mkdirs("a/b/c", "a/d", "a/b/e")
|
||||
result = _discover_directories(self.target)
|
||||
# Compute depth (sep count) of each result.
|
||||
depths = [d.count(os.sep) for d in result]
|
||||
# Each successive entry should have depth <= the previous one.
|
||||
for i in range(len(depths) - 1):
|
||||
self.assertGreaterEqual(
|
||||
depths[i], depths[i + 1],
|
||||
f"Not leaves-first: {result}",
|
||||
)
|
||||
# Verify all expected dirs are present.
|
||||
rels = [os.path.relpath(d, self.target) for d in result]
|
||||
for expected in ["a/b/c", "a/b/e", "a/d", "a/b", "a", "."]:
|
||||
self.assertIn(expected, rels, f"missing {expected} from {rels}")
|
||||
|
||||
def test_skip_dirs_excluded(self):
|
||||
self._mkdirs(".git", "__pycache__", "node_modules", "src")
|
||||
result = _discover_directories(self.target)
|
||||
rels = [os.path.relpath(d, self.target) for d in result]
|
||||
self.assertIn("src", rels)
|
||||
self.assertNotIn(".git", rels)
|
||||
self.assertNotIn("__pycache__", rels)
|
||||
self.assertNotIn("node_modules", rels)
|
||||
|
||||
def test_egg_info_glob_excluded(self):
|
||||
self._mkdirs("luminos.egg-info", "src")
|
||||
result = _discover_directories(self.target)
|
||||
rels = [os.path.relpath(d, self.target) for d in result]
|
||||
self.assertNotIn("luminos.egg-info", rels)
|
||||
self.assertIn("src", rels)
|
||||
|
||||
def test_custom_exclude_honored(self):
|
||||
self._mkdirs("vendor", "src")
|
||||
result = _discover_directories(self.target, exclude=["vendor"])
|
||||
rels = [os.path.relpath(d, self.target) for d in result]
|
||||
self.assertNotIn("vendor", rels)
|
||||
self.assertIn("src", rels)
|
||||
|
||||
def test_hidden_dirs_excluded_by_default(self):
|
||||
self._mkdirs(".hidden_thing", "visible")
|
||||
result = _discover_directories(self.target)
|
||||
rels = [os.path.relpath(d, self.target) for d in result]
|
||||
self.assertNotIn(".hidden_thing", rels)
|
||||
self.assertIn("visible", rels)
|
||||
|
||||
def test_show_hidden_includes_dotdirs(self):
|
||||
self._mkdirs(".hidden_thing", "visible")
|
||||
result = _discover_directories(self.target, show_hidden=True)
|
||||
rels = [os.path.relpath(d, self.target) for d in result]
|
||||
self.assertIn(".hidden_thing", rels)
|
||||
self.assertIn("visible", rels)
|
||||
|
||||
def test_show_hidden_does_not_override_skip_list(self):
|
||||
# .git is in _SKIP_DIRS so even with show_hidden=True it stays out.
|
||||
self._mkdirs(".git")
|
||||
result = _discover_directories(self.target, show_hidden=True)
|
||||
rels = [os.path.relpath(d, self.target) for d in result]
|
||||
self.assertNotIn(".git", rels)
|
||||
|
||||
|
||||
if __name__ == "__main__":
|
||||
unittest.main()
|
||||
|
|
|
|||
Loading…
Reference in a new issue