Pre-Phase-3 test coverage: _TokenTracker, _synthesize_from_cache, _discover_directories #70

New issue

Closed

opened 2026-04-11 10:38:23 -06:00 by claude-code · 0 comments

claude-code commented

2026-04-11 10:38:23 -06:00

Collaborator

Three more pure-ish helpers in ai.py warrant unit coverage before Phase 3 lands. The #55 round picked off the easy decision-logic helpers; these three are the high-impact ones that escaped the first sweep.

Targets

1. `_TokenTracker` class (`ai.py:94`)

Per-loop token accumulator. Has the last_input vs cumulative-totals distinction that #44 fixed and that the entire context budget reliability story rests on. If this regresses, the budget check silently stops triggering (or triggers spuriously) and there is no way to catch it from outside.

Test cases:

record_usage() updates last_input, loop_total, grand_total correctly
reset_loop() zeroes loop_total but preserves grand_total and last_input
budget_exceeded() checks last_input against CONTEXT_BUDGET, not the cumulative sum (the load-bearing #44 fix)
Multiple loops accumulate grand_total across resets
Boundary case: last_input == CONTEXT_BUDGET → not exceeded (the gate is >, not >=)

2. `_synthesize_from_cache()` (synthesis fallback)

Pure function that builds a brief + detailed report from cached file/dir entries when _run_synthesis exhausts its max_turns. Last-resort path that fires almost never in normal runs and is therefore the kind of code that silently rots.

Test cases:

Empty cache → reasonable empty report (not a crash)
Only file entries → brief mentions file count, detailed cites at least one summary
Only dir entries → brief mentions dir count, detailed cites at least one summary
Mixed → both kinds appear in detailed
Long summaries → output is bounded (no infinite truncation bug)

3. `_discover_directories()` (leaves-first walk + skip filter)

Walks the filesystem, applies _should_skip_dir, sorts leaves-first. The output drives the entire dir-loop iteration order. Leaves-first is the foundation of the cache reuse story (a directory's children must be summarized before the directory itself).

Test cases:

Empty target → empty list
Single-level target → one entry
Nested tree (a/b/c, a/d) → leaves come before parents
.git/, __pycache__/, node_modules/, *.egg-info/ are excluded
Custom --exclude list is honored
Symlinks don't cause infinite recursion (smoke test, not exhaustive)

Test layout

Add three new test classes to tests/test_ai_pure.py. Reuses the existing _make_manager() tempdir-cache helper and tempfile.mkdtemp() pattern.

Acceptance criteria

~22 new test methods, all green
Total test count climbs from 209 to ~230
tests/test_ai_pure.py covers the three new helpers
PLAN.md "Implementation Order" reflects this as a pre-Phase-3 milestone
DevelopmentGuide.md test coverage table updated if helper names change

Notes

Discovered after #55 shipped: looking at what was still uncovered, three helpers stood out as "low effort, high impact" by the same criteria #55 used
_TokenTracker is the highest-priority of the three because the #44 fix is load-bearing for every Phase 3 feature that touches token accounting
Not blocking Phase 3 strictly, but the whole point of the pre-Phase-3 cleanup pattern (#54, #55, #56, #57) is to land safety nets BEFORE the new pile of state arrives

Three more pure-ish helpers in `ai.py` warrant unit coverage before Phase 3 lands. The #55 round picked off the easy decision-logic helpers; these three are the high-impact ones that escaped the first sweep. ## Targets ### 1. `_TokenTracker` class (`ai.py:94`) Per-loop token accumulator. Has the `last_input` vs cumulative-totals distinction that #44 fixed and that the entire context budget reliability story rests on. If this regresses, the budget check silently stops triggering (or triggers spuriously) and there is no way to catch it from outside. Test cases: - `record_usage()` updates `last_input`, `loop_total`, `grand_total` correctly - `reset_loop()` zeroes `loop_total` but preserves `grand_total` and `last_input` - `budget_exceeded()` checks `last_input` against `CONTEXT_BUDGET`, not the cumulative sum (the load-bearing #44 fix) - Multiple loops accumulate `grand_total` across resets - Boundary case: `last_input == CONTEXT_BUDGET` → not exceeded (the gate is `>`, not `>=`) ### 2. `_synthesize_from_cache()` (synthesis fallback) Pure function that builds a brief + detailed report from cached file/dir entries when `_run_synthesis` exhausts its max_turns. Last-resort path that fires almost never in normal runs and is therefore the kind of code that silently rots. Test cases: - Empty cache → reasonable empty report (not a crash) - Only file entries → brief mentions file count, detailed cites at least one summary - Only dir entries → brief mentions dir count, detailed cites at least one summary - Mixed → both kinds appear in detailed - Long summaries → output is bounded (no infinite truncation bug) ### 3. `_discover_directories()` (leaves-first walk + skip filter) Walks the filesystem, applies `_should_skip_dir`, sorts leaves-first. The output drives the entire dir-loop iteration order. Leaves-first is the foundation of the cache reuse story (a directory's children must be summarized before the directory itself). Test cases: - Empty target → empty list - Single-level target → one entry - Nested tree (`a/b/c`, `a/d`) → leaves come before parents - `.git/`, `__pycache__/`, `node_modules/`, `*.egg-info/` are excluded - Custom `--exclude` list is honored - Symlinks don't cause infinite recursion (smoke test, not exhaustive) ## Test layout Add three new test classes to `tests/test_ai_pure.py`. Reuses the existing `_make_manager()` tempdir-cache helper and `tempfile.mkdtemp()` pattern. ## Acceptance criteria - ~22 new test methods, all green - Total test count climbs from 209 to ~230 - `tests/test_ai_pure.py` covers the three new helpers - PLAN.md "Implementation Order" reflects this as a pre-Phase-3 milestone - DevelopmentGuide.md test coverage table updated if helper names change ## Notes - Discovered after #55 shipped: looking at what was still uncovered, three helpers stood out as "low effort, high impact" by the same criteria #55 used - `_TokenTracker` is the highest-priority of the three because the #44 fix is load-bearing for every Phase 3 feature that touches token accounting - Not blocking Phase 3 strictly, but the whole point of the pre-Phase-3 cleanup pattern (#54, #55, #56, #57) is to land safety nets BEFORE the new pile of state arrives