Stale cache entries survive --fresh when path format changed #79
Problem
Cache entries from older runs can coexist with entries from new runs for the same directory. The synthesis pass reads all entries and sees duplicates.
Root cause
Cache entries are stored as `dirs/{sha256(path)}.json`. The SHA256 is computed from the `path` argument passed to `write_entry()`. In older versions of luminos, some entries were written with relative paths (e.g. `docs/wiki`). Current code writes absolute paths (e.g. `/home/micro/luminos/docs/wiki`). These hash to different filenames, so both survive in the same cache directory.
`--fresh` creates a new investigation ID, but the investigation ID for a given target is stored in `/tmp/luminos/investigations.json`, keyed by absolute target path. If an old investigation used a different path format, or the mapping was reused, stale entries can leak through.
Observed
Running `--fresh` against `/home/micro/luminos` produced an investigation with 7 dir cache entries for 5 directories.
The synthesis pass called `cache.read_all_entries("dir")` and received all 7, producing a report that potentially double-counted some directories.
Fix options
1. Normalize paths before hashing. Always `os.path.realpath()` the path before computing the SHA256. This ensures the same directory always produces the same cache key regardless of how it was referenced.
2. `--fresh` should start from an empty cache directory. If the intent is a clean investigation, delete or ignore the old cache tree entirely rather than appending into it.
3. Deduplicate on read. `read_all_entries()` could deduplicate by the `relative_path` field, preferring the newest `cached_at` timestamp. This is a safety net, not a fix for the root cause.
Option 1 is the cleanest. Option 2 is the most robust. Both could be done together.
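A minimal sketch of the first and third fix options, for illustration only. It is not taken from the luminos source: the helper names `cache_key` and `dedupe_entries` and the `base_dir` parameter are hypothetical, and it assumes each entry is a dict whose `cached_at` values sort chronologically (for example, ISO 8601 strings in one consistent format). Relative paths are resolved against the investigation target rather than the current working directory, since `os.path.realpath()` alone would resolve them against wherever the process happens to be running.

```python
import hashlib
import os


def cache_key(path: str, base_dir: str) -> str:
    """Return the SHA256 hex digest of the canonical absolute form of path.

    base_dir is a hypothetical parameter: the investigation target root
    that relative paths are resolved against.
    """
    if not os.path.isabs(path):
        path = os.path.join(base_dir, path)
    # realpath collapses "." and "..", strips trailing slashes,
    # and resolves symlinks, so equivalent spellings hash identically.
    canonical = os.path.realpath(path)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def dedupe_entries(entries: list[dict]) -> list[dict]:
    """Keep one entry per relative_path, preferring the newest cached_at.

    Assumes cached_at values are directly comparable and sort
    chronologically; adjust the comparison if the real format differs.
    """
    newest: dict[str, dict] = {}
    for entry in entries:
        key = entry["relative_path"]
        current = newest.get(key)
        if current is None or entry["cached_at"] > current["cached_at"]:
            newest[key] = entry
    return list(newest.values())
```

With this sketch, `cache_key("docs/wiki", "/home/micro/luminos")` and `cache_key("/home/micro/luminos/docs/wiki", "/home/micro/luminos")` produce the same digest, so a relative and an absolute reference to the same directory would map to one cache file. Entries already written under the old relative-path hashes would still remain on disk, which is why the deduplication step or a cleared cache directory is still useful alongside normalization.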