M3.2 Multi-axis stress test #45
Reference: archeious/marchwarden#45
Phase 3 — Stress Testing & Calibration, milestone 2.
Goal
A single complex query that exercises multiple contract features simultaneously, validating that they compose under load.
Query
What this should exercise
discovery_events
Deliverable
M3.2 Results
One deep query, one trace, three of four target axes hit. Full writeup archived at docs/stress-tests/M3.2-results.md (PR #57).
Trace: 74a017bd-697b-4439-96b8-fe12057cf2e8
recency=current
contradiction_detected=True
contradiction-type discovery_event
scope_exceeded gap
source_not_found
budget_exhausted=True
Snapshot
source_not_found)
related_research x2, new_source x1, contradiction x1, first in-the-wild observation)
Notes
Multi-axis composition validated. A single deep query exercised three contract features simultaneously without losing structure. Confidence dropped appropriately and the right factors fired.
First contradiction discovery_event seen. Documented type at researchers/web/models.py:154, just hadn't fired in M3.1. All three documented discovery types are now reachable in practice.
Scope_exceeded miss is soft, not filing. Re-checked the 5 source_not_found gaps: only one (HFT-specific cold start benchmarks) is genuinely scope_exceeded: HFT firms don't publish those, so it's the wrong-researcher case. The other 4 (outage details, SLA percentages, post-mortems) are reasonable as source_not_found. So 1 of 5, not severe enough to file. May resurface in M3.3 calibration; will file then if so.
#54 paid off immediately. This is the first stress test after the persisted-result fix shipped. Recovering all 5 gap categories, all 4 discovery events, and the full confidence_factors took one Python one-liner against <trace_id>.result.json instead of grepping rendered terminal output.
Closing this issue.
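For anyone repeating the check in M3.3, here is a rough sketch of that kind of extraction against a persisted result file. It is not the one-liner actually used. The JSON layout is assumed: the keys gaps, discovery_events and confidence_factors, and the per-item fields category and type, are guesses based on the names in this issue, and summarize_result / summarize_file are hypothetical helpers, not part of marchwarden.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_result(result: dict) -> dict:
    """Count gaps and discovery events in a parsed result document.

    ASSUMPTION: the result has top-level "gaps" and "discovery_events"
    lists whose items carry "category" and "type" fields, plus a
    "confidence_factors" mapping. Adjust to the real schema.
    """
    gaps = Counter(g.get("category", "unknown") for g in result.get("gaps", []))
    events = Counter(
        e.get("type", "unknown") for e in result.get("discovery_events", [])
    )
    return {
        "gaps_by_category": dict(gaps),
        "discovery_events_by_type": dict(events),
        "confidence_factors": result.get("confidence_factors", {}),
    }


def summarize_file(path: str) -> dict:
    """Load a persisted <trace_id>.result.json file and summarize it."""
    return summarize_result(json.loads(Path(path).read_text(encoding="utf-8")))
```

Calling summarize_file on the persisted result for a trace should give the gap counts, discovery-event counts and confidence factors in one dictionary, without parsing rendered terminal output.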