Add model_id to CostMetadata
Tracks which LLM powered each research call. Enables:

- Cost analysis across model tiers
- Quality calibration (confidence vs model capability)
- Reproducibility (know exactly what produced a result)

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
Parent: 7ad91b7ca9
Commit: e861c392e4

1 changed file with 16 additions and 5 deletions
````diff
@@ -311,22 +311,30 @@ class CostMetadata:
     iterations_run: int # Number of inner-loop iterations
     wall_time_sec: float # Actual elapsed time
     budget_exhausted: bool # True if researcher hit iteration or token cap
+    model_id: str # Model used for the research loop (e.g. "claude-sonnet-4-6")
 ```
 
+The `model_id` field records which LLM powered the researcher's inner loop. This is critical for:
+- **Cost analysis** — comparing token spend across model tiers
+- **Quality calibration** — correlating confidence scores with model capability
+- **Reproducibility** — knowing exactly what produced a given result
+
 Example:
 ```python
 CostMetadata(
     tokens_used=8452,
     iterations_run=3,
     wall_time_sec=42.5,
-    budget_exhausted=False
+    budget_exhausted=False,
+    model_id="claude-sonnet-4-6"
 )
 ```
 
 The PI uses this to:
-- Track costs (token budgets, actual spend)
+- Track costs (token budgets, actual spend, model tier)
 - Detect runaway loops (budget_exhausted = True)
 - Plan timeouts (wall_time_sec tells you if this is acceptable latency)
+- Compare fidelity-to-cost ratio across models
 
 ---
 
````
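As a sketch of the per-tier cost analysis that `model_id` enables, the snippet below assumes the dataclass holds exactly the five fields used in the Example. The hunk does not show what precedes `iterations_run`, so `tokens_used` and the field order are assumptions, and `tokens_by_model` is a hypothetical helper, not part of this commit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CostMetadata:
    # Assumed field list: the diff only shows iterations_run onward;
    # tokens_used is taken from the Example constructor call.
    tokens_used: int
    iterations_run: int  # Number of inner-loop iterations
    wall_time_sec: float  # Actual elapsed time
    budget_exhausted: bool  # True if researcher hit iteration or token cap
    model_id: str  # Model used for the research loop


def tokens_by_model(records: list[CostMetadata]) -> dict[str, int]:
    """Hypothetical helper: sum tokens_used for each model_id."""
    totals: dict[str, int] = defaultdict(int)
    for record in records:
        totals[record.model_id] += record.tokens_used
    return dict(totals)


if __name__ == "__main__":
    # Values copied from the three Response examples in this diff.
    runs = [
        CostMetadata(450, 1, 3.2, False, "claude-sonnet-4-6"),
        CostMetadata(19240, 4, 67.8, False, "claude-sonnet-4-6"),
        CostMetadata(4998, 3, 31.2, True, "claude-haiku-4-5"),
    ]
    print(tokens_by_model(runs))
    # → {'claude-sonnet-4-6': 19690, 'claude-haiku-4-5': 4998}
```

Grouping on the raw `model_id` string keeps the helper agnostic about how models map to tiers; any tier mapping would be a separate, project-specific lookup.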
```diff
@@ -430,7 +438,8 @@ Response:
     "tokens_used": 450,
     "iterations_run": 1,
     "wall_time_sec": 3.2,
-    "budget_exhausted": false
+    "budget_exhausted": false,
+    "model_id": "claude-sonnet-4-6"
   },
   "trace_id": "550e8400-e29b-41d4-a716-446655440001"
 }
```
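Since each Response cost object now carries `model_id`, a consumer may want to confirm the field is present and well-typed before relying on it. Below is a minimal standard-library sketch of such a check. `EXPECTED_TYPES` and `cost_block_errors` are hypothetical names, and the sketch validates only the inner object shown in the hunk, because the key that holds it lies outside the diff context.

```python
import json

# Hypothetical schema for the cost object shown in the Response examples.
EXPECTED_TYPES = {
    "tokens_used": int,
    "iterations_run": int,
    "wall_time_sec": (int, float),  # a JSON number like 3 parses as int
    "budget_exhausted": bool,
    "model_id": str,
}


def cost_block_errors(cost: dict) -> list[str]:
    """Return a list of problems; an empty list means the block looks valid."""
    errors = []
    for name, expected in EXPECTED_TYPES.items():
        if name not in cost:
            errors.append(f"missing field: {name}")
            continue
        value = cost[name]
        # bool is a subclass of int, so reject it where a number is expected.
        wrong_bool = isinstance(value, bool) and expected is not bool
        if wrong_bool or not isinstance(value, expected):
            errors.append(f"bad type for {name}: {type(value).__name__}")
    if isinstance(cost.get("model_id"), str) and not cost["model_id"].strip():
        errors.append("model_id is empty")
    return errors


if __name__ == "__main__":
    body = """{
      "tokens_used": 450,
      "iterations_run": 1,
      "wall_time_sec": 3.2,
      "budget_exhausted": false,
      "model_id": "claude-sonnet-4-6"
    }"""
    print(cost_block_errors(json.loads(body)))  # → []
```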
```diff
@@ -494,7 +503,8 @@ Response:
     "tokens_used": 19240,
     "iterations_run": 4,
     "wall_time_sec": 67.8,
-    "budget_exhausted": false
+    "budget_exhausted": false,
+    "model_id": "claude-sonnet-4-6"
   },
   "trace_id": "550e8400-e29b-41d4-a716-446655440002"
 }
```
```diff
@@ -562,7 +572,8 @@ Response:
     "tokens_used": 4998,
     "iterations_run": 3,
     "wall_time_sec": 31.2,
-    "budget_exhausted": true
+    "budget_exhausted": true,
+    "model_id": "claude-haiku-4-5"
   },
   "trace_id": "550e8400-e29b-41d4-a716-446655440003"
 }
```
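Of the three sample responses, only the Haiku run reports `budget_exhausted` as `true`. With `model_id` recorded, runaway-loop detection can be broken down per model. A minimal sketch, assuming the cost objects have already been parsed into plain dicts; `exhausted_rate_by_model` is a hypothetical helper, not something this commit adds.

```python
from collections import Counter
from collections.abc import Iterable, Mapping


def exhausted_rate_by_model(costs: Iterable[Mapping]) -> dict[str, float]:
    """Hypothetical helper: fraction of runs per model_id that hit the budget cap."""
    runs = Counter()
    exhausted = Counter()
    for cost in costs:
        model = cost["model_id"]
        runs[model] += 1
        if cost["budget_exhausted"]:
            exhausted[model] += 1
    # Counter returns 0 for models that never exhausted their budget.
    return {model: exhausted[model] / total for model, total in runs.items()}


if __name__ == "__main__":
    # Cost objects copied from the three Response examples in this diff.
    costs = [
        {"tokens_used": 450, "iterations_run": 1, "wall_time_sec": 3.2,
         "budget_exhausted": False, "model_id": "claude-sonnet-4-6"},
        {"tokens_used": 19240, "iterations_run": 4, "wall_time_sec": 67.8,
         "budget_exhausted": False, "model_id": "claude-sonnet-4-6"},
        {"tokens_used": 4998, "iterations_run": 3, "wall_time_sec": 31.2,
         "budget_exhausted": True, "model_id": "claude-haiku-4-5"},
    ]
    print(exhausted_rate_by_model(costs))
    # → {'claude-sonnet-4-6': 0.0, 'claude-haiku-4-5': 1.0}
```

Three samples support no conclusion about either model; the sketch only shows that the grouping becomes possible once each result records which model produced it.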