Engel Nyst | d37b2973b2 | Refactoring: event stream based agent history (#2709) | 1 year ago
Graham Neubig | d0384cafdd | Two fixes to swe bench eval (#2831) | 1 year ago
Xingyao Wang | f6dc89b41a | [Evaluation] Simplify eval & and multi-processing related fixes (#2810) | 1 year ago
Graham Neubig | a081935fd8 | Simplify eval code (#2775) | 1 year ago
Graham Neubig | ffd3c7144c | Remove global args (#2760) | 1 year ago
Engel Nyst | 2d9bb56763 | Add ability to restore the cli session (optional) (#2699) | 1 year ago
Engel Nyst | 874b4c9075 | CLI concurrency (#2695) | 1 year ago
RainRat | 745ae42a72 | fix typos (#2352) | 1 year ago
Leo | 040d6bd806 | fix: add an early exit check for agent answers in agent bench. (#2257) | 1 year ago
Ryan H. Tran | 22e8fb39b1 | add cost metrics to evaluation outputs for all benchmarks (#2199) | 1 year ago
Leo | be251b11de | Add AgentBench. (#2012) | 1 year ago