Xingyao Wang
|
50c13aad98
[Eval] Improve SWE-Bench Eval harness: multi-run support & entry script simplification (#4396)
|
1 year ago |
Engel Nyst
|
e6847e9e61
Move agenthub within openhands (#4130)
|
1 year ago |
Robert Brennan
|
01ae22ef57
Rename OpenDevin to OpenHands (#3472)
|
1 year ago |
Xingyao Wang
|
31b244f95e
[Refactor, Evaluation] Refactor and clean up evaluation harness to remove global config and use EventStreamRuntime (#3230)
|
1 year ago |
Graham Neubig
|
cab7a288ca
Add NUM_WORKERS variable to run_infer.sh scripts for configurable worker settings (#2597)
|
1 year ago |
Boxuan Li
|
feabc97aba
Evaluation time travel: build sandbox on the fly (#2491)
|
1 year ago |
Boxuan Li
|
6f235937cf
Evaluation time travel: allow evaluation on a specific version (#2356)
|
1 year ago |
Yizhe Zhang
|
8d79c3edbc
modify the exiting logic and reward calculation, delete unused function (#2198)
|
1 year ago |
Yizhe Zhang
|
0c829cd067
Support Entity-Deduction-Arena (EDA) Benchmark (#1931)
|
1 year ago |