Xingyao Wang | 50c13aad98 | [Eval] Improve SWE-Bench Eval harness: multi-run support & entry script simplification (#4396) | 1 year ago
Xingyao Wang | 31b244f95e | [Refactor, Evaluation] Refactor and clean up evaluation harness to remove global config and use EventStreamRuntime (#3230) | 1 year ago
Graham Neubig | cab7a288ca | Add NUM_WORKERS variable to run_infer.sh scripts for configurable worker settings (#2597) | 1 year ago
Boxuan Li | feabc97aba | Evaluation time travel: build sandbox on the fly (#2491) | 1 year ago
Boxuan Li | 6f235937cf | Evaluation time travel: allow evaluation on a specific version (#2356) | 1 year ago
Niklas Muennighoff | ef6cdb7532 | HumanEvalFix integration (#1908) | 1 year ago