Robert Brennan | 01ae22ef57 | Rename OpenDevin to OpenHands (#3472) | 1 year ago
Xingyao Wang | 31b244f95e | [Refactor, Evaluation] Refactor and clean up evaluation harness to remove global config and use EventStreamRuntime (#3230) | 1 year ago
Jiayi Pan | 917d96e06f | Fix doc error in evals (#2654) | 1 year ago
Boxuan Li | 6f235937cf | Evaluation time travel: allow evaluation on a specific version (#2356) | 1 year ago
Ryan H. Tran | 01296ff79d | Add remaining subsets for MINT benchmark (#2142) | 1 year ago
Ryan H. Tran | 9434bcce48 | Support MINT benchmark (MATH, GSM8K subset) (#1955) | 1 year ago