Boxuan Li | 6f235937cf | Evaluation time travel: allow evaluation on a specific version (#2356) | 1 year ago
Xingyao Wang | 11a2d1682d | Minor SWE-Bench inference config tweak (#2381) | 1 year ago
Xingyao Wang | a6ba6c5277 | Add SWEBench-docker eval (#2085) | 1 year ago
tobitege | 5776474dcf | Fix SWE-Bench README typos (#2250) | 1 year ago
Boxuan Li | 4d14b44a9a | SWE-bench: Add summarise utility script to view passed/failed task IDs (#2137) | 1 year ago
Xingyao Wang | 2c0a2dbc61 | fix yet another swe_bench issue (#2069) | 1 year ago
Xingyao Wang | 5114230e53 | Some SWE-Bench infer fixes and improvements (#2065) | 1 year ago
Xingyao Wang | 6ff50ed369 | Fix SWE-Bench evaluation due to setuptools version (#1995) | 1 year ago
Boxuan Li | 4add8a5595 | SWE-bench: Allow selection of tasks (#1935) | 1 year ago
Boxuan Li | b845a38169 | Small improvements & fixes to SWE-Bench (#1874) | 1 year ago
Xingyao Wang | b2fdb963b6 | Add detailed tutorial for adding new evaluation benchmarks (#1827) | 1 year ago
Boxuan Li | a57a213c7c | Turn off auto linting by default, and on for swe_bench (#1861) | 1 year ago
Xingyao Wang | 0fdbe1ee93 | Update README.md (#1825) | 1 year ago
Xingyao Wang | 2406b901df | feat(SWE-Bench environment) integrate SWE-Bench sandbox (#1468) | 1 year ago