Commit History

Author SHA1 Message Date
  Boxuan Li 6f235937cf Evaluation time travel: allow evaluation on a specific version (#2356) 1 year ago
  Xingyao Wang 11a2d1682d Minor SWE-Bench inference config tweak (#2381) 1 year ago
  Xingyao Wang a6ba6c5277 Add SWEBench-docker eval (#2085) 1 year ago
  tobitege 5776474dcf Fix SWE-Bench README typos (#2250) 1 year ago
  Boxuan Li 4d14b44a9a SWE-bench: Add summarise utility script to view passed/failed task IDs (#2137) 1 year ago
  Xingyao Wang 2c0a2dbc61 fix yet another swe_bench issue (#2069) 1 year ago
  Xingyao Wang 5114230e53 Some SWE-Bench infer fixes and improvements (#2065) 1 year ago
  Xingyao Wang 6ff50ed369 Fix SWE-Bench evaluation due to setuptools version (#1995) 1 year ago
  Boxuan Li 4add8a5595 SWE-bench: Allow selection of tasks (#1935) 1 year ago
  Boxuan Li b845a38169 Small improvements & fixes to SWE-Bench (#1874) 1 year ago
  Xingyao Wang b2fdb963b6 Add detailed tutorial for adding new evaluation benchmarks (#1827) 1 year ago
  Boxuan Li a57a213c7c Turn off auto linting by default, and on for swe_bench (#1861) 1 year ago
  Xingyao Wang 0fdbe1ee93 Update README.md (#1825) 1 year ago
  Xingyao Wang 2406b901df feat(SWE-Bench environment) integrate SWE-Bench sandbox (#1468) 1 year ago