Commit History

Author SHA1 Message Date
  super-dainiu ebafb702e5 Add ML-Bench Evaluation with OpenDevin (#2015) 1 year ago
  Leo 2c231c57c9 Add supported benchmarks to evaluation README (AgentBench, BIRD, LogicReasoning) (#2183) 1 year ago
  Ryan H. Tran 9434bcce48 Support MINT benchmark (MATH, GSM8K subset) (#1955) 1 year ago
  Yizhe Zhang 0c829cd067 Support Entity-Deduction-Arena (EDA) Benchmark (#1931) 1 year ago
  Jiayi Pan 2d52298a1d Support GAIA benchmark (#1911) 1 year ago
  Niklas Muennighoff ef6cdb7532 HumanEvalFix integration (#1908) 1 year ago
  Xingyao Wang 2406b901df feat(SWE-Bench environment) integrate SWE-Bench sandbox (#1468) 1 year ago
  Jirka Borovec e32d95cb1a lint: simplify hooks already covered by Ruff (#1204) 1 year ago
  hugehope 9cd4ad3298 chore: fix some typos in comments (#1013) 1 year ago
  libowen2121 e256329e5e Update SWE-bench eval results (#978) 1 year ago
  libowen2121 40a3614e80 Add a roadmap for eval (#92) 1 year ago
  Xingyao Wang 5ff96111f0 A starting point for SWE-Bench Evaluation with docker (#60) 1 year ago
  Jiaxin Pei dc88dac296 adding a script to fetch and convert devin's output for evaluation (#81) 1 year ago
  Binyuan Hui f99f4ebdaa fix: typo in the evaluation folder name. (#66) 1 year ago