Boxuan Li
|
6f235937cf
Evaluation time travel: allow evaluation on a specific version (#2356)
|
1 year ago |
Xingyao Wang
|
01ef90205d
Add CodeActSWEAgent to remove browsing & github + improvements on agentskills (#2105)
|
1 year ago |
Xingyao Wang
|
5114230e53
Some SWE-Bench infer fixes and improvements (#2065)
|
1 year ago |
Xingyao Wang
|
602ffcdffb
Implement `agentskills` for OpenDevin to helpfully improve edit AND including more useful tools/skills (#1941)
|
1 year ago |
Boxuan Li
|
b845a38169
Small improvements & fixes to SWE-Bench (#1874)
|
1 year ago |
Xingyao Wang
|
b2fdb963b6
Add detailed tutorial for adding new evaluation benchmarks (#1827)
|
1 year ago |
Xingyao Wang
|
e31f8b8322
automatically get agent version for eval (#1844)
|
1 year ago |
Xingyao Wang
|
2406b901df
feat(SWE-Bench environment) integrate SWE-Bench sandbox (#1468)
|
1 year ago |