Engel Nyst
|
eeb2342509
Refactor history/event stream (#3808)
|
1 year ago |
Xingyao Wang
|
966da7b7c8
feat(agent, CodeAct 2.2): native CodeAct support for Browsing (#4667)
|
1 year ago |
Xingyao Wang
|
9c2b48ff5d
fix(eval): SWE-Bench instance with upper-case instance id (#4649)
|
1 year ago |
Xingyao Wang
|
ae13171194
feat(agent): CodeAct with function calling (#4537)
|
1 year ago |
Xingyao Wang
|
1f23dc89b6
fix(eval): add runtime.connect to all eval harness (#4565)
|
1 year ago |
Xingyao Wang
|
7340b78962
feat(eval): rewrite log_completions to save completions to directory (#4566)
|
1 year ago |
Xingyao Wang
|
2d5b360505
refactor: re-organize different runtime implementations into an impl folder (#4346)
|
1 year ago |
Xingyao Wang
|
da548d308c
[agent] LLM-based editing (#3985)
|
1 year ago |
Alejandro Cuadron Lafuente
|
a9a593bb21
[Fix] Added support to specify the platform on which the runtime image should be built. (#4402)
|
1 year ago |
Xingyao Wang
|
91308ba4dc
feat: clean-up retries RemoteRuntime & add FatalErrorObservation (#4485)
|
1 year ago |
Jiayi Pan
|
c1b323a076
Show actual dataset name in swebench log directory (#4417)
|
1 year ago |
Xingyao Wang
|
b23c7aab5a
[eval] stop set sid in eval (#4311)
|
1 year ago |
Robert Brennan
|
45fb4fb9bc
allow reconnecting to a runtime (#4223)
|
1 year ago |
Engel Nyst
|
e6847e9e61
Move agenthub within openhands (#4130)
|
1 year ago |
Aditya Bharat Soni
|
0809d26f4d
fix: Allow evaluation benchmarks to pass image urls in run_controller() instead of simply passing strings (#4100)
|
1 year ago |
Xingyao Wang
|
9cc9b19958
eval: improve swebench infer error handling and retry (#4205)
|
1 year ago |
Xingyao Wang
|
1109637efb
Update instruction for new version of eval runtime-api (#4128)
|
1 year ago |
Xingyao Wang
|
714e46f29a
[eval] save eventstream & llm completions for SWE-Bench run_infer (#3923)
|
1 year ago |
tofarr
|
ad0b549d8b
Feat Tightening up Timeouts and interrupt conditions. (#3926)
|
1 year ago |
Xingyao Wang
|
f996b31d64
[eval] Fix multi-processing bug (again^3) & allow set EXP_NAME for each `run_infer` (#3907)
|
1 year ago |
Xingyao Wang
|
3a1b8c093b
[eval] yet another eval fixes on multi-processing (#3854)
|
1 year ago |
Xingyao Wang
|
78c5f58adc
refactor & improve retry for the reliability of `RemoteRuntime` & evaluation (#3846)
|
1 year ago |
Xingyao Wang
|
47d9621742
[eval] SWE-Bench eval usability fixes (#3836)
|
1 year ago |
Xingyao Wang
|
2fe2f4c530
[eval] increase timeout for SWEBench eval init/complete (#3829)
|
1 year ago |
Jiayi Pan
|
43c4a7fff4
Allow Generalized SWE-Bench format for evaluation (#3752)
|
1 year ago |
Xingyao Wang
|
688068a44e
Fix issues for running `RemoteRuntime` in parallel on SWE-Bench (#3716)
|
1 year ago |
Xingyao Wang
|
d283420ac2
feat: add SWE-bench fullset support (#3477)
|
1 year ago |
Xingyao Wang
|
090c911a50
(refactor) Make `Runtime` class synchronous (#3661)
|
1 year ago |
Xingyao Wang
|
8b1f207d39
feat: support remote runtime (#3406)
|
1 year ago |
Xingyao Wang
|
98081b9b1b
(eval) EOF fixes for SWE-Bench evaluation (#3623)
|
1 year ago |