Engel Nyst | eeb2342509 | Refactor history/event stream (#3808) | 1 year ago
Xingyao Wang | 966da7b7c8 | feat(agent, CodeAct 2.2): native CodeAct support for Browsing (#4667) | 1 year ago
Xingyao Wang | ae13171194 | feat(agent): CodeAct with function calling (#4537) | 1 year ago
Xingyao Wang | 7340b78962 | feat(eval): rewrite log_completions to save completions to directory (#4566) | 1 year ago
mamoodi | 6f2e678028 | Fix eval output path in case of @ char (#4416) | 1 year ago
Xingyao Wang | 25f9413965 | [Eval] Fix eval stuck when `result` is too large for pbar (#4361) | 1 year ago
Xingyao Wang | 9cc9b19958 | eval: improve swebench infer error handling and retry (#4205) | 1 year ago
Xingyao Wang | 53a015f718 | fix: make llm_completions optional to fix `eval_infer.py` (#4148) | 1 year ago
tobitege | c3bbe604eb | (fix) Fix logging in shared eval file to prevent key disclosure (#4108) | 1 year ago
Xingyao Wang | 81b3cd71b3 | [eval] log evaluating warnings directly to console (#4026) | 1 year ago
Xingyao Wang | 1b1d8f0b02 | [eval] Use `imap_unorderd` for parallizing evaluation (#4040) | 1 year ago
Xingyao Wang | a66e738957 | [eval] use mp Pool instead ProcessPoolExecutor (#4025) | 1 year ago
Xingyao Wang | 714e46f29a | [eval] save eventstream & llm completions for SWE-Bench run_infer (#3923) | 1 year ago
Xingyao Wang | 5d7f2fd4ae | [eval] Allow evaluation of SWE-Bench patches on `RemoteRuntime` (#3927) | 1 year ago
Xingyao Wang | f996b31d64 | [eval] Fix multi-processing bug (again^3) & allow set EXP_NAME for each `run_infer` (#3907) | 1 year ago
Xingyao Wang | 2b3925278d | [eval] refactor process instance logic into `update_progress` (#3875) | 1 year ago
Engel Nyst | 379f2b6f23 | Fix queue length on Macs (#3867) | 1 year ago
Xingyao Wang | 3a1b8c093b | [eval] yet another eval fixes on multi-processing (#3854) | 1 year ago
Xingyao Wang | 78c5f58adc | refactor & improve retry for the reliability of `RemoteRuntime` & evaluation (#3846) | 1 year ago
tobitege | dbb671a8a5 | logname fix; improve test calling instruction (#3666) | 1 year ago
Xingyao Wang | 090c911a50 | (refactor) Make `Runtime` class synchronous (#3661) | 1 year ago
tobitege | 9c39f07430 | (enh) Aider-Bench: make resumable with skip_num arg (#3626) | 1 year ago
Raj Maheshwari | e72dc96d13 | [Fix] Stop API key from leaking in evaluation outputs. (#3603) | 1 year ago
tobitege | 8fcf0817d4 | (eval) Aider_bench: add eval_ids arg to run specific instance id's (#3592) | 1 year ago
Robert Brennan | 01ae22ef57 | Rename OpenDevin to OpenHands (#3472) | 1 year ago
Xingyao Wang | 31b244f95e | [Refactor, Evaluation] Refactor and clean up evaluation harness to remove global config and use EventStreamRuntime (#3230) | 1 year ago
Graham Neubig | 3a21198424 | Remove monologue agent (#3036) | 1 year ago
Xingyao Wang | cf910dfa9d | fix eval api_key leak in metadata; fix llm config in run infer (#2998) | 1 year ago
Engel Nyst | d37b2973b2 | Refactoring: event stream based agent history (#2709) | 1 year ago
Xingyao Wang | f6dc89b41a | [Evaluation] Simplify eval & and multi-processing related fixes (#2810) | 1 year ago