Ketan Ramaneti
|
42b49e6c43
[fix eval] Fix issues with aider_bench remote runtime evaluation (#5000)
|
1 year ago |
Engel Nyst
|
eeb2342509
Refactor history/event stream (#3808)
|
1 year ago |
Xingyao Wang
|
1f23dc89b6
fix(eval): add runtime.connect to all eval harness (#4565)
|
1 year ago |
Xingyao Wang
|
2d5b360505
refactor: re-organize different runtime implementations into an impl folder (#4346)
|
1 year ago |
Xingyao Wang
|
da548d308c
[agent] LLM-based editing (#3985)
|
1 year ago |
Xingyao Wang
|
b23c7aab5a
[eval] stop set sid in eval (#4311)
|
1 year ago |
Aditya Bharat Soni
|
0809d26f4d
fix: Allow evaluation benchmarks to pass image urls in run_controller() instead of simply passing strings (#4100)
|
1 year ago |
Xingyao Wang
|
0c2a35b256
[eval] update aider bench scripts (#4203)
|
1 year ago |
tofarr
|
152f99c64f
Chore Bump python version (#3545)
|
1 year ago |
tobitege
|
dbb671a8a5
logname fix; improve test calling instruction (#3666)
|
1 year ago |
Xingyao Wang
|
090c911a50
(refactor) Make `Runtime` class synchronous (#3661)
|
1 year ago |
tobitege
|
9c39f07430
(enh) Aider-Bench: make resumable with skip_num arg (#3626)
|
1 year ago |
Raj Maheshwari
|
0cdeb83b17
Enabling of unittests in aider benchmark should be optional. (#3620)
|
1 year ago |
Raj Maheshwari
|
789f15a5db
Allow the Agent to run uniittests for verification. (#3609)
|
1 year ago |
tobitege
|
8fcf0817d4
(eval) Aider_bench: add eval_ids arg to run specific instance id's (#3592)
|
1 year ago |
Graham Neubig
|
f9088766e8
Allow setting of runtime container image (#3573)
|
1 year ago |
Raj Maheshwari
|
11d8d05b1a
[Fix] Metrics should be updated when agent reaches max iterations. (#3549)
|
1 year ago |
Raj Maheshwari
|
80f88e14cd
[Feat] Aider Benchmark (#3507)
|
1 year ago |