Xingyao Wang | 31b244f95e | [Refactor, Evaluation] Refactor and clean up evaluation harness to remove global config and use EventStreamRuntime (#3230) | 1 year ago
Xingyao Wang | ff6ddc831f | fix: runtime test for mac (#3005) | 1 year ago
Boxuan Li | c68478f470 | Customize LLM config per agent (#2756) | 1 year ago
Boxuan Li | 6f235937cf | Evaluation time travel: allow evaluation on a specific version (#2356) | 1 year ago
Boxuan Li | f188abd7a3 | Delete evaluation outputs files (#2152) | 1 year ago
Ren Ma | a9823491e6 | Support Logic Reasoning Benchmark (#1973) | 1 year ago