Commit History

Author | SHA1 | Message | Date
Xingyao Wang | 31b244f95e | [Refactor, Evaluation] Refactor and clean up evaluation harness to remove global config and use EventStreamRuntime (#3230) | 1 year ago
Xingyao Wang | ff6ddc831f | fix: runtime test for mac (#3005) | 1 year ago
Boxuan Li | c68478f470 | Customize LLM config per agent (#2756) | 1 year ago
Boxuan Li | 6f235937cf | Evaluation time travel: allow evaluation on a specific version (#2356) | 1 year ago
Boxuan Li | f188abd7a3 | Delete evaluation outputs files (#2152) | 1 year ago
Ren Ma | a9823491e6 | Support Logic Reasoning Benchmark (#1973) | 1 year ago