Commit History

Author SHA1 Message Date
  Ketan Ramaneti 852c90f64a [fix eval] Fix issues with miniwob remote runtime evaluation (#5001) 1 year ago
  Xingyao Wang 50c13aad98 [Eval] Improve SWE-Bench Eval harness: multi-run support & entry script simplification (#4396) 1 year ago
  Xingyao Wang 31b244f95e [Refactor, Evaluation] Refactor and clean up evaluation harness to remove global config and use EventStreamRuntime (#3230) 1 year ago
  Graham Neubig cab7a288ca Add NUM_WORKERS variable to run_infer.sh scripts for configurable worker settings (#2597) 1 year ago
  Boxuan Li feabc97aba Evaluation time travel: build sandbox on the fly (#2491) 1 year ago
  Boxuan Li 6f235937cf Evaluation time travel: allow evaluation on a specific version (#2356) 1 year ago
  Frank Xu 48151bdbb0 [feat] WebArena benchmark, MiniWoB++ benchmark and related arch changes (#2170) 1 year ago