Graham Neubig
|
cab7a288ca
Add NUM_WORKERS variable to run_infer.sh scripts for configurable worker settings (#2597)
|
1 year ago |
Boxuan Li
|
feabc97aba
Evaluation time travel: build sandbox on the fly (#2491)
|
1 year ago |
Boxuan Li
|
6f235937cf
Evaluation time travel: allow evaluation on a specific version (#2356)
|
1 year ago |
Ryan H. Tran
|
0584e428b2
[Mint evaluation] Fix bug in stopping when the agent reaches max steps or solution proposals (#2268)
|
1 year ago |
Ryan H. Tran
|
9434bcce48
Support MINT benchmark (MATH, GSM8K subset) (#1955)
|
1 year ago |