ai/OpenHands @ e5d7735d75c2b3ddbf4baccd4dce0b0b3740154a

Xingyao Wang 31b244f95e [Refactor, Evaluation] Refactor and clean up evaluation harness to remove global config and use EventStreamRuntime (#3230)		1 year ago
cleanup.sh	ebafb702e5 Add ML-Bench Evaluation with OpenDevin (#2015)	1 year ago
run_analysis.sh	563bc41fd3 Use LLM to analyze ML-Bench failure cases (#2399)	1 year ago
run_infer.sh	31b244f95e [Refactor, Evaluation] Refactor and clean up evaluation harness to remove global config and use EventStreamRuntime (#3230)	1 year ago
summarise_results.py	beabcce16d [Hotfix] Fix ML-Bench continue ``run_inference.py`` (#2284)	1 year ago