[eval] SWE-Bench eval usability fixes (#3836)

* [eval] increase timeout for swebench eval init/complete

* allow CmdRunAction to optionally block when .timeout is set

* fix unit test for serialization

* fix unit tests for security analyzer

* fix integration tests

* add more timeout

* only check P2P when instances are non-empty;
convert P2P and F2P columns to string instead of list

---------

Co-authored-by: Graham Neubig <neubig@gmail.com>
Xingyao Wang 1 year ago
parent
commit
47d9621742
1 changed file with 2 additions and 2 deletions

+ 2 - 2
evaluation/swe_bench/run_infer.py

@@ -466,11 +466,11 @@ if __name__ == '__main__':
     output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl')
     instances = prepare_dataset(swe_bench_tests, output_file, args.eval_n_limit)
 
-    if not isinstance(
+    if len(instances) > 0 and not isinstance(
         instances['PASS_TO_PASS'][instances['PASS_TO_PASS'].index[0]], str
     ):
         for col in ['PASS_TO_PASS', 'FAIL_TO_PASS']:
-            instances[col] = instances[col].apply(lambda x: str(list(x)))
+            instances[col] = instances[col].apply(lambda x: str(x))
 
     run_evaluation(
         instances, metadata, output_file, args.eval_num_workers, process_instance
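
The guard added in this hunk can be tried in isolation. The sketch below is a minimal illustration, not code from the repository: `normalize_test_columns` is a hypothetical helper name, the two DataFrames are made-up stand-ins for the output of `prepare_dataset`, and `.iloc[0]` stands in for the diff's `.index[0]` lookup. It assumes pandas is installed.

```python
import pandas as pd


def normalize_test_columns(instances: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper mirroring the guarded conversion in the diff."""
    instances = instances.copy()  # avoid mutating the caller's frame
    # Only peek at the first row when there is one; otherwise skip the conversion.
    if len(instances) > 0 and not isinstance(
        instances['PASS_TO_PASS'].iloc[0], str
    ):
        for col in ['PASS_TO_PASS', 'FAIL_TO_PASS']:
            instances[col] = instances[col].apply(lambda x: str(x))
    return instances


# Illustrative data: an empty frame (e.g. every instance already evaluated).
empty = pd.DataFrame({'PASS_TO_PASS': [], 'FAIL_TO_PASS': []})

# Without the length check, the first-row lookup fails on an empty frame.
try:
    empty['PASS_TO_PASS'][empty['PASS_TO_PASS'].index[0]]
    unguarded_failed = False
except IndexError:
    unguarded_failed = True

empty_result = normalize_test_columns(empty)  # returns without raising

# Illustrative data: list-valued columns become their string representation.
filled = pd.DataFrame({'PASS_TO_PASS': [['t1', 't2']], 'FAIL_TO_PASS': [['t3']]})
filled_result = normalize_test_columns(filled)
```

With list-valued cells, `str(x)` and the earlier `str(list(x))` produce the same text; for other container types the two may differ, so downstream parsing should be checked against the actual dataset.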