
Remove legacy swe_bench/scripts/summarise_results.py (#2932)

* Remove swe_bench/scripts/summarise_results.py

* Remove mention of legacy script
Boxuan Li 1 year ago
Parent
Commit
4b4fa1c390
2 changed files with 0 additions and 50 deletions
  1. evaluation/swe_bench/README.md (+0 −11)
  2. evaluation/swe_bench/scripts/summarise_results.py (+0 −39)

+ 0 - 11
evaluation/swe_bench/README.md

@@ -189,17 +189,6 @@ streamlit run 0_📊_OpenDevin_Benchmark.py --server.port 8501 --server.address
 
 Then you can access the SWE-Bench trajectory visualizer at `localhost:8501`.
 
-
-
-## View Result Summary
-
-If you just want to know the resolve rate, and/or a summary of what tests pass and what don't, you could run
-
-```bash
-poetry run python ./evaluation/swe_bench/scripts/summarise_results.py <path_to_report_json_file>
-# e.g. poetry run python ./evaluation/swe_bench/scripts/summarise_results.py ./evaluation/evaluation_outputs/outputs/swe_bench_lite/CodeActSWEAgent/gpt-4o-2024-05-13_maxiter_50_N_v1.5-no-hint/report.json
-```
-
 ## Submit your evaluation results
 
 You can start your own fork of [our huggingface evaluation outputs](https://huggingface.co/spaces/OpenDevin/evaluation) and submit a PR of your evaluation results following the guide [here](https://huggingface.co/docs/hub/en/repositories-pull-requests-discussions#pull-requests-and-discussions).

+ 0 - 39
evaluation/swe_bench/scripts/summarise_results.py

@@ -1,39 +0,0 @@
-import json
-import sys
-
-
-def extract_test_results(json_file_path):
-    passed_instances = set()
-    all_instances = set()
-
-    with open(json_file_path, 'r') as file:
-        report = json.load(file)
-
-        # Add resolved instances
-        for instance_id in report['resolved']:
-            passed_instances.add(instance_id)
-
-        # Add all instances in the report
-        for _, instance_ids in report.items():
-            for instance_id in instance_ids:
-                all_instances.add(instance_id)
-
-    return passed_instances, all_instances
-
-
-if __name__ == '__main__':
-    if len(sys.argv) != 2:
-        print(
-            'Usage: poetry run python summarise_results.py <path_to_report_json_file>'
-        )
-        sys.exit(1)
-    json_file_path = sys.argv[1]
-    passed_instances, all_instances = extract_test_results(json_file_path)
-    succ_rate = len(passed_instances) / len(all_instances)
-    print(
-        f'\nPassed {len(passed_instances)} tests, total {len(all_instances)} tests, resolve rate = {succ_rate:.2%}'
-    )
-    print('PASSED TESTS:')
-    print(sorted(list(passed_instances)))
-    print('FAILED TESTS:')
-    print(sorted(list(all_instances - passed_instances)))
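
For reference, the core of the removed script can be condensed to a few lines. This is a sketch, not part of the commit; the `report` dict below is a hypothetical example of the JSON shape the script consumed (`resolved` plus other instance-ID lists keyed by category):

```python
import json


def summarise(report: dict) -> tuple[set, set]:
    # Instances under 'resolved' count as passed; every instance ID in any
    # category counts toward the total, exactly as the removed script did.
    passed = set(report['resolved'])
    all_ids = {iid for ids in report.values() for iid in ids}
    return passed, all_ids


# Hypothetical report contents for illustration.
report = {'resolved': ['a', 'b'], 'unresolved': ['c'], 'error': ['d']}
passed, total = summarise(report)
print(f'resolve rate = {len(passed) / len(total):.2%}')  # 50.00%
```

Note that because the total is taken over *all* keys in the report, an instance listed in both `resolved` and another category is counted once in the denominator.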