|
|
@@ -116,9 +116,11 @@ selected_ids = ['sphinx-doc__sphinx-8721', 'sympy__sympy-14774', 'scikit-learn__
|
|
|
Then only these tasks (rows whose `instance_id` is in the above list) will be evaluated.
|
|
|
In this case, the `eval_limit` option applies only to tasks that are in the `selected_ids` list.
|
|
|
|
|
|
+After running the inference, you will obtain an `output.jsonl` file (by default, it is saved to `evaluation/evaluation_outputs`).
|
|
|
+
|
|
|
## Evaluate Generated Patches
|
|
|
|
|
|
-After running the inference described in the previous section, you will obtain a `output.jsonl` (by default it will save to `evaluation/evaluation_outputs`). Then you can run this one line script to evaluate generated patches, and produce a fine-grained report:
|
|
|
+With the `output.jsonl` file, you can run `eval_infer.sh` to evaluate the generated patches and produce a fine-grained report.
|
|
|
|
|
|
If you want to evaluate existing results, first run this to clone the existing outputs:
|
|
|
|
|
|
@@ -185,6 +187,15 @@ It will contains an additional field `fine_grained_report` (see example below) c
|
|
|
|
|
|
Please refer to [EVAL_PATCH.md](./EVAL_PATCH.md) if you want to learn more about how to evaluate patches that are already generated (e.g., not by OpenDevin).
|
|
|
|
|
|
+## View Result Summary
|
|
|
+
|
|
|
+If you only want the resolve rate and/or a summary of which tests pass and which do not, you can run:
|
|
|
+
|
|
|
+```bash
|
|
|
+poetry run python ./evaluation/swe_bench/scripts/summarise_results.py <path_to_output_merged_jsonl_file>
|
|
|
+# e.g. poetry run python ./evaluation/swe_bench/scripts/summarise_results.py ./evaluation/evaluation_outputs/outputs/swe_bench_lite/CodeActSWEAgent/gpt-4o-2024-05-13_maxiter_50_N_v1.5-no-hint/output.merged.jsonl
|
|
|
+```
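For a rough idea of what such a summary involves, here is a minimal sketch that computes a resolve rate from the lines of a merged output file. It is not the code of `summarise_results.py`. The helper name `resolve_rate` is hypothetical, and the sketch assumes that each JSON line carries a `fine_grained_report` dict with a boolean `resolved` key; check your own `output.merged.jsonl` before relying on that layout.

```python
import json
from typing import Iterable, Tuple


def resolve_rate(jsonl_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Return (resolved, total, rate) for the lines of a merged output file.

    Hypothetical helper: it assumes every record has a `fine_grained_report`
    dict containing a boolean `resolved` key.
    """
    total = 0
    resolved = 0
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        total += 1
        # Assumption: a missing or empty report counts as "not resolved".
        report = record.get("fine_grained_report") or {}
        if report.get("resolved") is True:
            resolved += 1
    rate = resolved / total if total else 0.0
    return resolved, total, rate


# Usage with two made-up records (not real evaluation data).
sample = [
    '{"instance_id": "a", "fine_grained_report": {"resolved": true}}',
    '{"instance_id": "b", "fine_grained_report": {"resolved": false}}',
]
resolved, total, rate = resolve_rate(sample)
print(f"{resolved}/{total} resolved ({rate:.1%})")  # prints "1/2 resolved (50.0%)"
```

To run it on a real file, pass an open file handle instead of the sample list, since iterating over a text file yields one line at a time.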
|
|
|
+
|
|
|
## Submit your evaluation results
|
|
|
|
|
|
You can start your own fork of [our huggingface evaluation outputs](https://huggingface.co/spaces/OpenDevin/evaluation) and submit a PR of your evaluation results following the guide [here](https://huggingface.co/docs/hub/en/repositories-pull-requests-discussions#pull-requests-and-discussions).
|