This folder contains code and resources to run experiments and evaluations.
To better organize the evaluation folder, we should follow the rules below:
- `evaluation/SWE-bench` should contain all the preprocessing/evaluation/analysis scripts:
  - `devin_eval_analysis.ipynb`: notebook analyzing Devin's outputs.
  - `prepare_devin_outputs_for_evaluation.py`: script fetching and converting Devin's output into the desired JSON file for evaluation.
    - Usage: `python prepare_devin_outputs_for_evaluation.py <setting>`, where `<setting>` can be `passed`, `failed` or `all`.
- The output files can be downloaded with:
  - `wget https://huggingface.co/datasets/OpenDevin/Devin-SWE-bench-output/raw/main/devin_swe_passed.json`
  - `wget https://huggingface.co/datasets/OpenDevin/Devin-SWE-bench-output/raw/main/devin_swe_outputs.json`

See `SWE-bench/README.md` for more details on how to run SWE-bench for evaluation.
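
For environments without `wget`, the same two files can be fetched from Python. The sketch below is only an illustration: the base URL and file names come from the commands in this README, while the helper names `output_url`, `fetch_outputs` and `load_outputs` are hypothetical and not part of this repository. It makes no assumption about the JSON schema beyond the file being valid JSON.

```python
import json
import urllib.request
from pathlib import Path

# Base URL and file names copied from the wget commands in this README.
BASE_URL = "https://huggingface.co/datasets/OpenDevin/Devin-SWE-bench-output/raw/main"
KNOWN_FILES = ("devin_swe_passed.json", "devin_swe_outputs.json")


def output_url(filename: str) -> str:
    """Return the download URL for one of the known output files."""
    if filename not in KNOWN_FILES:
        raise ValueError(f"unknown output file: {filename!r}")
    return f"{BASE_URL}/{filename}"


def fetch_outputs(filename: str, dest_dir: str = ".") -> Path:
    """Download one output file into dest_dir and return its local path.

    Hypothetical helper: it performs a plain HTTP download, equivalent
    to the wget commands, and needs network access.
    """
    dest = Path(dest_dir) / filename
    urllib.request.urlretrieve(output_url(filename), dest)
    return dest


def load_outputs(path):
    """Parse a downloaded file as JSON and return the resulting object."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Calling `load_outputs(fetch_outputs("devin_swe_passed.json"))` would download the file into the current directory and return its parsed contents. Nothing runs at import time, so the functions can be reused from a notebook.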