In the original SWE-Bench implementation, the conda environment for evaluation is typically installed from scratch each time a particular instance is evaluated. This poses several challenges:
In the OpenDevin SWE-Bench fork, we pre-build both the testbed (i.e., the code of the repository we want the agent to edit) and the conda environment, so that at evaluation (inference) time we can directly reuse the existing environments and evaluate efficiently.
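The "pre-build once, reuse at evaluation time" idea can be illustrated with a minimal sketch. It assumes a prebuilt environment is recognized simply by the presence of a directory. The function name reuse_or_build_env and its arguments are hypothetical and do not come from the OpenDevin scripts. The sketch does not show how the real fork creates or activates conda environments.

```shell
# Hypothetical helper illustrating "pre-build once, reuse at evaluation time".
# Assumption: a prebuilt environment is identified only by its directory existing.
#   $1    directory where a prebuilt environment is expected
#   $2... command that builds the environment into that directory
reuse_or_build_env() {
  local env_dir="$1"
  shift
  if [ -d "$env_dir" ]; then
    # Fast path: the environment was built ahead of time, so skip the build.
    echo "reusing prebuilt environment: $env_dir"
    return 0
  fi
  # Slow path: roughly what happens when nothing is pre-built.
  echo "no prebuilt environment at $env_dir; building it" >&2
  "$@"
}
```

With this design the expensive build command only runs when the directory is missing. Repeated evaluations of the same instance therefore pay the setup cost once.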
NOTE: We only support SWE-Bench Lite for now, but modifying our existing scripts for the full SWE-Bench should be straightforward.
Set up your eval workspace by:
Run the following commands to do the above two steps. The results will be saved to evaluation/SWE-bench/eval_workspace.
./evaluation/swe_bench/scripts/setup/prepare_swe_utils.sh
./evaluation/swe_bench/scripts/setup/swe_env_setup.sh
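The two setup commands can be wrapped so that the second one only runs if the first succeeds, followed by a check that the workspace was actually produced. This is a minimal sketch that assumes it is run from the repository root. The helpers setup_eval_workspace and check_eval_workspace are hypothetical, and the "non-empty directory" check is an assumption about what a successful setup looks like.

```shell
# Hypothetical helper: succeeds only if the given eval workspace directory
# exists and contains at least one entry (assumed sign of a successful setup).
check_eval_workspace() {
  local dir="$1"
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir")" ]; then
    echo "eval workspace ready: $dir"
    return 0
  fi
  echo "eval workspace missing or empty: $dir" >&2
  return 1
}

# Hypothetical wrapper: run both setup scripts in order, stop at the first
# failure, then verify the output directory named in this README.
setup_eval_workspace() {
  ./evaluation/swe_bench/scripts/setup/prepare_swe_utils.sh || return 1
  ./evaluation/swe_bench/scripts/setup/swe_env_setup.sh || return 1
  check_eval_workspace "evaluation/SWE-bench/eval_workspace"
}
```

Usage, from the repository root: setup_eval_workspace && echo "setup done".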
To build the evaluation Docker image and push it to the registry, run:
pushd evaluation/swe_bench
docker build -t ghcr.io/opendevin/eval-swe-bench:full-v1.2.1 -f ./scripts/docker/Dockerfile.full.v1.1 .
docker push ghcr.io/opendevin/eval-swe-bench:full-v1.2.1
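The build-and-push steps can also be expressed with the image name and tag factored into variables, pushing only if the build succeeds and returning to the original directory afterwards. This is a sketch under the assumption that it is run from the repository root. The variable names and the helpers image_ref and build_and_push are hypothetical. The tag and Dockerfile values are copied from the commands above.

```shell
# Hypothetical variables; values copied from the commands in this README.
IMAGE_REPO="ghcr.io/opendevin/eval-swe-bench"
IMAGE_TAG="full-v1.2.1"
DOCKERFILE="./scripts/docker/Dockerfile.full.v1.1"

# Compose a full image reference such as "repo:tag".
image_ref() {
  printf '%s:%s\n' "$1" "$2"
}

# Build the image, push it only if the build succeeded, and always
# return to the starting directory (the pushd/popd pair).
build_and_push() {
  local ref status
  ref="$(image_ref "$IMAGE_REPO" "$IMAGE_TAG")"
  pushd evaluation/swe_bench >/dev/null || return 1
  docker build -t "$ref" -f "$DOCKERFILE" . && docker push "$ref"
  status=$?
  popd >/dev/null
  return "$status"
}
```

Keeping the tag in one variable avoids the build and push commands drifting apart when the version is bumped.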