# Pre-build Testbed and Env

In the original SWE-Bench implementation, the conda environment for evaluation is typically installed from scratch while evaluating each instance. This poses several challenges:

- Efficiency: most of the evaluation time is wasted on downloading packages
- Stability: setup could fail due to bad internet connectivity
- Reliability: an instance may be counted as failed not because the agent did badly, but because the environment setup failed

In the OpenDevin-SWE-Bench fork, we pre-build the **testbed** (i.e., the code of the repository we want the agent to edit) AND the **conda environment**, so that at evaluation (inference) time we can directly leverage existing environments for efficient evaluation.
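
The reuse idea above can be sketched as a tiny shell stub. Everything here is illustrative (the directory layout, the helper names, and the cache check are assumptions, not the actual scripts); a real pre-build step would run `conda create` / `pip install` and check out the repository at the instance's commit:

```shell
# Sketch of the env-reuse idea only -- a real pre-build step would invoke
# conda and git; here a directory stands in for a pre-built environment.
ENVS_DIR="$(mktemp -d)"   # stand-in for the folder of pre-built envs

prebuild() {              # done once, ahead of evaluation time
  mkdir -p "$ENVS_DIR/$1" && echo "built $1"
}

evaluate() {              # inference time: reuse the env if it exists
  if [ -d "$ENVS_DIR/$1" ]; then
    echo "reusing $1"
  else
    echo "no pre-built env for $1"
  fi
}

prebuild "django__django-11099"   # a SWE-Bench Lite instance ID
evaluate "django__django-11099"   # -> reusing django__django-11099
```

Because the expensive step runs once ahead of time, a flaky network can only break the pre-build phase, not the evaluation of individual instances.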

NOTE: We only support SWE-Bench Lite for now, but modifying our existing scripts for the full SWE-Bench should be quite straightforward.