libowen2121 committed 2 years ago
commit 40a3614e80
1 changed file with 8 additions and 0 deletions

+ 8 - 0
evaluation/README.md

@@ -9,6 +9,14 @@ all the preprocessing/evaluation/analysis scripts.
   - Raw data and experimental records should not be stored in this repo; keep them elsewhere (e.g., Google Drive or Hugging Face Datasets).
   - Important data files of manageable size and analysis scripts (e.g., Jupyter notebooks) can be uploaded directly to this repo.
 
+## Roadmap
+
+- Sanity check: reproduce Devin's scores on SWE-bench from the released outputs to verify that our harness pipeline works correctly.
+- Open-source model support.
+  - Contributors are encouraged to submit their commits to our [forked SWE-bench repo](https://github.com/OpenDevin/SWE-bench).
+  - Ensure compatibility with the OpenAI API for inference.
+  - Serve open-source models, prioritizing high concurrency and throughput.
+
 ## Tasks
 ### SWE-bench
 - notebooks