@@ -154,6 +154,33 @@ The final results will be saved to `evaluation/evaluation_outputs/outputs/swe_be
- `report.json`: a JSON file that contains keys like `"resolved"`, pointing to the instance IDs that were resolved by the agent.

- `summary.json`: a JSON file that contains more fine-grained information for each test instance.
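For instance, `report.json` can be inspected with a few lines of standard tooling. A minimal sketch, assuming the `"resolved"` key holds a list of instance IDs (the IDs below are made-up placeholders, not real results):

```bash
# Write a tiny placeholder report.json (made-up instance IDs) to illustrate the shape.
cat > report.json <<'EOF'
{"resolved": ["example__project-101", "example__project-202"]}
EOF

# Count and list the resolved instance IDs using only the Python standard library.
python3 - <<'EOF'
import json

with open("report.json") as f:
    report = json.load(f)

print(f"{len(report['resolved'])} resolved instances:")
for instance_id in report["resolved"]:
    print(" -", instance_id)
EOF
```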
+## Visualize Results
+
+First, clone `https://huggingface.co/spaces/OpenDevin/evaluation` and add your own OpenDevin run results to the `outputs` directory of the cloned repo.
+
+```bash
+git clone https://huggingface.co/spaces/OpenDevin/evaluation
+```
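Copying your results into the cloned repo can be sketched as follows; both paths below are assumptions (the source should be wherever your OpenDevin run actually wrote its outputs):

```bash
# Both paths are assumptions -- adjust them to your actual layout.
src="evaluation/evaluation_outputs/outputs"   # where your OpenDevin run wrote results
dst="evaluation/outputs"                      # `outputs` dir of the cloned repo

mkdir -p "$dst"
# Copy the run results in, if the source directory exists.
if [ -d "$src" ]; then
  cp -r "$src"/. "$dst"/
fi
```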
+
+**(Optional) set up a Streamlit environment with conda**:
+```bash
+conda create -n streamlit python=3.10
+conda activate streamlit
+pip install streamlit altair st_pages
+```
+
+**Run the visualizer**:
+
+In a Python environment with the `streamlit` library installed (e.g., the conda environment set up above), run the following:
+
+```bash
+# Make sure you are inside the cloned `evaluation` repo
+conda activate streamlit  # if you followed the optional conda env setup above
+streamlit run 0_📊_OpenDevin_Benchmark.py --server.port 8501 --server.address 0.0.0.0
+
+Then you can access the SWE-Bench trajectory visualizer in your browser at `http://localhost:8501`.
+
+
## View Result Summary