@@ -84,9 +84,35 @@ To integrate your own benchmark, we suggest starting with the one that most clos
 
## How to create an evaluation workflow
 
+
To create an evaluation workflow for your benchmark, follow these steps:
 
-1. Create a configuration:
+1. Import relevant OpenDevin utilities:
+ ```python
+    # stdlib / third-party imports used by the snippets below
+    import os
+
+    import pandas as pd
+
+    import agenthub
+ from evaluation.utils.shared import (
+ EvalMetadata,
+ EvalOutput,
+ make_metadata,
+ prepare_dataset,
+ reset_logger_for_multiprocessing,
+ run_evaluation,
+ )
+ from opendevin.controller.state.state import State
+ from opendevin.core.config import (
+ AppConfig,
+ SandboxConfig,
+ get_llm_config_arg,
+ parse_arguments,
+ )
+ from opendevin.core.logger import opendevin_logger as logger
+ from opendevin.core.main import create_runtime, run_controller
+ from opendevin.events.action import CmdRunAction
+ from opendevin.events.observation import CmdOutputObservation, ErrorObservation
+ from opendevin.runtime.runtime import Runtime
+ ```
+
+2. Create a configuration:
```python
def get_config(instance: pd.Series, metadata: EvalMetadata) -> AppConfig:
config = AppConfig(
@@ -103,7 +129,7 @@ To create an evaluation workflow for your benchmark, follow these steps:
return config
```
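+
+    The `get_config` body shown here is abbreviated. As a purely illustrative sketch (field names such as `run_as_devin` and `container_image`, and the `set_llm_config` helper, are assumptions that may differ across OpenDevin versions), a fuller configuration might look like:
+
+    ```python
+    def get_config(instance: pd.Series, metadata: EvalMetadata) -> AppConfig:
+        config = AppConfig(
+            default_agent=metadata.agent_class,
+            run_as_devin=False,  # assumed flag: run the agent as an unprivileged user
+            max_iterations=metadata.max_iterations,
+            sandbox=SandboxConfig(
+                # Choose an image with your benchmark's dependencies preinstalled.
+                container_image='python:3.11-bookworm',
+                enable_auto_lint=True,
+            ),
+            workspace_base=None,
+            workspace_mount_path=None,
+        )
+        config.set_llm_config(metadata.llm_config)  # assumed helper on AppConfig
+        return config
+    ```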
 
-2. Initialize the runtime and set up the evaluation environment:
+3. Initialize the runtime and set up the evaluation environment:
```python
async def initialize_runtime(runtime: Runtime, instance: pd.Series):
# Set up your evaluation environment here
@@ -111,7 +137,7 @@ To create an evaluation workflow for your benchmark, follow these steps:
pass
```
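+
+    A minimal sketch of an environment setup, assuming an awaitable `Runtime.run_action` and the action/observation classes imported in step 1 (whether `run_action` must be awaited varies between OpenDevin versions):
+
+    ```python
+    async def initialize_runtime(runtime: Runtime, instance: pd.Series):
+        # Example: create a working directory for this benchmark instance.
+        action = CmdRunAction(command='mkdir -p /workspace/task')
+        logger.info(action, extra={'msg_type': 'ACTION'})
+        obs = await runtime.run_action(action)  # assumed awaitable in this sketch
+        # Fail fast if setup did not succeed.
+        if isinstance(obs, ErrorObservation) or (
+            isinstance(obs, CmdOutputObservation) and obs.exit_code != 0
+        ):
+            raise RuntimeError(f'Runtime initialization failed: {obs.content}')
+    ```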
 
-3. Create a function to process each instance:
+4. Create a function to process each instance:
```python
async def process_instance(instance: pd.Series, metadata: EvalMetadata) -> EvalOutput:
config = get_config(instance, metadata)
@@ -141,7 +167,7 @@ To create an evaluation workflow for your benchmark, follow these steps:
)
```
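+
+    Inside `process_instance`, the usual shape is: build the config, create and initialize a runtime, run the agent controller, then package the result. A hedged outline (the `run_controller` keyword arguments, the awaitability of `create_runtime`, the exact `EvalOutput` fields, and the `score_result` helper are assumptions for illustration; `get_instruction` and `your_user_response_function` are the functions you customize, as noted below):
+
+    ```python
+    async def process_instance(instance: pd.Series, metadata: EvalMetadata) -> EvalOutput:
+        config = get_config(instance, metadata)
+        # Give each instance its own log file when running many workers.
+        reset_logger_for_multiprocessing(logger, instance.instance_id, metadata.eval_output_dir)
+
+        runtime = await create_runtime(config, sid=str(instance.instance_id))  # assumed awaitable
+        await initialize_runtime(runtime, instance)
+
+        instruction = get_instruction(instance, metadata)
+        state: State | None = await run_controller(
+            config=config,
+            task_str=instruction,
+            runtime=runtime,
+            fake_user_response_fn=your_user_response_function,
+        )
+
+        return EvalOutput(
+            instance_id=str(instance.instance_id),
+            instruction=instruction,
+            metadata=metadata,
+            history=[],  # fill from state.history in whatever format your benchmark needs
+            metrics=state.metrics.get() if state and state.metrics else None,
+            error=state.last_error if state else None,
+            test_result={'score': score_result(state, instance)},  # hypothetical scorer
+        )
+    ```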
 
-4. Run the evaluation:
+5. Run the evaluation:
```python
metadata = make_metadata(llm_config, dataset_name, agent_class, max_iterations, eval_note, eval_output_dir)
output_file = os.path.join(metadata.eval_output_dir, 'output.jsonl')
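+    # Sketch of the typical continuation, assuming the helpers imported in step 1,
+    # an `args = parse_arguments()` call earlier in the script, and a hypothetical
+    # `dataset` DataFrame of benchmark instances (signatures may vary by version):
+    instances = prepare_dataset(dataset, output_file, args.eval_n_limit)
+    run_evaluation(instances, metadata, output_file, args.eval_num_workers, process_instance)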
@@ -162,8 +188,6 @@ Remember to customize the `get_instruction`, `your_user_response_function`, and
 
By following this structure, you can create a robust evaluation workflow for your benchmark within the OpenDevin framework.
 
-Certainly! I'll add a section explaining the user_response_fn and include a description of the workflow and interaction. Here's an updated version of the guideline with the new section:
-
 
## Understanding the `user_response_fn`
 