1 year ago · 83b94786a3
--- a/openhands/agenthub/codeact_agent/README.md
+++ b/openhands/agenthub/codeact_agent/README.md
@@ -1,28 +1,75 @@
 
															 # CodeAct Agent Framework
														
 
															-This folder implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.01030), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a unified **code** action space for both *simplicity* and *performance* (see paper for more details).
														
 
															+This folder implements OpenHands's main agent, the CodeAct Agent. It is based on CodeAct ([paper](https://arxiv.org/abs/2402.01030), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)), an idea that consolidates LLM agents' **act**ions into a unified **code** action space for both *simplicity* and *performance*.
														
 
															-The conceptual idea is illustrated below. At each turn, the agent can:
														
 
															+## Overview
														
 
															+
														
 
															+The CodeAct agent operates through a function calling interface. At each turn, the agent can:
														
 
															 1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc.
														
 
															-2. **CodeAct**: Choose to perform the task by executing code
														
 
															-   - Execute any valid Linux `bash` command
														
 
															-   - Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through `bash` command, see plugin system below for more details.
														
 
															+2. **CodeAct**: Execute actions through a set of well-defined tools:
														
 
															+   - Execute Linux `bash` commands with `execute_bash`
														
 
															+   - Run Python code in an [IPython](https://ipython.org/) environment with `execute_ipython_cell`
														
 
															+   - Interact with web browsers using `browser` and `web_read`
														
 
															+   - Edit files using `str_replace_editor` or `edit_file`
														
 
															 ![image](https://github.com/All-Hands-AI/OpenHands/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3)
														
 
															-## Adding New Tools
														
 
															+## Built-in Tools
														
 
															+
														
 
															+The agent provides several built-in tools:
														
 
															+
														
 
															+### 1. `execute_bash`
														
 
															+- Execute any valid Linux bash command
														
 
															+- Handles long-running commands by running them in the background with output redirection
														
 
															+- Supports interactive processes with STDIN input and process interruption
														
 
															+- Handles command timeouts by automatically retrying the command in background mode
														
 
															+
														
 
															+### 2. `execute_ipython_cell`
														
 
															+- Run Python code in an IPython environment
														
 
															+- Supports magic commands like `%pip`
														
 
															+- Variables are scoped to the IPython environment
														
 
															+- Requires defining variables and importing packages before use
														
 
															+
														
 
															+### 3. `web_read` and `browser`
														
 
															+- `web_read`: Read and convert webpage content to markdown
														
 
															+- `browser`: Interact with webpages through Python code
														
 
															+- Supports common browser actions like navigation, clicking, form filling, scrolling
														
 
															+- Handles file uploads and drag-and-drop operations
														
 
															+
														
 
															+### 4. `str_replace_editor`
														
 
															+- View, create and edit files through string replacement
														
 
															+- Persistent state across command calls
														
 
															+- File viewing with line numbers
														
 
															+- String replacement with exact matching
														
 
															+- Undo functionality for edits
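The `str_replace_editor` behaviours listed above (viewing with line numbers, exact-match replacement, undo) can be illustrated with a small in-memory sketch. This is not the OpenHands implementation: the class name, the method names, and the rule that the target string must occur exactly once are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class StringReplaceEditor:
    """Toy in-memory editor (hypothetical; not the OpenHands implementation)."""

    files: dict = field(default_factory=dict)    # path -> current text
    history: dict = field(default_factory=dict)  # path -> stack of earlier versions

    def create(self, path: str, text: str) -> None:
        if path in self.files:
            raise FileExistsError(path)
        self.files[path] = text
        self.history[path] = []

    def view(self, path: str) -> str:
        # Prefix every line with its 1-based line number.
        lines = self.files[path].splitlines()
        return "\n".join(f"{i:6d}\t{line}" for i, line in enumerate(lines, start=1))

    def str_replace(self, path: str, old: str, new: str) -> None:
        text = self.files[path]
        count = text.count(old)
        # Assumption: an ambiguous or missing match is rejected rather than guessed.
        if count != 1:
            raise ValueError(f"expected exactly one match for {old!r}, found {count}")
        self.history[path].append(text)  # remember the old version for undo
        self.files[path] = text.replace(old, new, 1)

    def undo_edit(self, path: str) -> None:
        if not self.history[path]:
            raise ValueError("nothing to undo")
        self.files[path] = self.history[path].pop()
```

Keeping a per-file stack of previous versions makes undo a simple pop, at the cost of storing a full copy of the file for every edit.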
														
 
															+
														
 
															+### 5. `edit_file` (LLM-based)
														
 
															+- Edit files using LLM-based content generation
														
 
															+- Supports partial file edits with line ranges
														
 
															+- Handles large files by editing specific sections
														
 
															+- Append mode for adding content to files
														
 
															+
														
 
															+## Configuration
														
 
															-The CodeAct agent uses a function calling interface to define tools that the agent can use. Tools are defined in `function_calling.py` using the `ChatCompletionToolParam` class from `litellm`. Each tool consists of:
														
 
															+Tools can be enabled or disabled through configuration parameters:
														
 
															+- `codeact_enable_browsing`: Enable browser interaction tools
														
 
															+- `codeact_enable_jupyter`: Enable IPython code execution
														
 
															+- `codeact_enable_llm_editor`: Enable LLM-based file editing (falls back to string replacement editor if disabled)
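As a rough sketch of how the three flags above might translate into a tool list: the `ToolFlags` dataclass, its default values, and `select_tool_names` are hypothetical names introduced here. Only the flag names, the tool names, and the editor fallback come from this README; always including `execute_bash` is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolFlags:
    """Hypothetical container for the flags; the defaults are assumptions."""

    codeact_enable_browsing: bool = True
    codeact_enable_jupyter: bool = True
    codeact_enable_llm_editor: bool = False


def select_tool_names(flags: ToolFlags) -> list:
    """Return the names of the tools that would be exposed for these flags."""
    names = ["execute_bash"]  # assumption: bash is always available
    if flags.codeact_enable_browsing:
        names += ["web_read", "browser"]
    if flags.codeact_enable_jupyter:
        names.append("execute_ipython_cell")
    # Fall back to the string replacement editor when the LLM editor is off.
    names.append("edit_file" if flags.codeact_enable_llm_editor else "str_replace_editor")
    return names
```

For example, `select_tool_names(ToolFlags(codeact_enable_browsing=False))` leaves out both browser tools and keeps the string replacement editor.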
														
 
															+
														
 
															+## Micro-agents
														
 
															-1. A description string that explains what the tool does and how to use it
														
 
															-2. A tool definition using `ChatCompletionToolParam` that specifies:
														
 
															-   - The tool's name
														
 
															-   - The tool's parameters and their types
														
 
															-   - Required vs optional parameters
														
 
															+The agent includes specialized micro-agents for specific tasks:
														
 
															-Here's an example of how a tool is defined:
														
 
															+1. **npm**: Handles npm package installation with non-interactive shell workarounds
														
 
															+2. **github**: Manages GitHub operations with API token support and PR creation guidelines
														
 
															+3. **flarglebargle**: Easter egg response handler
														
 
															+
														
 
															+## Adding New Tools
														
 
															+The CodeAct agent uses a function calling interface based on `litellm`'s `ChatCompletionToolParam`. To add a new tool:
														
 
															+
														
 
															+1. Define the tool in `function_calling.py`:
														
 
															 ```python
														
 
															 MyTool = ChatCompletionToolParam(
														
 
															     type='function',
														
@@ -47,20 +94,20 @@ MyTool = ChatCompletionToolParam(
 
															 )
														
 
															 ```
														
 
															-To add a new tool:
														
 
															+2. Add the tool to `get_tools()` in `function_calling.py`
														
 
															+3. Implement the corresponding action handler in the agent class
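The three steps above can be sketched end to end as follows. To keep the block runnable without `litellm`, a plain dictionary in the standard function-calling schema stands in for `ChatCompletionToolParam`. The `word_count` tool, the `HANDLERS` table, and `run_tool` are hypothetical, and the real handler wiring inside the agent class may look different.

```python
# Step 1: define the tool (a plain dict standing in for ChatCompletionToolParam).
WordCountTool = {
    "type": "function",
    "function": {
        "name": "word_count",  # hypothetical tool name
        "description": "Count the whitespace-separated words in a piece of text.",
        "parameters": {
            "type": "object",
            "properties": {
                "text": {"type": "string", "description": "The text to count."},
            },
            "required": ["text"],
        },
    },
}


# Step 2: register the tool so it is offered to the model.
def get_tools() -> list:
    return [WordCountTool]


# Step 3: implement the handler that runs when the model invokes the tool.
def handle_word_count(arguments: dict) -> str:
    return str(len(arguments["text"].split()))


HANDLERS = {"word_count": handle_word_count}  # hypothetical dispatch table


def run_tool(name: str, arguments: dict) -> str:
    """Look up the handler for a tool invocation and run it."""
    if name not in HANDLERS:
        raise ValueError(f"unknown tool: {name}")
    return HANDLERS[name](arguments)
```

A dispatch table keyed by tool name keeps step 3 to a single new entry per tool; a chain of `if`/`elif` checks would work equally well for a small number of tools.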
														
 
															-1. Define your tool in `function_calling.py` following the pattern above
														
 
															-2. Add your tool to the `get_tools()` function in `function_calling.py`
														
 
															-3. Implement the corresponding action handler in the agent to process the tool's invocation
														
 
															+## Implementation Details
														
 
															-The agent currently supports several built-in tools:
														
 
															-- `execute_bash`: Execute bash commands
														
 
															-- `execute_ipython_cell`: Run Python code in IPython
														
 
															-- `browser`: Interact with a web browser
														
 
															-- `str_replace_editor`: Edit files using string replacement
														
 
															-- `edit_file`: Edit files using LLM-based editing
														
 
															+The agent is implemented in two main files:
														
 
															-Tools can be enabled/disabled through configuration parameters:
														
 
															-- `codeact_enable_browsing`: Enable browser interaction
														
 
															-- `codeact_enable_jupyter`: Enable IPython code execution
														
 
															-- `codeact_enable_llm_editor`: Enable LLM-based file editing (if disabled, uses string replacement editor instead)
														
 
															+1. `codeact_agent.py`: Core agent implementation with:
														
 
															+   - Message history management
														
 
															+   - Tool execution handling
														
 
															+   - State management
														
 
															+   - Action/observation processing
														
 
															+
														
 
															+2. `function_calling.py`: Tool definitions and function calling interface with:
														
 
															+   - Tool parameter specifications
														
 
															+   - Tool descriptions and examples
														
 
															+   - Function calling response parsing
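To illustrate what "function calling response parsing" involves, here is a minimal sketch that turns an assistant message in the common chat-completion shape (a `content` string plus an optional `tool_calls` list whose `arguments` are JSON strings) into actions. The `ToolAction` and `MessageAction` classes and `response_to_actions` are hypothetical names, and the actual parsing in `function_calling.py` may differ.

```python
import json
from dataclasses import dataclass


@dataclass
class ToolAction:
    """Hypothetical action for one tool invocation."""

    tool: str
    arguments: dict


@dataclass
class MessageAction:
    """Hypothetical action for a plain natural-language reply (Converse)."""

    content: str


def response_to_actions(message: dict) -> list:
    """Convert one assistant message into a list of actions."""
    tool_calls = message.get("tool_calls") or []
    if not tool_calls:
        # No tool call: treat the turn as conversation with the user.
        return [MessageAction(content=message.get("content") or "")]
    actions = []
    for call in tool_calls:
        function = call["function"]
        try:
            arguments = json.loads(function["arguments"])
        except json.JSONDecodeError as exc:
            raise ValueError(f"invalid arguments for {function['name']}") from exc
        actions.append(ToolAction(tool=function["name"], arguments=arguments))
    return actions
```

A message with no tool calls maps to the **Converse** case from the overview, while each tool call maps to one **CodeAct** action.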