# CodeAct Agent Framework

This folder implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.01030), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a unified **code** action space for both *simplicity* and *performance* (see the paper for more details).

The conceptual idea is illustrated below. At each turn, the agent can:

1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc.
2. **CodeAct**: Choose to perform the task by executing code
   - Execute any valid Linux `bash` command
   - Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through a `bash` command; see the plugin system for more details.

![image](https://github.com/All-Hands-AI/OpenHands/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3)

## Adding New Tools

The CodeAct agent uses a function calling interface to define the tools it can use. Tools are defined in `function_calling.py` using the `ChatCompletionToolParam` class from `litellm`.

Each tool consists of:

1. A description string that explains what the tool does and how to use it
2. A tool definition using `ChatCompletionToolParam` that specifies:
   - The tool's name
   - The tool's parameters and their types
   - Required vs. optional parameters

Here's an example of how a tool is defined:

```python
from litellm import ChatCompletionToolParam, ChatCompletionToolParamFunctionChunk

MyTool = ChatCompletionToolParam(
    type='function',
    function=ChatCompletionToolParamFunctionChunk(
        name='my_tool',
        description='Description of what the tool does and how to use it',
        # JSON Schema describing the arguments the model may pass
        parameters={
            'type': 'object',
            'properties': {
                'param1': {
                    'type': 'string',
                    'description': 'Description of parameter 1',
                },
                'param2': {
                    'type': 'integer',
                    'description': 'Description of parameter 2',
                },
            },
            'required': ['param1'],  # List required parameters here
        },
    ),
)
```

To add a new tool:

1. Define your tool in `function_calling.py` following the pattern above
2. Add your tool to the `get_tools()` function in `function_calling.py`
3. Implement the corresponding action handler in the agent to process the tool's invocation

The agent currently supports several built-in tools:

- `execute_bash`: Execute bash commands
- `execute_ipython_cell`: Run Python code in IPython
- `browser`: Interact with a web browser
- `str_replace_editor`: Edit files using string replacement
- `edit_file`: Edit files using LLM-based editing

Tools can be enabled or disabled through configuration parameters:

- `codeact_enable_browsing`: Enable browser interaction
- `codeact_enable_jupyter`: Enable IPython code execution
- `codeact_enable_llm_editor`: Enable LLM-based file editing (if disabled, the string replacement editor is used instead)
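The "add your tool to `get_tools()`" step and the configuration flags can be sketched together as a small tool registry. This is a hypothetical, simplified sketch, not the code in `function_calling.py`: tools are plain dicts in the OpenAI function-calling shape that `ChatCompletionToolParam` describes (so the snippet runs without `litellm`), the schemas are abbreviated placeholders, `make_tool` is an illustrative helper, and the `get_tools()` parameters simply mirror the three flag names. It also assumes `execute_bash` is always enabled, since no flag is listed for it.

```python
def make_tool(name: str, description: str, properties: dict, required: list[str]) -> dict:
    """Build a tool definition in the OpenAI function-calling shape (illustrative helper)."""
    return {
        'type': 'function',
        'function': {
            'name': name,
            'description': description,
            'parameters': {'type': 'object', 'properties': properties, 'required': required},
        },
    }


# Hypothetical, abbreviated definitions; the real descriptions and schemas differ.
CmdRunTool = make_tool(
    'execute_bash', 'Execute a bash command.',
    {'command': {'type': 'string', 'description': 'The command to run.'}}, ['command'],
)
IPythonTool = make_tool(
    'execute_ipython_cell', 'Run Python code in IPython.',
    {'code': {'type': 'string', 'description': 'The code to run.'}}, ['code'],
)
BrowserTool = make_tool(
    'browser', 'Interact with a web browser.',
    {'code': {'type': 'string', 'description': 'Browser actions to run.'}}, ['code'],
)
StrReplaceEditorTool = make_tool(
    'str_replace_editor', 'Edit files using string replacement.',
    {'path': {'type': 'string', 'description': 'Path to the file.'}}, ['path'],
)
LLMBasedFileEditTool = make_tool(
    'edit_file', 'Edit files using LLM-based editing.',
    {'path': {'type': 'string', 'description': 'Path to the file.'}}, ['path'],
)


def get_tools(
    codeact_enable_browsing: bool = False,
    codeact_enable_llm_editor: bool = False,
    codeact_enable_jupyter: bool = False,
) -> list[dict]:
    """Assemble the tool list from the configuration flags (hypothetical signature)."""
    tools = [CmdRunTool]  # assumption: bash has no flag, so it is always included
    if codeact_enable_browsing:
        tools.append(BrowserTool)
    if codeact_enable_jupyter:
        tools.append(IPythonTool)
    # The LLM-based editor replaces the string-replacement editor when enabled.
    tools.append(LLMBasedFileEditTool if codeact_enable_llm_editor else StrReplaceEditorTool)
    return tools


print([t['function']['name'] for t in get_tools(codeact_enable_jupyter=True)])
```

With only Jupyter enabled, the list contains `execute_bash`, `execute_ipython_cell`, and `str_replace_editor`. Keeping the editors mutually exclusive mirrors the note that the string replacement editor is the fallback when LLM-based editing is off.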
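For the step that implements an action handler, one common pattern is a table mapping tool names to handler functions. The sketch below is an assumption, not the agent's actual mechanism: `HANDLERS`, `dispatch`, and the tuple return values are hypothetical stand-ins for whatever the agent builds from a call, and a tool call is modeled as a dict with a `name` and a JSON-encoded `arguments` string, following the OpenAI function-calling convention. The `my_tool` entry reuses the example tool defined earlier, with `param2` optional.

```python
import json
from typing import Any, Callable

# Hypothetical handlers keyed by tool name; each receives the parsed arguments.
HANDLERS: dict[str, Callable[..., Any]] = {
    'execute_bash': lambda command: ('run', command),
    'execute_ipython_cell': lambda code: ('run_ipython', code),
    # param1 is required and param2 optional, matching the my_tool schema
    'my_tool': lambda param1, param2=0: ('my_tool', param1, param2),
}


def dispatch(tool_call: dict) -> Any:
    """Route one model tool call to its registered handler."""
    name = tool_call['name']
    handler = HANDLERS.get(name)
    if handler is None:
        raise ValueError(f'Unknown tool: {name!r}')
    # Tool-call arguments arrive from the model as a JSON-encoded string.
    arguments = json.loads(tool_call['arguments'])
    return handler(**arguments)


print(dispatch({'name': 'my_tool', 'arguments': '{"param1": "hello"}'}))
```

Raising on an unknown name surfaces a tool that was added to the tool list but never given a handler, which is the mismatch the three-step checklist is meant to prevent.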