CodeAct Agent Framework

This folder implements the CodeAct idea (paper, tweet), which consolidates LLM agents' actions into a unified code action space for both simplicity and performance (see the paper for more details).

The conceptual idea is illustrated below. At each turn, the agent can:

  1. Converse: Communicate with humans in natural language to ask for clarification, confirmation, etc.
  2. CodeAct: Choose to perform the task by executing code
    • Execute any valid Linux bash command
    • Execute any valid Python code with an interactive Python interpreter. This is simulated through a bash command; see the plugin system below for more details.

[Image: illustration of the CodeAct concept]
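
The two choices above can be sketched as a small action space. This is a minimal illustration, not the framework's own classes: `MessageAction`, `BashAction`, `PythonAction`, and `to_shell_command` are hypothetical names, and wrapping Python in `python3 -c` is only a stand-in for the mechanism the agent actually uses to run Python through bash.

```python
import shlex
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class MessageAction:
    """Converse: a natural-language message to the human."""
    content: str


@dataclass
class BashAction:
    """CodeAct: a Linux bash command to execute."""
    command: str


@dataclass
class PythonAction:
    """CodeAct: Python code to execute."""
    code: str


# Hypothetical union type: every turn produces exactly one of these.
Action = Union[MessageAction, BashAction, PythonAction]


def to_shell_command(action: Action) -> Optional[str]:
    """Return the bash command for a code action, or None for a message."""
    if isinstance(action, BashAction):
        return action.command
    if isinstance(action, PythonAction):
        # Assumption for illustration: run the code via a bash invocation.
        return "python3 -c " + shlex.quote(action.code)
    return None
```

Because both code actions reduce to a shell command, a single execution path can serve them, which is the simplification the unified action space is aiming for.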

Adding New Tools

The CodeAct agent uses a function calling interface to define tools that the agent can use. Tools are defined in function_calling.py using the ChatCompletionToolParam class from litellm. Each tool consists of:

  1. A description string that explains what the tool does and how to use it
  2. A tool definition using ChatCompletionToolParam that specifies:
    • The tool's name
    • The tool's parameters and their types
    • Required vs optional parameters

Here's an example of how a tool is defined:

from litellm import ChatCompletionToolParam, ChatCompletionToolParamFunctionChunk

MyTool = ChatCompletionToolParam(
    type='function',
    function=ChatCompletionToolParamFunctionChunk(
        name='my_tool',
        description='Description of what the tool does and how to use it',
        parameters={
            'type': 'object',
            'properties': {
                'param1': {
                    'type': 'string',
                    'description': 'Description of parameter 1',
                },
                'param2': {
                    'type': 'integer',
                    'description': 'Description of parameter 2',
                },
            },
            'required': ['param1'],  # List required parameters here
        },
    ),
)
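
To make the role of `properties` and `required` concrete, here is a rough sketch of how a tool call's arguments could be checked against such a definition. It uses a plain dict with the same shape as the example so that it runs without litellm installed; `check_arguments` and `JSON_TYPES` are hypothetical helpers, not part of function_calling.py, and only the two types used in the example are handled.

```python
from typing import Any, Dict, List

# JSON Schema type names mapped to Python types (only those used here).
JSON_TYPES = {'string': str, 'integer': int}

# Plain-dict version of the tool definition, for illustration only.
my_tool = {
    'type': 'function',
    'function': {
        'name': 'my_tool',
        'description': 'Description of what the tool does and how to use it',
        'parameters': {
            'type': 'object',
            'properties': {
                'param1': {'type': 'string', 'description': 'Description of parameter 1'},
                'param2': {'type': 'integer', 'description': 'Description of parameter 2'},
            },
            'required': ['param1'],
        },
    },
}


def check_arguments(tool: Dict[str, Any], arguments: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    schema = tool['function']['parameters']
    problems = []
    # Required parameters must be present; all others are optional.
    for name in schema.get('required', []):
        if name not in arguments:
            problems.append(f'missing required parameter: {name}')
    # Every supplied argument must be declared and have the declared type.
    for name, value in arguments.items():
        spec = schema['properties'].get(name)
        if spec is None:
            problems.append(f'unknown parameter: {name}')
        elif not isinstance(value, JSON_TYPES[spec['type']]):
            problems.append(f'{name} should be of type {spec["type"]}')
    return problems
```

For example, `check_arguments(my_tool, {'param1': 'a'})` returns an empty list because `param2` is optional, while omitting `param1` is reported as a problem.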

To add a new tool:

  1. Define your tool in function_calling.py following the pattern above
  2. Add your tool to the get_tools() function in function_calling.py
  3. Implement the corresponding action handler in the agent to process the tool's invocation
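
The three steps might look roughly like the sketch below. Everything named here is hypothetical: `WordCountTool`, `WordCountAction`, and `tool_call_to_action` are invented for illustration, the tool is a plain dict in place of `ChatCompletionToolParam`, and this `get_tools()` is a simplified stand-in for the real function. The one detail that carries over is that, in OpenAI-style function calling, the arguments of a tool call arrive as a JSON string.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List

# Step 1: define the tool (same shape as the example definition).
WordCountTool = {
    'type': 'function',
    'function': {
        'name': 'word_count',
        'description': 'Count the words in a piece of text.',
        'parameters': {
            'type': 'object',
            'properties': {
                'text': {'type': 'string', 'description': 'Text to count'},
            },
            'required': ['text'],
        },
    },
}


# Step 2: return the tool from get_tools() so it is offered to the model.
def get_tools() -> List[Dict[str, Any]]:
    return [WordCountTool]


# Step 3: turn the model's tool call into an action the agent can process.
@dataclass
class WordCountAction:
    text: str


def tool_call_to_action(name: str, arguments_json: str) -> WordCountAction:
    """Map a tool name plus JSON-encoded arguments to an action object."""
    arguments = json.loads(arguments_json)
    if name == 'word_count':
        return WordCountAction(text=arguments['text'])
    raise ValueError(f'Unknown tool: {name}')
```

Raising on an unknown tool name means a tool that was defined but never wired to a handler fails loudly instead of being silently ignored.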

The agent currently supports several built-in tools:

  • execute_bash: Execute bash commands
  • execute_ipython_cell: Run Python code in IPython
  • browser: Interact with a web browser
  • str_replace_editor: Edit files using string replacement
  • edit_file: Edit files using LLM-based editing

Tools can be enabled/disabled through configuration parameters:

  • codeact_enable_browsing: Enable browser interaction
  • codeact_enable_jupyter: Enable IPython code execution
  • codeact_enable_llm_editor: Enable LLM-based file editing (if disabled, uses string replacement editor instead)
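
As a sketch of how these flags could select the tool list, consider the function below. The flag names and tool names come from this README; the function `get_tool_names`, the default values, and the assumption that `execute_bash` is always enabled are illustrative guesses, not a description of the actual code.

```python
from typing import List


def get_tool_names(
    codeact_enable_browsing: bool = False,
    codeact_enable_jupyter: bool = False,
    codeact_enable_llm_editor: bool = False,
) -> List[str]:
    """Return the names of the tools enabled by the given flags."""
    # Assumption: bash execution is always available.
    names = ['execute_bash']
    if codeact_enable_browsing:
        names.append('browser')
    if codeact_enable_jupyter:
        names.append('execute_ipython_cell')
    # Exactly one file editor is active: LLM-based if enabled,
    # otherwise the string replacement editor.
    names.append('edit_file' if codeact_enable_llm_editor else 'str_replace_editor')
    return names
```

Note that the editor flag switches between two tools rather than adding one, matching the description of `codeact_enable_llm_editor` above.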