Ver Fonte

Update CodeAct README.md (#1534)

* Update README.md

* update documentation in docstring
Xingyao Wang há 1 ano atrás
pai
commit
3886c51217

+ 20 - 14
agenthub/codeact_agent/README.md

@@ -1,23 +1,29 @@
-# CodeAct-based Agent Framework
+# CodeAct Agent Framework
 
-This folder implements the [CodeAct idea](https://arxiv.org/abs/2402.13463) that relies on LLM to autonomously perform actions in a Bash shell. It requires more from the LLM itself: LLM needs to be capable enough to do all the stuff autonomously, instead of stuck in an infinite loop.
+This folder implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.13463), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a unified **code** action space for both *simplicity* and *performance* (see paper for more details).
 
-**NOTE: This agent is still highly experimental and under active development to reach the capability described in the original paper & [repo](https://github.com/xingyaoww/code-act).**
+The conceptual idea is illustrated below. At each turn, the agent can:
 
-<video src="https://github.com/xingyaoww/code-act/assets/38853559/62c80ada-62ce-447e-811c-fc801dd4beac"> </video>
-*Demo of the expected capability - work-in-progress.*
+1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc.
+2. **CodeAct**: Choose to perform the task by executing code
+   - Execute any valid Linux `bash` command
+   - Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through `bash` command, see plugin system below for more details.
 
-```bash
-mkdir workspace
-PYTHONPATH=`pwd`:$PYTHONPATH python3 opendevin/core/main.py -d ./workspace -c CodeActAgent -t "Please write a flask app that returns 'Hello, World\!' at the root URL, then start the app on port 5000. python3 has already been installed for you."
-```
+![image](https://github.com/OpenDevin/OpenDevin/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3)
 
-Example: prompts `gpt-4-0125-preview` to write a flask server, install `flask` library, and start the server.
+## Plugin System
 
-<img width="951" alt="image" src="https://github.com/OpenDevin/OpenDevin/assets/38853559/325c3115-a343-4cc5-a92b-f1e5d552a077">
+To make the CodeAct agent more powerful with only access to `bash` action space, CodeAct agent leverages OpenDevin's plugin system:
+- [Jupyter plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/jupyter): for IPython execution via bash command
+- [SWE-agent tool plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/swe_agent_commands): Powerful bash command line tools for software development tasks introduced by [swe-agent](https://github.com/princeton-nlp/swe-agent).
 
-<img width="957" alt="image" src="https://github.com/OpenDevin/OpenDevin/assets/38853559/68ad10c1-744a-4e9d-bb29-0f163d665a0a">
+## Demo
 
-Most of the things are working as expected, except at the end, the model did not follow the instruction to stop the interaction by outputting `<execute> exit </execute>` as instructed.
+https://github.com/OpenDevin/OpenDevin/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac
 
-**TODO**: This should be fixable by either (1) including a complete in-context example like [this](https://github.com/xingyaoww/mint-bench/blob/main/mint/tasks/in_context_examples/reasoning/with_tool.txt), OR (2) collect some interaction data like this and fine-tune a model (like [this](https://github.com/xingyaoww/code-act), a more complex route).
+*Example of CodeActAgent with `gpt-4-turbo-2024-04-09` performing a data science task (linear regression)*
+
+## Work-in-progress & Next step
+
+[] Support web-browsing
+[] Complete the workflow for CodeAct agent to submit Github PRs

+ 36 - 6
agenthub/codeact_agent/codeact_agent.py

@@ -53,6 +53,37 @@ class CodeActAgent(Agent):
     """
     The Code Act Agent is a minimalist agent.
     The agent works by passing the model a list of action-observation pairs and prompting the model to take the next step.
+
+    ### Overview
+
+    This agent implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.13463), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a unified **code** action space for both *simplicity* and *performance* (see paper for more details).
+
+    The conceptual idea is illustrated below. At each turn, the agent can:
+
+    1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc.
+    2. **CodeAct**: Choose to perform the task by executing code
+    - Execute any valid Linux `bash` command
+    - Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through `bash` command, see plugin system below for more details.
+
+    ![image](https://github.com/OpenDevin/OpenDevin/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3)
+
+    ### Plugin System
+
+    To make the CodeAct agent more powerful with only access to `bash` action space, CodeAct agent leverages OpenDevin's plugin system:
+    - [Jupyter plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/jupyter): for IPython execution via bash command
+    - [SWE-agent tool plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/swe_agent_commands): Powerful bash command line tools for software development tasks introduced by [swe-agent](https://github.com/princeton-nlp/swe-agent).
+
+    ### Demo
+
+    https://github.com/OpenDevin/OpenDevin/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac
+
+    *Example of CodeActAgent with `gpt-4-turbo-2024-04-09` performing a data science task (linear regression)*
+
+    ### Work-in-progress & Next step
+
+    [] Support web-browsing
+    [] Complete the workflow for CodeAct agent to submit Github PRs
+
     """
 
     sandbox_plugins: List[PluginRequirement] = [
@@ -88,18 +119,17 @@ class CodeActAgent(Agent):
 
     def step(self, state: State) -> Action:
         """
-        Performs one step using the Code Act Agent.
+        Performs one step using the CodeAct Agent.
         This includes gathering info on previous steps and prompting the model to make a command to execute.
 
         Parameters:
         - state (State): used to get updated info and background commands
 
         Returns:
-        - CmdRunAction(command) - command action to run
-        - AgentEchoAction(content=INVALID_INPUT_MESSAGE) - invalid command output
-
-        Raises:
-        - NotImplementedError - for actions other than CmdOutputObservation or AgentMessageObservation
+        - CmdRunAction(command) - bash command to run
+        - IPythonRunCellAction(code) - IPython code to run
+        - AgentTalkAction(content) - Talk action to run (e.g. ask for clarification)
+        - AgentFinishAction() - end the interaction
         """
 
         if len(self.messages) == 0:

+ 58 - 26
docs/modules/usage/agents.md

@@ -4,6 +4,64 @@ sidebar_position: 3
 
 # 🧠 Agents and Capabilities
 
+## CodeAct Agent
+
+### Description
+
+This agent implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.13463), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a unified **code** action space for both *simplicity* and *performance* (see paper for more details).
+
+The conceptual idea is illustrated below. At each turn, the agent can:
+
+1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc.
+2. **CodeAct**: Choose to perform the task by executing code
+- Execute any valid Linux `bash` command
+- Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through `bash` command, see plugin system below for more details.
+
+![image](https://github.com/OpenDevin/OpenDevin/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3)
+
+### Plugin System
+
+To make the CodeAct agent more powerful with only access to `bash` action space, CodeAct agent leverages OpenDevin&#x27;s plugin system:
+- [Jupyter plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/jupyter): for IPython execution via bash command
+- [SWE-agent tool plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/swe_agent_commands): Powerful bash command line tools for software development tasks introduced by [swe-agent](https://github.com/princeton-nlp/swe-agent).
+
+### Demo
+
+https://github.com/OpenDevin/OpenDevin/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac
+
+*Example of CodeActAgent with `gpt-4-turbo-2024-04-09` performing a data science task (linear regression)*
+
+
+### Actions
+
+`Action`,
+`CmdRunAction`,
+`IPythonRunCellAction`,
+`AgentEchoAction`,
+`AgentFinishAction`,
+`AgentTalkAction`
+
+### Observations
+
+`CmdOutputObservation`,
+`IPythonRunCellObservation`,
+`AgentMessageObservation`,
+`UserMessageObservation`
+
+### Methods
+
+| Method          | Description                                                                                                                                                                                                                                             |
+| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `__init__`      | Initializes an agent with `llm` and a list of messages `List[Mapping[str, str]]`                                                                                                                                                                        |
+| `step`          | Performs one step using the CodeAct Agent. This includes gathering info on previous steps and prompting the model to make a command to execute. |
+| `search_memory` | Not yet implemented                                                                                                                                                                                                                                     |
+
+### Work-in-progress &amp; Next step
+
+[] Support web-browsing
+[] Complete the workflow for CodeAct agent to submit Github PRs
+
+
 ## Monologue Agent
 
 ### Description
@@ -82,29 +140,3 @@ The agent is given its previous action-observation pairs, current task, and hint
 | `__init__`      | Initializes an agent with `llm`                                                                                                                                                           |
 | `step`          | Checks to see if current step is completed, returns `AgentFinishAction` if True. Otherwise, creates a plan prompt and sends to model for inference, adding the result as the next action. |
 | `search_memory` | Not yet implemented                                                                                                                                                                       |
-
-## CodeAct Agent
-
-### Description
-
-The Code Act Agent is a minimalist agent. The agent works by passing the model a list of action-observation pairs and prompting the model to take the next step.
-
-### Actions
-
-`Action`,
-`CmdRunAction`,
-`AgentEchoAction`,
-`AgentFinishAction`,
-
-### Observations
-
-`CmdOutputObservation`,
-`AgentMessageObservation`,
-
-### Methods
-
-| Method          | Description                                                                                                                                                                                                                                             |
-| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `__init__`      | Initializes an agent with `llm` and a list of messages `List[Mapping[str, str]]`                                                                                                                                                                        |
-| `step`          | First, gets messages from state and then compiles them into a list for context. Next, pass the context list with the prompt to get the next command to execute. Finally, Execute command if valid, else return `AgentEchoAction(INVALID_INPUT_MESSAGE)` |
-| `search_memory` | Not yet implemented                                                                                                                                                                                                                                     |