1 year ago · 3886c51217
--- a/agenthub/codeact_agent/README.md
+++ b/agenthub/codeact_agent/README.md
@@ -1,23 +1,29 @@
-# CodeAct-based Agent Framework
+# CodeAct Agent Framework
-This folder implements the [CodeAct idea](https://arxiv.org/abs/2402.13463) that relies on LLM to autonomously perform actions in a Bash shell. It requires more from the LLM itself: LLM needs to be capable enough to do all the stuff autonomously, instead of stuck in an infinite loop.
+This folder implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.13463), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a unified **code** action space for both *simplicity* and *performance* (see paper for more details).
-**NOTE: This agent is still highly experimental and under active development to reach the capability described in the original paper & [repo](https://github.com/xingyaoww/code-act).**
+The conceptual idea is illustrated below. At each turn, the agent can:
-<video src="https://github.com/xingyaoww/code-act/assets/38853559/62c80ada-62ce-447e-811c-fc801dd4beac"> </video>
-*Demo of the expected capability - work-in-progress.*
+1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc.
+2. **CodeAct**: Choose to perform the task by executing code
+   - Execute any valid Linux `bash` command
+   - Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through `bash` command, see plugin system below for more details.
-```bash
-mkdir workspace
-PYTHONPATH=`pwd`:$PYTHONPATH python3 opendevin/core/main.py -d ./workspace -c CodeActAgent -t "Please write a flask app that returns 'Hello, World\!' at the root URL, then start the app on port 5000. python3 has already been installed for you."
-```
+![image](https://github.com/OpenDevin/OpenDevin/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3)
-Example: prompts `gpt-4-0125-preview` to write a flask server, install `flask` library, and start the server.
+## Plugin System
-<img width="951" alt="image" src="https://github.com/OpenDevin/OpenDevin/assets/38853559/325c3115-a343-4cc5-a92b-f1e5d552a077">
+To make the CodeAct agent more powerful with only access to `bash` action space, CodeAct agent leverages OpenDevin's plugin system:
+- [Jupyter plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/jupyter): for IPython execution via bash command
+- [SWE-agent tool plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/swe_agent_commands): Powerful bash command line tools for software development tasks introduced by [swe-agent](https://github.com/princeton-nlp/swe-agent).
-<img width="957" alt="image" src="https://github.com/OpenDevin/OpenDevin/assets/38853559/68ad10c1-744a-4e9d-bb29-0f163d665a0a">
+## Demo
-Most of the things are working as expected, except at the end, the model did not follow the instruction to stop the interaction by outputting `<execute> exit </execute>` as instructed.
+https://github.com/OpenDevin/OpenDevin/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac
-**TODO**: This should be fixable by either (1) including a complete in-context example like [this](https://github.com/xingyaoww/mint-bench/blob/main/mint/tasks/in_context_examples/reasoning/with_tool.txt), OR (2) collect some interaction data like this and fine-tune a model (like [this](https://github.com/xingyaoww/code-act), a more complex route).
+*Example of CodeActAgent with `gpt-4-turbo-2024-04-09` performing a data science task (linear regression)*
+
+## Work-in-progress & Next step
+
+[] Support web-browsing
+[] Complete the workflow for CodeAct agent to submit Github PRs
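Review note: the new README says Python execution is "simulated through `bash` command". As a rough illustration of that idea only, the sketch below wraps Python source into a single shell command line. `python_to_bash` is a hypothetical helper, and `python3 -c` is an assumed stand-in for whatever entry point the Jupyter plugin actually exposes. Unlike an interactive interpreter, this sketch keeps no state between calls.

```python
import shlex


def python_to_bash(code: str, interpreter: str = "python3") -> str:
    """Wrap Python source in one bash command line (hypothetical helper)."""
    # shlex.quote escapes the source so the shell passes it as a single argument.
    return f"{interpreter} -c {shlex.quote(code)}"


# The resulting string could then be handed to a bash-only action space.
command = python_to_bash("print(1 + 1)")
print(command)
```

Quoting the source with `shlex.quote` rather than by hand keeps embedded quotes and newlines from being reinterpreted by the shell.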
														
--- a/agenthub/codeact_agent/codeact_agent.py
+++ b/agenthub/codeact_agent/codeact_agent.py
@@ -53,6 +53,37 @@ class CodeActAgent(Agent):
     """
     The Code Act Agent is a minimalist agent.
     The agent works by passing the model a list of action-observation pairs and prompting the model to take the next step.
+
+    ### Overview
+
+    This agent implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.13463), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a unified **code** action space for both *simplicity* and *performance* (see paper for more details).
+
+    The conceptual idea is illustrated below. At each turn, the agent can:
+
+    1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc.
+    2. **CodeAct**: Choose to perform the task by executing code
+    - Execute any valid Linux `bash` command
+    - Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through `bash` command, see plugin system below for more details.
+
+    ![image](https://github.com/OpenDevin/OpenDevin/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3)
+
+    ### Plugin System
+
+    To make the CodeAct agent more powerful with only access to `bash` action space, CodeAct agent leverages OpenDevin's plugin system:
+    - [Jupyter plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/jupyter): for IPython execution via bash command
+    - [SWE-agent tool plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/swe_agent_commands): Powerful bash command line tools for software development tasks introduced by [swe-agent](https://github.com/princeton-nlp/swe-agent).
+
+    ### Demo
+
+    https://github.com/OpenDevin/OpenDevin/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac
+
+    *Example of CodeActAgent with `gpt-4-turbo-2024-04-09` performing a data science task (linear regression)*
+
+    ### Work-in-progress & Next step
+
+    [] Support web-browsing
+    [] Complete the workflow for CodeAct agent to submit Github PRs
+
     """
     sandbox_plugins: List[PluginRequirement] = [
														
@@ -88,18 +119,17 @@ class CodeActAgent(Agent):
     def step(self, state: State) -> Action:
         """
-        Performs one step using the Code Act Agent.
+        Performs one step using the CodeAct Agent.
         This includes gathering info on previous steps and prompting the model to make a command to execute.
         Parameters:
         - state (State): used to get updated info and background commands
         Returns:
-        - CmdRunAction(command) - command action to run
-        - AgentEchoAction(content=INVALID_INPUT_MESSAGE) - invalid command output
-
-        Raises:
-        - NotImplementedError - for actions other than CmdOutputObservation or AgentMessageObservation
+        - CmdRunAction(command) - bash command to run
+        - IPythonRunCellAction(code) - IPython code to run
+        - AgentTalkAction(content) - Talk action to run (e.g. ask for clarification)
+        - AgentFinishAction() - end the interaction
         """
         if len(self.messages) == 0:
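Review note: the updated `step` docstring lists four possible return values, which implies a dispatch from the model's reply to exactly one action. The sketch below is one possible shape for that dispatch, not OpenDevin's actual parser. The dataclasses are hypothetical stand-ins that only mirror the names in the docstring, and the `<execute_bash>` / `<execute_ipython>` tag names are assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins that only mirror the action names in the docstring.
@dataclass
class CmdRunAction:
    command: str


@dataclass
class IPythonRunCellAction:
    code: str


@dataclass
class AgentTalkAction:
    content: str


@dataclass
class AgentFinishAction:
    pass


Action = Union[CmdRunAction, IPythonRunCellAction, AgentTalkAction, AgentFinishAction]

# Assumed tag names; the real prompt format may differ.
BASH_RE = re.compile(r"<execute_bash>(.*?)</execute_bash>", re.DOTALL)
IPYTHON_RE = re.compile(r"<execute_ipython>(.*?)</execute_ipython>", re.DOTALL)


def parse_model_output(text: str) -> Action:
    """Map one model reply to exactly one action."""
    bash = BASH_RE.search(text)
    if bash:
        command = bash.group(1).strip()
        # Treat a bare `exit` as the signal to end the interaction.
        if command == "exit":
            return AgentFinishAction()
        return CmdRunAction(command=command)
    ipython = IPYTHON_RE.search(text)
    if ipython:
        return IPythonRunCellAction(code=ipython.group(1).strip())
    # No code block found: fall back to talking to the user.
    return AgentTalkAction(content=text.strip())
```

Falling back to a talk action when no code block is present matches the "Converse" option in the description: a reply without code is treated as a message to the human rather than as invalid input.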
														
--- a/docs/modules/usage/agents.md
+++ b/docs/modules/usage/agents.md
@@ -4,6 +4,64 @@ sidebar_position: 3
 # 🧠 Agents and Capabilities
+## CodeAct Agent
+
+### Description
+
+This agent implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.13463), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a unified **code** action space for both *simplicity* and *performance* (see paper for more details).
+
+The conceptual idea is illustrated below. At each turn, the agent can:
+
+1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc.
+2. **CodeAct**: Choose to perform the task by executing code
+- Execute any valid Linux `bash` command
+- Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through `bash` command, see plugin system below for more details.
+
+![image](https://github.com/OpenDevin/OpenDevin/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3)
+
+### Plugin System
+
+To make the CodeAct agent more powerful with only access to `bash` action space, CodeAct agent leverages OpenDevin&#x27;s plugin system:
+- [Jupyter plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/jupyter): for IPython execution via bash command
+- [SWE-agent tool plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/swe_agent_commands): Powerful bash command line tools for software development tasks introduced by [swe-agent](https://github.com/princeton-nlp/swe-agent).
+
+### Demo
+
+https://github.com/OpenDevin/OpenDevin/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac
+
+*Example of CodeActAgent with `gpt-4-turbo-2024-04-09` performing a data science task (linear regression)*
+
+
+### Actions
+
+`Action`,
+`CmdRunAction`,
+`IPythonRunCellAction`,
+`AgentEchoAction`,
+`AgentFinishAction`,
+`AgentTalkAction`
+
+### Observations
+
+`CmdOutputObservation`,
+`IPythonRunCellObservation`,
+`AgentMessageObservation`,
+`UserMessageObservation`
+
+### Methods
+
+| Method          | Description                                                                                                                                                                                                                                             |
+| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| `__init__`      | Initializes an agent with `llm` and a list of messages `List[Mapping[str, str]]`                                                                                                                                                                        |
+| `step`          | Performs one step using the CodeAct Agent. This includes gathering info on previous steps and prompting the model to make a command to execute. |
+| `search_memory` | Not yet implemented                                                                                                                                                                                                                                     |
+
+### Work-in-progress &amp; Next step
+
+[] Support web-browsing
+[] Complete the workflow for CodeAct agent to submit Github PRs
+
+
 ## Monologue Agent
 ### Description
														
@@ -82,29 +140,3 @@ The agent is given its previous action-observation pairs, current task, and hint
 | `__init__`      | Initializes an agent with `llm`                                                                                                                                                           |
 | `step`          | Checks to see if current step is completed, returns `AgentFinishAction` if True. Otherwise, creates a plan prompt and sends to model for inference, adding the result as the next action. |
 | `search_memory` | Not yet implemented                                                                                                                                                                       |
-
-## CodeAct Agent
-
-### Description
-
-The Code Act Agent is a minimalist agent. The agent works by passing the model a list of action-observation pairs and prompting the model to take the next step.
-
-### Actions
-
-`Action`,
-`CmdRunAction`,
-`AgentEchoAction`,
-`AgentFinishAction`,
-
-### Observations
-
-`CmdOutputObservation`,
-`AgentMessageObservation`,
-
-### Methods
-
-| Method          | Description                                                                                                                                                                                                                                             |
-| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
-| `__init__`      | Initializes an agent with `llm` and a list of messages `List[Mapping[str, str]]`                                                                                                                                                                        |
-| `step`          | First, gets messages from state and then compiles them into a list for context. Next, pass the context list with the prompt to get the next command to execute. Finally, Execute command if valid, else return `AgentEchoAction(INVALID_INPUT_MESSAGE)` |
-| `search_memory` | Not yet implemented                                                                                                                                                                                                                                     |