há 1 ano atrás · 3886c51217
--- a/agenthub/codeact_agent/README.md
+++ b/agenthub/codeact_agent/README.md
@@ -1,23 +1,29 @@
 
				-# CodeAct-based Agent Framework
			
 
				+# CodeAct Agent Framework
			
 
				 
			
 
				-This folder implements the [CodeAct idea](https://arxiv.org/abs/2402.13463) that relies on LLM to autonomously perform actions in a Bash shell. It requires more from the LLM itself: LLM needs to be capable enough to do all the stuff autonomously, instead of stuck in an infinite loop.
			
 
				+This folder implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.13463), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a unified **code** action space for both *simplicity* and *performance* (see paper for more details).
			
 
				 
			
 
				-**NOTE: This agent is still highly experimental and under active development to reach the capability described in the original paper & [repo](https://github.com/xingyaoww/code-act).**
			
 
				+The conceptual idea is illustrated below. At each turn, the agent can:
			
 
				 
			
 
				-<video src="https://github.com/xingyaoww/code-act/assets/38853559/62c80ada-62ce-447e-811c-fc801dd4beac"> </video>
			
 
				-*Demo of the expected capability - work-in-progress.*
			
 
				+1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc.
			
 
				+2. **CodeAct**: Choose to perform the task by executing code
			
 
				+   - Execute any valid Linux `bash` command
			
 
				+   - Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through `bash` command, see plugin system below for more details.
			
 
				 
			
 
				-```bash
			
 
				-mkdir workspace
			
 
				-PYTHONPATH=`pwd`:$PYTHONPATH python3 opendevin/core/main.py -d ./workspace -c CodeActAgent -t "Please write a flask app that returns 'Hello, World\!' at the root URL, then start the app on port 5000. python3 has already been installed for you."
			
 
				-```
			
 
				+![image](https://github.com/OpenDevin/OpenDevin/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3)
			
 
				 
			
 
				-Example: prompts `gpt-4-0125-preview` to write a flask server, install `flask` library, and start the server.
			
 
				+## Plugin System
			
 
				 
			
 
				-<img width="951" alt="image" src="https://github.com/OpenDevin/OpenDevin/assets/38853559/325c3115-a343-4cc5-a92b-f1e5d552a077">
			
 
				+To make the CodeAct agent more powerful with only access to `bash` action space, CodeAct agent leverages OpenDevin's plugin system:
			
 
				+- [Jupyter plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/jupyter): for IPython execution via bash command
			
 
				+- [SWE-agent tool plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/swe_agent_commands): Powerful bash command line tools for software development tasks introduced by [swe-agent](https://github.com/princeton-nlp/swe-agent).
			
 
				 
			
 
				-<img width="957" alt="image" src="https://github.com/OpenDevin/OpenDevin/assets/38853559/68ad10c1-744a-4e9d-bb29-0f163d665a0a">
			
 
				+## Demo
			
 
				 
			
 
				-Most of the things are working as expected, except at the end, the model did not follow the instruction to stop the interaction by outputting `<execute> exit </execute>` as instructed.
			
 
				+https://github.com/OpenDevin/OpenDevin/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac
			
 
				 
			
 
				-**TODO**: This should be fixable by either (1) including a complete in-context example like [this](https://github.com/xingyaoww/mint-bench/blob/main/mint/tasks/in_context_examples/reasoning/with_tool.txt), OR (2) collect some interaction data like this and fine-tune a model (like [this](https://github.com/xingyaoww/code-act), a more complex route).
			
 
				+*Example of CodeActAgent with `gpt-4-turbo-2024-04-09` performing a data science task (linear regression)*
			
 
				+
			
 
				+## Work-in-progress & Next step
			
 
				+
			
 
				+[] Support web-browsing
			
 
				+[] Complete the workflow for CodeAct agent to submit Github PRs
			
--- a/agenthub/codeact_agent/codeact_agent.py
+++ b/agenthub/codeact_agent/codeact_agent.py
@@ -53,6 +53,37 @@ class CodeActAgent(Agent):
 
				     """
			
 
				     The Code Act Agent is a minimalist agent.
			
 
				     The agent works by passing the model a list of action-observation pairs and prompting the model to take the next step.
			
 
				+
			
 
				+    ### Overview
			
 
				+
			
 
				+    This agent implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.13463), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a unified **code** action space for both *simplicity* and *performance* (see paper for more details).
			
 
				+
			
 
				+    The conceptual idea is illustrated below. At each turn, the agent can:
			
 
				+
			
 
				+    1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc.
			
 
				+    2. **CodeAct**: Choose to perform the task by executing code
			
 
				+    - Execute any valid Linux `bash` command
			
 
				+    - Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through `bash` command, see plugin system below for more details.
			
 
				+
			
 
				+    ![image](https://github.com/OpenDevin/OpenDevin/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3)
			
 
				+
			
 
				+    ### Plugin System
			
 
				+
			
 
				+    To make the CodeAct agent more powerful with only access to `bash` action space, CodeAct agent leverages OpenDevin's plugin system:
			
 
				+    - [Jupyter plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/jupyter): for IPython execution via bash command
			
 
				+    - [SWE-agent tool plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/swe_agent_commands): Powerful bash command line tools for software development tasks introduced by [swe-agent](https://github.com/princeton-nlp/swe-agent).
			
 
				+
			
 
				+    ### Demo
			
 
				+
			
 
				+    https://github.com/OpenDevin/OpenDevin/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac
			
 
				+
			
 
				+    *Example of CodeActAgent with `gpt-4-turbo-2024-04-09` performing a data science task (linear regression)*
			
 
				+
			
 
				+    ### Work-in-progress & Next step
			
 
				+
			
 
				+    [] Support web-browsing
			
 
				+    [] Complete the workflow for CodeAct agent to submit Github PRs
			
 
				+
			
 
				     """
			
 
				 
			
 
				     sandbox_plugins: List[PluginRequirement] = [
			
@@ -88,18 +119,17 @@ class CodeActAgent(Agent):
 
				 
			
 
				     def step(self, state: State) -> Action:
			
 
				         """
			
 
				-        Performs one step using the Code Act Agent.
			
 
				+        Performs one step using the CodeAct Agent.
			
 
				         This includes gathering info on previous steps and prompting the model to make a command to execute.
			
 
				 
			
 
				         Parameters:
			
 
				         - state (State): used to get updated info and background commands
			
 
				 
			
 
				         Returns:
			
 
				-        - CmdRunAction(command) - command action to run
			
 
				-        - AgentEchoAction(content=INVALID_INPUT_MESSAGE) - invalid command output
			
 
				-
			
 
				-        Raises:
			
 
				-        - NotImplementedError - for actions other than CmdOutputObservation or AgentMessageObservation
			
 
				+        - CmdRunAction(command) - bash command to run
			
 
				+        - IPythonRunCellAction(code) - IPython code to run
			
 
				+        - AgentTalkAction(content) - Talk action to run (e.g. ask for clarification)
			
 
				+        - AgentFinishAction() - end the interaction
			
 
				         """
			
 
				 
			
 
				         if len(self.messages) == 0:
			
--- a/docs/modules/usage/agents.md
+++ b/docs/modules/usage/agents.md
@@ -4,6 +4,64 @@ sidebar_position: 3
 
				 
			
 
				 # 🧠 Agents and Capabilities
			
 
				 
			
 
				+## CodeAct Agent
			
 
				+
			
 
				+### Description
			
 
				+
			
 
				+This agent implements the CodeAct idea ([paper](https://arxiv.org/abs/2402.13463), [tweet](https://twitter.com/xingyaow_/status/1754556835703751087)) that consolidates LLM agents’ **act**ions into a unified **code** action space for both *simplicity* and *performance* (see paper for more details).
			
 
				+
			
 
				+The conceptual idea is illustrated below. At each turn, the agent can:
			
 
				+
			
 
				+1. **Converse**: Communicate with humans in natural language to ask for clarification, confirmation, etc.
			
 
				+2. **CodeAct**: Choose to perform the task by executing code
			
 
				+- Execute any valid Linux `bash` command
			
 
				+- Execute any valid `Python` code with [an interactive Python interpreter](https://ipython.org/). This is simulated through `bash` command, see plugin system below for more details.
			
 
				+
			
 
				+![image](https://github.com/OpenDevin/OpenDevin/assets/38853559/92b622e3-72ad-4a61-8f41-8c040b6d5fb3)
			
 
				+
			
 
				+### Plugin System
			
 
				+
			
 
				+To make the CodeAct agent more powerful with only access to `bash` action space, CodeAct agent leverages OpenDevin&#x27;s plugin system:
			
 
				+- [Jupyter plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/jupyter): for IPython execution via bash command
			
 
				+- [SWE-agent tool plugin](https://github.com/OpenDevin/OpenDevin/tree/main/opendevin/runtime/plugins/swe_agent_commands): Powerful bash command line tools for software development tasks introduced by [swe-agent](https://github.com/princeton-nlp/swe-agent).
			
 
				+
			
 
				+### Demo
			
 
				+
			
 
				+https://github.com/OpenDevin/OpenDevin/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac
			
 
				+
			
 
				+*Example of CodeActAgent with `gpt-4-turbo-2024-04-09` performing a data science task (linear regression)*
			
 
				+
			
 
				+
			
 
				+### Actions
			
 
				+
			
 
				+`Action`,
			
 
				+`CmdRunAction`,
			
 
				+`IPythonRunCellAction`,
			
 
				+`AgentEchoAction`,
			
 
				+`AgentFinishAction`,
			
 
				+`AgentTalkAction`
			
 
				+
			
 
				+### Observations
			
 
				+
			
 
				+`CmdOutputObservation`,
			
 
				+`IPythonRunCellObservation`,
			
 
				+`AgentMessageObservation`,
			
 
				+`UserMessageObservation`
			
 
				+
			
 
				+### Methods
			
 
				+
			
 
				+| Method          | Description                                                                                                                                                                                                                                             |
			
 
				+| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
			
 
				+| `__init__`      | Initializes an agent with `llm` and a list of messages `List[Mapping[str, str]]`                                                                                                                                                                        |
			
 
				+| `step`          | Performs one step using the CodeAct Agent. This includes gathering info on previous steps and prompting the model to make a command to execute. |
			
 
				+| `search_memory` | Not yet implemented                                                                                                                                                                                                                                     |
			
 
				+
			
 
				+### Work-in-progress &amp; Next step
			
 
				+
			
 
				+[] Support web-browsing
			
 
				+[] Complete the workflow for CodeAct agent to submit Github PRs
			
 
				+
			
 
				+
			
 
				 ## Monologue Agent
			
 
				 
			
 
				 ### Description
			
@@ -82,29 +140,3 @@ The agent is given its previous action-observation pairs, current task, and hint
 
				 | `__init__`      | Initializes an agent with `llm`                                                                                                                                                           |
			
 
				 | `step`          | Checks to see if current step is completed, returns `AgentFinishAction` if True. Otherwise, creates a plan prompt and sends to model for inference, adding the result as the next action. |
			
 
				 | `search_memory` | Not yet implemented                                                                                                                                                                       |
			
 
				-
			
 
				-## CodeAct Agent
			
 
				-
			
 
				-### Description
			
 
				-
			
 
				-The Code Act Agent is a minimalist agent. The agent works by passing the model a list of action-observation pairs and prompting the model to take the next step.
			
 
				-
			
 
				-### Actions
			
 
				-
			
 
				-`Action`,
			
 
				-`CmdRunAction`,
			
 
				-`AgentEchoAction`,
			
 
				-`AgentFinishAction`,
			
 
				-
			
 
				-### Observations
			
 
				-
			
 
				-`CmdOutputObservation`,
			
 
				-`AgentMessageObservation`,
			
 
				-
			
 
				-### Methods
			
 
				-
			
 
				-| Method          | Description                                                                                                                                                                                                                                             |
			
 
				-| --------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
			
 
				-| `__init__`      | Initializes an agent with `llm` and a list of messages `List[Mapping[str, str]]`                                                                                                                                                                        |
			
 
				-| `step`          | First, gets messages from state and then compiles them into a list for context. Next, pass the context list with the prompt to get the next command to execute. Finally, Execute command if valid, else return `AgentEchoAction(INVALID_INPUT_MESSAGE)` |
			
 
				-| `search_memory` | Not yet implemented                                                                                                                                                                                                                                     |