This agent implements the CodeAct idea (paper, tweet) that consolidates LLM agents’ actions into a unified code action space for both simplicity and performance (see paper for more details).
The conceptual idea is illustrated below. At each turn, the agent can:
- Execute a `bash` command
- Execute `Python` code with an interactive Python interpreter. This is simulated through a `bash` command; see the plugin system below for more details.

To make the CodeAct agent more powerful with access only to the `bash` action space, the CodeAct agent leverages OpenDevin's plugin system.

https://github.com/OpenDevin/OpenDevin/assets/38853559/f592a192-e86c-4f48-ad31-d69282d5f6ac
Example of CodeActAgent with gpt-4-turbo-2024-04-09 performing a data science task (linear regression)
Actions: `Action`, `CmdRunAction`, `IPythonRunCellAction`, `AgentEchoAction`, `AgentFinishAction`, `AgentTalkAction`

Observations: `CmdOutputObservation`, `IPythonRunCellObservation`, `AgentMessageObservation`, `UserMessageObservation`

| Method | Description |
|---|---|
| `__init__` | Initializes an agent with `llm` and a list of messages `List[Mapping[str, str]]` |
| `step` | Performs one step using the CodeAct Agent. This includes gathering info on previous steps and prompting the model to make a command to execute. |
| `search_memory` | Not yet implemented |
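
The unified action space described for the CodeAct agent can be sketched as follows. This is a minimal illustration, not OpenDevin's implementation: `BashAction`, `PythonAction`, `run_bash`, and `execute` are hypothetical names, and a POSIX shell is assumed. The point it shows is that a Python action can be routed through the same bash execution path as a plain command.

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class BashAction:
    """Hypothetical stand-in for a bash command action."""
    command: str


@dataclass
class PythonAction:
    """Hypothetical stand-in for a Python code action."""
    code: str


def run_bash(command: str) -> str:
    """Run a shell command and return its combined stdout and stderr."""
    result = subprocess.run(
        command, shell=True, capture_output=True, text=True, timeout=30
    )
    return result.stdout + result.stderr


def execute(action) -> str:
    """Route both action types through the same bash execution path."""
    if isinstance(action, BashAction):
        return run_bash(action.command)
    if isinstance(action, PythonAction):
        # Python is "simulated through bash": wrap the code in a shell command.
        command = f"{shlex.quote(sys.executable)} -c {shlex.quote(action.code)}"
        return run_bash(command)
    raise TypeError(f"unsupported action: {action!r}")
```

Unlike an interactive interpreter, this sketch starts a fresh Python process for every action, so variables do not persist between calls.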
- [ ] Support web-browsing
- [ ] Complete the workflow for CodeAct agent to submit GitHub PRs

The Monologue Agent uses long-term and short-term memory to complete tasks. Long-term memory is stored as a `LongTermMemory` object, and the model uses it to search for examples from the past. Short-term memory is stored as a `Monologue` object, and the model can condense it as necessary.

Actions: `Action`, `NullAction`, `CmdRunAction`, `FileWriteAction`, `FileReadAction`, `AgentRecallAction`, `BrowseURLAction`, `GithubPushAction`, `AgentThinkAction`

Observations: `Observation`, `NullObservation`, `CmdOutputObservation`, `FileReadObservation`, `AgentRecallObservation`, `BrowserOutputObservation`

| Method | Description |
|---|---|
| `__init__` | Initializes the agent with a long term memory, and an internal monologue |
| `_add_event` | Appends events to the monologue of the agent and condenses with summary automatically if the monologue is too long |
| `_initialize` | Utilizes the `INITIAL_THOUGHTS` list to give the agent a context for its capabilities and how to navigate the `/workspace` |
| `step` | Modifies the current state by adding the most recent actions and observations, then prompts the model to think about its next action to take. |
| `search_memory` | Uses `VectorIndexRetriever` to find related memories within the long term memory. |
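
The append-then-condense behaviour described for `_add_event` can be sketched roughly as below. This is an illustration under stated assumptions, not the agent's actual code: the `Monologue` class here, its `max_chars` budget, and `naive_summarize` are hypothetical, and the summarizer is a placeholder for what would be a model call.

```python
from typing import Callable, List


class Monologue:
    """Hypothetical short-term memory: an ordered list of event strings."""

    def __init__(self, max_chars: int = 200) -> None:
        self.events: List[str] = []
        self.max_chars = max_chars  # assumed size budget, not a real setting

    def total_length(self) -> int:
        return sum(len(e) for e in self.events)

    def add_event(self, event: str, summarize: Callable[[List[str]], str]) -> None:
        """Append an event; condense older events if over budget."""
        self.events.append(event)
        if self.total_length() > self.max_chars and len(self.events) > 1:
            # Keep the newest event verbatim and replace the rest with a summary.
            older, newest = self.events[:-1], self.events[-1]
            self.events = [summarize(older), newest]


def naive_summarize(events: List[str]) -> str:
    """Placeholder for an LLM call: keeps the first 20 characters of each event."""
    return "SUMMARY: " + " | ".join(e[:20] for e in events)
```

Keeping the newest event verbatim is a design choice of this sketch, so that the most recent observation is never lost to summarization.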
The planner agent uses a special prompting strategy to create long-term plans for solving problems. At every step, the agent is given its previous action-observation pairs, the current task, and a hint based on the last action taken.

Actions: `NullAction`, `CmdRunAction`, `CmdKillAction`, `BrowseURLAction`, `GithubPushAction`, `FileReadAction`, `FileWriteAction`, `AgentRecallAction`, `AgentThinkAction`, `AgentFinishAction`, `AgentSummarizeAction`, `AddTaskAction`, `ModifyTaskAction`

Observations: `Observation`, `NullObservation`, `CmdOutputObservation`, `FileReadObservation`, `AgentRecallObservation`, `BrowserOutputObservation`

| Method | Description |
|---|---|
| `__init__` | Initializes an agent with `llm` |
| `step` | Checks to see if current step is completed, returns `AgentFinishAction` if True. Otherwise, creates a plan prompt and sends to model for inference, adding the result as the next action. |
| `search_memory` | Not yet implemented |
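
A rough sketch of the planner's `step` logic follows, assuming that "completed" means every task in the plan is done. The names `Task`, `PlannerState`, and `build_prompt`, the string return values, and the prompt wording are all hypothetical; the real agent returns action objects such as `AgentFinishAction`.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Task:
    goal: str
    done: bool = False


@dataclass
class PlannerState:
    """Hypothetical state: the task list plus past (action, observation) pairs."""
    tasks: List[Task] = field(default_factory=list)
    history: List[Tuple[str, str]] = field(default_factory=list)


def build_prompt(state: PlannerState) -> str:
    """Assemble a plan prompt from the history, open tasks, and a hint."""
    lines = [f"action: {a} -> observation: {o}" for a, o in state.history]
    lines += [f"open task: {t.goal}" for t in state.tasks if not t.done]
    if state.history:
        # The hint is derived from the most recent action (wording is illustrative).
        lines.append(f"hint: your last action was {state.history[-1][0]!r}")
    return "\n".join(lines)


def step(state: PlannerState, llm: Callable[[str], str]) -> str:
    """Return 'finish' when every task is done, else ask the model for an action."""
    if state.tasks and all(t.done for t in state.tasks):
        return "finish"
    return llm(build_prompt(state))
```

Passing `llm` as a plain callable keeps the sketch testable with a stub in place of a real model.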