[Hotfix] Fix ML-Bench continue ``run_inference.py`` (#2284)
* add ml-bench w/o exec env
* fix typos (#1956)
no functional change
* Refactored Logs (#1939)
* [Feat] A competitive Web Browsing agent (#1856)
* initial attempt at a browsing only agent
* add browsing agent
* update
* implement agent
* update
* fix comments
* remove unnecessary things from memory extras
* update image processing
---------
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
* Update README.md SWE-bench score (#1959)
* Update README.md SWE-bench score
Our most recent results on swe-bench lite are 25%, so this updates the README accordingly.
* Update
* fix: llm is_local function logic error (#1961)
Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
* doc: update documentation about poetry update (#1962)
* add doc
* Update Development.md
---------
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* feat: add metrics related to cost for better observability (#1944)
* add metrics for total_cost
* make lint
* refactor codeact
* change metrics into llm
* add costs list, add into state
* refactor log completion
* refactor and test others
* make lint
* Update opendevin/core/metrics.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Update opendevin/llm/llm.py
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
* refactor
* add code
---------
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
* doc: add more cmd in unit test documentation (#1963)
* --- (#1975)
updated-dependencies:
- dependency-name: boto3
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* --- (#1976)
updated-dependencies:
- dependency-name: litellm
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Logging security (#1943)
* update .gitignore
* Rename the confusing 'INFO' style to 'DETAIL'
* override str and repr
* feat: api_key desensitize
* feat: add SensitiveDataFilter in file handler
* tweak regex, add tests
* more tweaks, include other attrs
* add env vars, those with equivalent config
* fix tests
* tests are invaluable
---------
Co-authored-by: Shimada666 <649940882@qq.com>
* --- (#1967)
updated-dependencies:
- dependency-name: react-dom
dependency-type: direct:production
update-type: version-update:semver-minor
- dependency-name: "@types/react-dom"
dependency-type: direct:development
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* --- (#1968)
updated-dependencies:
- dependency-name: "@reduxjs/toolkit"
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* --- (#1969)
updated-dependencies:
- dependency-name: husky
dependency-type: direct:development
update-type: version-update:semver-major
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* --- (#1970)
updated-dependencies:
- dependency-name: tailwind-merge
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* --- (#1971)
updated-dependencies:
- dependency-name: i18next
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
* Refactor session management (#1810)
* refactor session mgmt
* defer file handling to runtime
* add todo
* refactor sessions a bit more
* remove messages logic from FE
* fix up socket handshake
* refactor frontend auth a bit
* first pass at redoing file explorer
* implement directory suffix
* fix up file tree
* close agent on websocket close
* remove session saving
* move file refresh
* remove getWorkspace
* plumb path/code differently
* fix build issues
* fix the tests
* fix npm build
* add session rehydration
* fix event serialization
* logspam
* fix user message rehydration
* add get_event fn
* agent state restoration
* change history tracking for codeact
* fix responsiveness of init
* fix lint
* lint
* delint
* fix prop
* update tests
* logspam
* lint
* fix test
* revert codeact
* change fileService to use API
* fix up session loading
* delint
* delint
* fix integration tests
* revert test
* fix up access to options endpoints
* fix initial files load
* delint
* fix file initialization
* fix mock server
* fix lint
* fix auth for html
* Update frontend/src/i18n/translation.json
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
* refactor sessions and sockets
* avoid reinitializing the same session
* fix reconnect issue
* change up intro message
* more guards on reinit
* rename agent_session
* delint
* fix a bunch of tests
* delint
* fix last test
* remove code editor context
* fix build
* fix any
* fix dot notation
* Update frontend/src/services/api.ts
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* fix up error handling
* Update opendevin/server/session/agent.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Update opendevin/server/session/agent.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Update frontend/src/services/session.ts
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* fix build errs
* fix else
* add closed state
* delint
* Update opendevin/server/session/session.py
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
---------
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
* fix #1960 (#1964)
* Add ruff for shared mutable defaults (B) (#1938)
* Add ruff for shared mutable defaults (B)
* Apply B006, B008 on current files, except fast API
* Update agenthub/SWE_agent/prompts.py
Co-authored-by: Graham Neubig <neubig@gmail.com>
* fix unintended behavior change
* this is correct, tell Ruff to leave it alone
---------
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Refactor integration testing CI, add optional Mac tests, and mark a few agents as deprecated (#1888)
* Add MacOS to integration tests
* Switch back to python 3.11
* Install Docker for macos pipeline
* regenerate.sh: Use environmental variable for sandbox type
* Pack different agents' tests into a single check
* Fix CodeAct tests
* Reduce file match and extensive debug logs
* Add TEST_IN_CI mode that reports codecov
* Small fix: don't quit if reusing old responses failed
* Merge codecov results
* Fix typos
* Remove coverage merge step - codecov automatically does that
* Make Mac integration tests optional - too slow
* Fix codecov args
* Add comments in yaml
* Include sandbox type in codecov report name
* Fix codecov report merge
* Revert renaming of test_matrix_success
* Remove SWEAgent and PlannerAgent from tests
* Mark planner agent and SWE agent as deprecated
* CodeCov: Ignore planner and sweagent
* Revert "Remove SWEAgent and PlannerAgent from tests"
This reverts commit 040cb3bfb9496090d4516f7b45f38376f95de7be.
* Remove all tests for SWE Agent
* Only keep basic tests for MonologueAgent and PlannerAgent
* Mark SWE Agent as deprecated, and ignore code coverage for it
---------
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
* Fix Repeated Responses in Chat by Adding IPythonRunCellObservation (#1987)
Co-authored-by: jianghongwei <jianghongwei@58.com>
Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
* Save CI cycles for backend tests (#1985)
* Fix typo in prompt (#1992)
* Refactor monologue and SWE agent to use the messages in state history (#1863)
* Refactor monologue to use the messages in state history
* add messages, clean up
* fix monologue
* update integration tests
* move private method
* update SWE agent to use the history from State
* integration tests for SWE agent
* rename monologue to initial_thoughts, since that is what it is
* fix: catch the missing-session-file exception when initializing EventStream (e.g., when creating a new session with no stored session files) (#1994)
* add ml-bench in readme
* Bump boto3 from 1.34.110 to 1.34.111 (#2001)
Bumps [boto3](https://github.com/boto/boto3) from 1.34.110 to 1.34.111.
- [Release notes](https://github.com/boto/boto3/releases)
- [Changelog](https://github.com/boto/boto3/blob/develop/CHANGELOG.rst)
- [Commits](https://github.com/boto/boto3/compare/1.34.110...1.34.111)
---
updated-dependencies:
- dependency-name: boto3
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump docker from 7.0.0 to 7.1.0 (#2002)
Bumps [docker](https://github.com/docker/docker-py) from 7.0.0 to 7.1.0.
- [Release notes](https://github.com/docker/docker-py/releases)
- [Commits](https://github.com/docker/docker-py/compare/7.0.0...7.1.0)
---
updated-dependencies:
- dependency-name: docker
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump litellm from 1.37.20 to 1.38.0 (#2005)
Bumps [litellm](https://github.com/BerriAI/litellm) from 1.37.20 to 1.38.0.
- [Release notes](https://github.com/BerriAI/litellm/releases)
- [Commits](https://github.com/BerriAI/litellm/compare/v1.37.20...v1.38.0)
---
updated-dependencies:
- dependency-name: litellm
dependency-type: direct:production
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Fix SWE-Bench evaluation due to setuptools version (#1995)
* correctly setup plugins for swebench eval
* bump swe-bench version and add logging
* Revert "correctly setup plugins for swebench eval"
This reverts commit 2bd10556739e2af602ea85371b976390f7c48077.
* bump version
* fix session state after resuming (#1999)
* fix state resuming
* fix session reconnection
* fix lint
* Implement `agentskills` for OpenDevin to improve editing and include more useful tools/skills (#1941)
* add draft for skills
* Implement and test agentskills functions: open_file, goto_line, scroll_down, scroll_up, create_file, search_dir, search_file, find_file
* Remove new_sample.txt file
* add some work from opendevin w/ fixes
* Add unit tests for agentskills module
* fix some issues and updated tests
* add more tests for open
* tweak and handle goto_line
* add tests for some edge cases
* add tests for scrolling
* add tests for edit
* add tests for search_dir
* update tests to use pytest
* use pytest --forked to keep file-op unit tests from interfering with each other via a global var
* update doc based on swe agent tool
* update and add tests for find_file and search_file
* move agent_skills to plugins
* add agentskills as plugin and docs
* add agentskill to ssh box and fix sandbox integration
* remove extra returns in doc
* add agentskills to initial tool for jupyter
* support re-init jupyter kernel (for agentskills) after restart
* fix print window's issue with indentation and add testcases
* add prompt for codeact with the newest edit primitives
* modify the way line number is presented (remove leading space)
* change prompt to the newest display format
* support tracking of costs via metrics
* Update opendevin/runtime/plugins/agent_skills/README.md
* Update opendevin/runtime/plugins/agent_skills/README.md
* implement and add tests for py linting
* remove extra text arg for incompatible subprocess ver
* remove sample.txt
* update test_edits integration tests
* fix all integration
* Update opendevin/runtime/plugins/agent_skills/README.md
* Update opendevin/runtime/plugins/agent_skills/README.md
* Update opendevin/runtime/plugins/agent_skills/README.md
* Update agenthub/codeact_agent/prompt.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Update agenthub/codeact_agent/prompt.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Update agenthub/codeact_agent/prompt.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* Update opendevin/runtime/plugins/agent_skills/agentskills.py
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* correctly setup plugins for swebench eval
* bump swe-bench version and add logging
* correctly setup plugins for swebench eval
* bump swe-bench version and add logging
* Revert "correctly setup plugins for swebench eval"
This reverts commit 2bd10556739e2af602ea85371b976390f7c48077.
* bump version
* remove _AGENT_SKILLS_DOCS
* move flake8 to test dep
* update poetry.lock
* remove extra arg
* reduce max iter for eval
* update poetry
* fix integration tests
---------
Co-authored-by: OpenDevin <opendevin@opendevin.ai>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* build: Add poetry command to use Python 3.11 for environment setup (#1972)
* Bump @react-types/shared from 3.23.0 to 3.23.1 in /frontend (#2006)
Bumps [@react-types/shared](https://github.com/adobe/react-spectrum) from 3.23.0 to 3.23.1.
- [Release notes](https://github.com/adobe/react-spectrum/releases)
- [Commits](https://github.com/adobe/react-spectrum/compare/@react-types/shared@3.23.0...@react-types/shared@3.23.1)
---
updated-dependencies:
- dependency-name: "@react-types/shared"
dependency-type: direct:production
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump @types/react-syntax-highlighter in /frontend (#2007)
Bumps [@types/react-syntax-highlighter](https://github.com/DefinitelyTyped/DefinitelyTyped/tree/HEAD/types/react-syntax-highlighter) from 15.5.11 to 15.5.13.
- [Release notes](https://github.com/DefinitelyTyped/DefinitelyTyped/releases)
- [Commits](https://github.com/DefinitelyTyped/DefinitelyTyped/commits/HEAD/types/react-syntax-highlighter)
---
updated-dependencies:
- dependency-name: "@types/react-syntax-highlighter"
dependency-type: direct:development
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump @typescript-eslint/parser from 7.9.0 to 7.10.0 in /frontend (#2008)
Bumps [@typescript-eslint/parser](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/parser) from 7.9.0 to 7.10.0.
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/parser/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v7.10.0/packages/parser)
---
updated-dependencies:
- dependency-name: "@typescript-eslint/parser"
dependency-type: direct:development
update-type: version-update:semver-minor
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Bump lint-staged from 15.2.2 to 15.2.4 in /frontend (#2009)
Bumps [lint-staged](https://github.com/okonet/lint-staged) from 15.2.2 to 15.2.4.
- [Release notes](https://github.com/okonet/lint-staged/releases)
- [Changelog](https://github.com/lint-staged/lint-staged/blob/master/CHANGELOG.md)
- [Commits](https://github.com/okonet/lint-staged/compare/v15.2.2...v15.2.4)
---
updated-dependencies:
- dependency-name: lint-staged
dependency-type: direct:development
update-type: version-update:semver-patch
...
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Update README.md
* Update README.md
* add run_infer.sh
* fix input output
* fix docker sandbox
* fix run
* update and clean run_infer.py
* add script to clean up Docker containers
* update repo uid
* add description
* new
* Update README.md
* use root for sandbox
* update readme
* update ml-bench conda env
* update readme
* update readme
* use try except
* modify raise exception
* add int
* update README
* longer time
* fix existing issues
* fix existing issue
* new docker image
* add metrics of cost
* add result parsing cost
* fix
* fix
* update summarize
* fix
---------
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: Ubuntu <ubuntu@ip-172-31-31-157.ec2.internal>
Co-authored-by: RainRat <rainrat78@yahoo.ca>
Co-authored-by: மனோஜ்குமார் பழனிச்சாமி <smartmanoj42857@gmail.com>
Co-authored-by: Frank Xu <frankxu2004@gmail.com>
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: Shimada666 <649940882@qq.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Robert Brennan <accounts@rbren.io>
Co-authored-by: Rahul Anand <62982824+zeul22@users.noreply.github.com>
Co-authored-by: jiangleo <jiangleo@users.noreply.github.com>
Co-authored-by: jianghongwei <jianghongwei@58.com>
Co-authored-by: Jeremi Joslin <jeremi@newlogic.com>
Co-authored-by: Aaron Xia <zhhuaxia@gmail.com>
Co-authored-by: OpenDevin <opendevin@opendevin.ai>
Co-authored-by: DaxServer <7479937+DaxServer@users.noreply.github.com>
Co-authored-by: Robert <871607149@qq.com>