
Update integration test instructions (#1645)

* Update README.md

* Update tests/integration/README.md

* Apply suggestions from code review

---------

Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Robert Brennan 1 year ago
parent
commit
09e8b11451
1 changed file with 12 additions and 33 deletions

+ 12 - 33
tests/integration/README.md

@@ -58,45 +58,24 @@ poetry run pytest -s ./tests/integration
 Note: in order to run integration tests correctly, please ensure your workspace is empty.
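For example, to start from a clean workspace and run a single test, something like the following should work (a sketch assuming your workspace directory is named `workspace`; `-k` is pytest's standard test-name filter):

```bash
# Start from an empty workspace so the recorded prompts can be reproduced.
rm -rf workspace
mkdir workspace

# Run a single integration test by name; drop -k to run the whole suite.
poetry run pytest -s ./tests/integration -k test_write_simple_script
```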
 
 
-## Write Integration Tests
-
-To write an integration test, there are essentially two steps:
-
-1. Decide your task prompt and the result you want to verify.
-2. Either construct the LLM responses yourself, or run OpenDevin with a real LLM. The system prompts and
-LLM responses are recorded as logs, which you can then copy to the test folder.
-The following paragraphs describe how to do this.
-
-Your `config.toml` should look like this:
-
-```toml
-LLM_MODEL="gpt-4-turbo"
-LLM_API_KEY="<your-api-key>"
-LLM_EMBEDDING_MODEL="openai"
-WORKSPACE_MOUNT_PATH="<absolute-path-of-your-workspace>"
-```
-
-You can choose any model you'd like to generate the mock responses.
-You can even handcraft mock responses, especially when you would like to test the behaviour of an agent in corner cases. If you use a very weak model (e.g. 8B params), chances are most agents won't be able to finish the task.
-
+## Regenerate Integration Tests
+
+When you make changes to an agent's prompt, the integration tests will fail. You'll need to regenerate them
+by running:
 ```bash
-# Remove logs if you are okay with losing them. This helps us locate the prompts and responses quickly, but is NOT a must.
-rm -rf logs
-# Clear the workspace, otherwise OpenDevin might not be able to reproduce your prompts in the CI environment. Feel free to change the workspace name and path. Be sure to set `WORKSPACE_MOUNT_PATH` to the same absolute path.
-rm -rf workspace
-mkdir workspace
-# Depending on the complexity of the task you want to test, you can change the iteration limit. Change the agent accordingly. If you are adding a new test, try generating mock responses for every agent.
-poetry run python ./opendevin/core/main.py -i 10 -t "Write a shell script 'hello.sh' that prints 'hello'." -c "MonologueAgent" -d "./workspace"
+./tests/integration/regenerate.sh
 ```
+Note that this will make several calls to your LLM_MODEL, potentially costing money! If you don't want
+to cover the cost, ask one of the maintainers to regenerate them for you.
+You might also be able to fix the tests by hand.
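Fixing by hand means editing the recorded prompts and responses under `tests/integration/mock/<AgentName>/<TestName>/` (the folder layout referenced in the NOTE below). A hedged sketch for tracking down the affected files; the exact log file names inside each test folder are an assumption, not something this README pins down:

```bash
# List the recorded files for one agent/test pair.
ls tests/integration/mock/MonologueAgent/test_write_simple_script/

# Search the mocks for wording from the prompt you just changed,
# to see which recorded prompts need updating by hand.
grep -rn "old prompt wording" tests/integration/mock/MonologueAgent/
```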
 
-**NOTE**: If your agent decides to support user-agent interaction via natural language (e.g., you will be prompted to enter user responses when running the above `main.py` command), you should create a file named `tests/integration/mock/<AgentName>/<TestName>/user_responses.log` containing all the responses, in the order you provided them to the agent, delimited by newlines ('\n'). This will be used to mock STDIN during testing.
+## Write a New Integration Test
 
-After running the above commands, you should be able to locate the real prompts
-and responses logged. The log folder follows the `logs/llm/%y-%m-%d_%H-%M.log` format.
+To write an integration test, there are essentially two steps:
 
-Now, move all files under that folder to the `tests/integration/mock/<AgentName>/<TestName>` folder. For example, move all files from the `logs/llm/24-04-23_21-55/` folder to
-the `tests/integration/mock/MonologueAgent/test_write_simple_script` folder.
+1. Decide your task prompt and the result you want to verify.
+2. Add your prompt to `./regenerate.sh` (see the sketch below).
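This README doesn't spell out what `regenerate.sh` contains, but judging from the manual command it replaced (shown in the removed section above), each test presumably boils down to an invocation carrying a task prompt, an iteration limit, an agent, and a workspace. A sketch of what a new entry's invocation might look like, not the script's actual contents:

```bash
# Hypothetical per-test invocation, mirroring the old manual command:
# task prompt (-t), iteration limit (-i), agent (-c), workspace (-d).
poetry run python ./opendevin/core/main.py \
  -i 10 \
  -t "Write a shell script 'hello.sh' that prints 'hello'." \
  -c "MonologueAgent" \
  -d "./workspace"
```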
 
+**NOTE**: If your agent decides to support user-agent interaction via natural language (e.g., you will be prompted to enter user responses when running the above `main.py` command), you should create a file named `tests/integration/mock/<AgentName>/<TestName>/user_responses.log` containing all the responses, in the order you provided them to the agent, delimited by newlines ('\n'). This will be used to mock STDIN during testing.
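For example, if you gave the agent two replies while recording, the mock file could be created like this (the replies and the agent/test names are illustrative):

```bash
# Two user replies, one per line, in the order they were given.
printf 'yes\ndone\n' \
  > tests/integration/mock/MonologueAgent/test_write_simple_script/user_responses.log
```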
 
 That's it, you are good to go! When you launch an integration test, mock
 responses are loaded and used to replace a real LLM, so that we get