Evaluation README: Add TheAgentCompany (#5777)

Boxuan Li 1 year ago
parent
commit
ecff5c67fb
1 changed file with 5 additions and 1 deletion
+ 5 - 1
evaluation/README.md

@@ -42,7 +42,7 @@ temperature = 0.0
 
 
 ## Supported Benchmarks
 
-The OpenHands evaluation harness supports a wide variety of benchmarks across [software engineering](#software-engineering), [web browsing](#web-browsing), and [miscellaneous assistance](#misc-assistance) tasks.
+The OpenHands evaluation harness supports a wide variety of benchmarks across [software engineering](#software-engineering), [web browsing](#web-browsing), [miscellaneous assistance](#misc-assistance), and [real-world](#real-world) tasks.
 
 
 ### Software Engineering
 
@@ -73,6 +73,10 @@ The OpenHands evaluation harness supports a wide variety of benchmarks across [s
 - ProofWriter: [`evaluation/benchmarks/logic_reasoning`](./benchmarks/logic_reasoning)
 - ScienceAgentBench: [`evaluation/benchmarks/scienceagentbench`](./benchmarks/scienceagentbench)
 
+### Real World
+
+- TheAgentCompany: [`evaluation/benchmarks/the_agent_company`](./benchmarks/the_agent_company)
+
 ## Result Visualization
 
 Check [this huggingface space](https://huggingface.co/spaces/OpenHands/evaluation) for visualization of existing experimental results.