
update README for GAIA (#2054)

* update README for GAIA

* Update evaluation/gaia/README.md

* Update evaluation/gaia/README.md

* Update evaluation/gaia/README.md

---------

Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Xingyao Wang, 1 year ago
parent commit 28ab00946b
1 changed file with 5 additions and 7 deletions

evaluation/gaia/README.md (+5, -7)

@@ -19,17 +19,15 @@ Following is the basic command to start the evaluation. Here we are evaluating o
 
 where `model_config` is mandatory, while `agent`, `eval_limit` and `gaia_subset` are optional.
 
-`model_config`, e.g. `eval_gpt4_1106_preview`, is the config group name for your
-LLM settings, as defined in your `config.toml`.
+- `model_config`, e.g. `eval_gpt4_1106_preview`, is the config group name for your
+LLM settings, as defined in your `config.toml`, defaulting to `gpt-3.5-turbo`.
 
-`agent`, e.g. `CodeActAgent`, is the name of the agent for benchmarks, defaulting
+- `agent`, e.g. `CodeActAgent`, is the name of the agent for benchmarks, defaulting
 to `CodeActAgent`.
 
-`eval_limit`, e.g. `10`, limits the evaluation to the first `eval_limit` instances. By
-default, the script evaluates the entire SWE-bench_Lite test set (300 issues). Note:
-in order to use `eval_limit`, you must also set `agent`.
+- `eval_limit`, e.g. `10`, limits the evaluation to the first `eval_limit` instances, defaulting to all instances.
 
-`gaia_subset`, GAIA benchmark has multiple subsets: `2023_level1`, `2023_level2`, `2023_level3`, `2023_all`. If not provided, it will defaults to `2023_level1`.
+- `gaia_subset`: the GAIA benchmark has multiple subsets (`2023_level1`, `2023_level2`, `2023_level3`, `2023_all`), defaulting to `2023_level1`.
 
 Let's say you'd like to run 10 instances using `eval_gpt4_1106_preview` and CodeActAgent,
 then your command would be:
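
The concrete command is not shown in the diff above, so the following is only a minimal sketch. It assumes the runner script lives at `evaluation/gaia/scripts/run_infer.sh` and accepts its arguments positionally in the order `model_config agent eval_limit gaia_subset`; both the script path and the argument order are assumptions, not confirmed by this commit.

```bash
# Minimal sketch; the script path and positional argument order are assumptions.
# eval_gpt4_1106_preview : model_config, the LLM config group from config.toml
# CodeActAgent           : agent, the agent to benchmark
# 10                     : eval_limit, evaluate only the first 10 instances
# gaia_subset is omitted here, so it would fall back to the default 2023_level1
./evaluation/gaia/scripts/run_infer.sh eval_gpt4_1106_preview CodeActAgent 10
```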