Spaces: gaia-benchmark/leaderboard

gregmialz committed on Nov 13, 2023

Commit 2937740 (1 parent: 58e4674)

Update content.py

Files changed (1): content.py

@@ -10,7 +10,7 @@ GAIA is made of 3 evaluation levels, depending on the added level of tooling and
 We expect the level 1 to be breakable by very good LLMs, and the level 3 to indicate a strong jump in model capabilities.
 Each of these levels is divided into two sets: a fully public dev set, on which people can test their models, and a test set with private answers and metadata. Results can be submitted for both validation and test.
-We expect submissions to be json-line files with the following format:
+We expect submissions to be json-line files with the following format. The first two fields are mandatory; `reasoning_trace` is optional:
 ```
 {"task_id": "task_id_1", "model_answer": "Answer 1 from your model", "reasoning_trace": "The different steps by which your model reached answer 1"}
 {"task_id": "task_id_2", "model_answer": "Answer 2 from your model", "reasoning_trace": "The different steps by which your model reached answer 2"}
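
A minimal sketch of how a submission file in this format could be produced and checked locally before uploading. It only follows the description in the changed line (`task_id` and `model_answer` mandatory, `reasoning_trace` optional). The helper names `parse_submission_line` and `dump_submission` are hypothetical, and the leaderboard's own validation is not shown in this diff, so it may differ.

```python
import json

# Fields the updated text describes as mandatory; `reasoning_trace` may be omitted.
REQUIRED_FIELDS = ("task_id", "model_answer")


def parse_submission_line(line):
    """Parse one line of a submission file and check its fields (hypothetical helper)."""
    record = json.loads(line)
    if not isinstance(record, dict):
        raise ValueError("each line must be a JSON object")
    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError("missing mandatory field(s): " + ", ".join(missing))
    return record


def dump_submission(records):
    """Serialize records as JSON lines: one JSON object per line."""
    lines = []
    for record in records:
        line = json.dumps(record, ensure_ascii=False)
        parse_submission_line(line)  # fail early on an incomplete record
        lines.append(line)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = dump_submission([
        # The optional field is left out of the first record on purpose.
        {"task_id": "task_id_1", "model_answer": "Answer 1 from your model"},
        {"task_id": "task_id_2", "model_answer": "Answer 2 from your model",
         "reasoning_trace": "The different steps by which your model reached answer 2"},
    ])
    print(text, end="")
```

Checking each record as it is written means a missing `task_id` or `model_answer` shows up locally instead of after submission.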