Spaces:
Running
on
CPU Upgrade
Running
on
CPU Upgrade
Update content.py
Browse files- content.py +1 -1
content.py
CHANGED
@@ -10,7 +10,7 @@ GAIA is made of 3 evaluation levels, depending on the added level of tooling and
|
|
10 |
We expect the level 1 to be breakable by very good LLMs, and the level 3 to indicate a strong jump in model capabilities.
|
11 |
Each of these levels is divided into two sets: a fully public dev set, on which people can test their models, and a test set with private answers and metadata. Results can be submitted for both validation and test.
|
12 |
|
13 |
-
We expect submissions to be json-line files with the following format:
|
14 |
```
|
15 |
{"task_id": "task_id_1", "model_answer": "Answer 1 from your model", "reasoning_trace": "The different steps by which your model reached answer 1"}
|
16 |
{"task_id": "task_id_2", "model_answer": "Answer 2 from your model", "reasoning_trace": "The different steps by which your model reached answer 2"}
|
|
|
10 |
We expect the level 1 to be breakable by very good LLMs, and the level 3 to indicate a strong jump in model capabilities.
|
11 |
Each of these levels is divided into two sets: a fully public dev set, on which people can test their models, and a test set with private answers and metadata. Results can be submitted for both validation and test.
|
12 |
|
13 |
+
We expect submissions to be json-line files with the following format. The first two fields are mandatory, `reasoning_trace` is optionnal:
|
14 |
```
|
15 |
{"task_id": "task_id_1", "model_answer": "Answer 1 from your model", "reasoning_trace": "The different steps by which your model reached answer 1"}
|
16 |
{"task_id": "task_id_2", "model_answer": "Answer 2 from your model", "reasoning_trace": "The different steps by which your model reached answer 2"}
|