Clémentine committed
Commit 01d1bbb · 1 Parent(s): c177f62

text reorg

content.py (+2 -5)
@@ -1,17 +1,14 @@
 TITLE = """<h1 align="center" id="space-title">GAIA Leaderboard</h1>"""
 
-CANARY_STRING = "" # TODO
-
 INTRODUCTION_TEXT = """
-GAIA is a benchmark which aims at evaluating next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc).
-(See our paper for more details.)
+GAIA is a benchmark which aims at evaluating next-generation LLMs (LLMs with augmented capabilities due to added tooling, efficient prompting, access to search, etc). (See our paper for more details.)
 
 ## Context
 GAIA is made of more than 450 non-trivial question with an unambiguous answer, requiring different levels of tooling and autonomy to solve. GAIA data can be found in this space (https://huggingface.co/datasets/gaia-benchmark/GAIA). Questions are contained in `metadata.jsonl`. Some questions come with an additional file, that can be found in the same folder and whose id is given in the field `file_name`.
 
 It is divided in 3 levels, where level 1 should be breakable by very good LLMs, and level 3 indicate a strong jump in model capabilities, each divided into a fully public dev set for validation, and a test set with private answers and metadata.
 
-
+# Submissions
 Results can be submitted for both validation and test. Scores are expressed as the percentage of correct answers for a given split.
 
 We expect submissions to be json-line files with the following format. The first two fields are mandatory, `reasoning_trace` is optionnal:
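
The text in this diff says that questions live in `metadata.jsonl` and that some records point to an extra file in the same folder through the `file_name` field. A minimal sketch of reading that layout could look like the following. The function name `load_questions` is hypothetical, and treating an empty or missing `file_name` as "no attachment" is an assumption, not something the diff states.

```python
import json
from pathlib import Path


def load_questions(folder):
    """Read metadata.jsonl from `folder` and resolve attached files.

    Returns a list of (record, attachment_path_or_None) pairs.
    """
    folder = Path(folder)
    pairs = []
    with open(folder / "metadata.jsonl", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            record = json.loads(line)
            # Assumption: an empty or missing `file_name` means no attachment.
            name = record.get("file_name") or ""
            pairs.append((record, folder / name if name else None))
    return pairs
```

Calling `load_questions("path/to/split")` on a local copy of one split would give each question record together with the path of its attachment, if it has one. The path shown is a placeholder.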