Commit a86b728 (parent: 5f9d165)

Update content.py

content.py: CHANGED (+1 -2)
@@ -4,8 +4,7 @@ CANARY_STRING = "" # TODO
 
 INTRODUCTION_TEXT = """
 Large language models have seen their potential capabilities increased by several orders of magnitude with the introduction of augmentations, from simple prompting adjustement to actual external tooling (calculators, vision models, ...) or online web retrieval.
-To evaluate the next generation of LLMs, we argue for a new kind of benchmark, simple and yet effective to measure actual progress on augmented capabilities,
-We therefore present GAIA.
+To evaluate the next generation of LLMs, we argue for a new kind of benchmark, simple and yet effective to measure actual progress on augmented capabilities, and therefore present GAIA. Details in the paper.
 
 GAIA is made of 3 evaluation levels, depending on the added level of tooling and autonomy the model needs.
 We expect the level 1 to be breakable by very good LLMs, and the level 3 to indicate a strong jump in model capabilities.