Spaces: gaia-benchmark/leaderboard

clefourrier HF staff committed on Nov 3, 2023

Commit a86b728 • 1 Parent(s): 5f9d165

Update content.py

Files changed (1)

content.py CHANGED

@@ -4,8 +4,7 @@ CANARY_STRING = "" # TODO
 INTRODUCTION_TEXT = """
 Large language models have seen their potential capabilities increased by several orders of magnitude with the introduction of augmentations, from simple prompting adjustement to actual external tooling (calculators, vision models, ...) or online web retrieval.
-To evaluate the next generation of LLMs, we argue for a new kind of benchmark, simple and yet effective to measure actual progress on augmented capabilities,
-We therefore present GAIA.
+To evaluate the next generation of LLMs, we argue for a new kind of benchmark, simple and yet effective to measure actual progress on augmented capabilities, and therefore present GAIA. Details in the paper.
 GAIA is made of 3 evaluation levels, depending on the added level of tooling and autonomy the model needs.
 We expect the level 1 to be breakable by very good LLMs, and the level 3 to indicate a strong jump in model capabilities.