clefourrier HF staff committed on
Commit a86b728
1 Parent(s): 5f9d165

Update content.py

Files changed (1)
  1. content.py +1 -2
content.py CHANGED
@@ -4,8 +4,7 @@ CANARY_STRING = "" # TODO
 
 INTRODUCTION_TEXT = """
 Large language models have seen their potential capabilities increased by several orders of magnitude with the introduction of augmentations, from simple prompting adjustement to actual external tooling (calculators, vision models, ...) or online web retrieval.
-To evaluate the next generation of LLMs, we argue for a new kind of benchmark, simple and yet effective to measure actual progress on augmented capabilities,
-We therefore present GAIA.
+To evaluate the next generation of LLMs, we argue for a new kind of benchmark, simple and yet effective to measure actual progress on augmented capabilities, and therefore present GAIA. Details in the paper.
 
 GAIA is made of 3 evaluation levels, depending on the added level of tooling and autonomy the model needs.
 We expect the level 1 to be breakable by very good LLMs, and the level 3 to indicate a strong jump in model capabilities.