TITLE = '<h1 align="center" id="space-title">Open Multilingual LLM Evaluation Leaderboard (Dutch only)</h1>'

INTRO_TEXT = """
## About

This is a fork of the [Open Multilingual LLM Evaluation Leaderboard](https://huggingface.co/spaces/uonlp/open_multilingual_llm_leaderboard), restricted to Dutch models and augmented with additional model results.
Models are evaluated on the **Dutch versions only** of the following benchmarks. The original authors of the Open Multilingual LLM Evaluation Leaderboard translated these benchmarks into Dutch automatically with `gpt-35-turbo`.

- <a href="https://arxiv.org/abs/1803.05457" target="_blank">AI2 Reasoning Challenge</a> (25-shot)
- <a href="https://arxiv.org/abs/1905.07830" target="_blank">HellaSwag</a> (10-shot)
- <a href="https://arxiv.org/abs/2009.03300" target="_blank">MMLU</a> (5-shot)
- <a href="https://arxiv.org/abs/2109.07958" target="_blank">TruthfulQA</a> (0-shot)

I do not maintain these datasets; I only run the benchmarks and add the results to this Space. For questions about the test sets or about running them yourself, see [the original GitHub repository](https://github.com/laiviet/lm-evaluation-harness).

All models are benchmarked in 8-bit precision.
"""

CREDIT = """
## Credit

This leaderboard borrows heavily from the following sources:

- Datasets (AI2_ARC, HellaSwag, MMLU, TruthfulQA)
- Evaluation code (EleutherAI's lm-evaluation-harness repo)
- Leaderboard code (HuggingFaceH4's open_llm_leaderboard repo)
- The multilingual version of the leaderboard (uonlp's open_multilingual_llm_leaderboard repo)

"""


CITATION = f"""
## Citation


If you use or cite the Dutch benchmark results or this specific leaderboard page, please cite the following paper:

TBD


If you use the multilingual benchmarks, please cite the following work:

```bibtex
@misc{{lai2023openllmbenchmark,
    author = {{Viet Lai and Nghia Trung Ngo and Amir Pouran Ben Veyseh and Franck Dernoncourt and Thien Huu Nguyen}},
    title={{Open Multilingual LLM Evaluation Leaderboard}},
    year={{2023}}
}}
```
"""