Spaces:
Running
on
CPU Upgrade
Running
on
CPU Upgrade
Commit
β’
d91eb24
1
Parent(s):
d6f0767
Update src/display/about.py (#19)
Browse files- Update src/display/about.py (5e1c6f0b3cfdf8076d670973a61121c38c075bfd)
Co-authored-by: Giwon Hong <GWHed@users.noreply.huggingface.co>
- src/display/about.py +4 -4
src/display/about.py
CHANGED
@@ -5,7 +5,7 @@ TITLE = """<h1 align="center" id="space-title">Hallucinations Leaderboard</h1>""
|
|
5 |
INTRODUCTION_TEXT = """
|
6 |
π The Hallucinations Leaderboard aims to track, rank and evaluate hallucinations in LLMs.
|
7 |
|
8 |
-
It evaluates the propensity for hallucination in Large Language Models (LLMs) across a diverse array of tasks, including Closed-book Open-domain QA, Summarization, Reading Comprehension, Instruction Following, Fact-Checking, Hallucination Detection
|
9 |
|
10 |
A more detailed explanation of the definition of hallucination and the leaderboard's motivation, tasks and dataset can be found on the "About" page and [The Hallucinations Leaderboard blog post](https://huggingface.co/blog/leaderboards-on-the-hub-hallucinations).
|
11 |
|
@@ -15,12 +15,12 @@ Metrics and datasets used by the Hallucinations Leaderboard were identified whil
|
|
15 |
If you have comments or suggestions on datasets and metrics, please [reach out to us in our discussion forum](https://huggingface.co/spaces/hallucinations-leaderboard/leaderboard/discussions).
|
16 |
|
17 |
The Hallucination Leaderboard includes a variety of tasks identified while working on the [awesome-hallucination-detection](https://github.com/EdinburghNLP/awesome-hallucination-detection) repository:
|
18 |
-
- **Closed-book Open-domain QA** -- [NQ Open](https://huggingface.co/datasets/nq_open) (8-shot and 64-shot), [TriviaQA](https://huggingface.co/datasets/trivia_qa) (8-shot and 64-shot), [TruthfulQA](https://huggingface.co/datasets/truthful_qa) ([MC1](https://huggingface.co/datasets/truthful_qa/viewer/multiple_choice), [MC2](https://huggingface.co/datasets/truthful_qa/viewer/multiple_choice), and [Generative](https://huggingface.co/datasets/truthful_qa/viewer/generation))
|
19 |
- **Summarisation** -- [XSum](https://huggingface.co/datasets/EdinburghNLP/xsum), [CNN/DM](https://huggingface.co/datasets/cnn_dailymail)
|
20 |
-
- **Reading Comprehension** -- [RACE](https://huggingface.co/datasets/EleutherAI/race)
|
21 |
- **Instruction Following** -- [MemoTrap](https://huggingface.co/datasets/pminervini/inverse-scaling/viewer/memo-trap), [IFEval](https://huggingface.co/datasets/wis-k/instruction-following-eval)
|
22 |
- **Hallucination Detection** -- [FaithDial](https://huggingface.co/datasets/McGill-NLP/FaithDial), [True-False](https://huggingface.co/datasets/pminervini/true-false), [HaluEval](https://huggingface.co/datasets/pminervini/HaluEval) ([QA](https://huggingface.co/datasets/pminervini/HaluEval/viewer/qa_samples), [Summarisation](https://huggingface.co/datasets/pminervini/HaluEval/viewer/summarization_samples), and [Dialogue](https://huggingface.co/datasets/pminervini/HaluEval/viewer/dialogue_samples))
|
23 |
-
- **
|
24 |
|
25 |
For more information about the leaderboard, check our [HuggingFace Blog article](https://huggingface.co/blog/leaderboards-on-the-hub-hallucinations).
|
26 |
"""
|
|
|
5 |
INTRODUCTION_TEXT = """
|
6 |
π The Hallucinations Leaderboard aims to track, rank and evaluate hallucinations in LLMs.
|
7 |
|
8 |
+
It evaluates the propensity for hallucination in Large Language Models (LLMs) across a diverse array of tasks, including Closed-book Open-domain QA, Summarization, Reading Comprehension, Instruction Following, Fact-Checking, and Hallucination Detection. The evaluation encompasses a wide range of datasets such as NQ Open, TriviaQA, TruthfulQA, XSum, CNN/DM, RACE, SQuADv2, MemoTrap, IFEval, FEVER, FaithDial, True-False, HaluEval, NQ-Swap, and PopQA, offering a comprehensive assessment of each model's performance in generating accurate and contextually relevant content.
|
9 |
|
10 |
A more detailed explanation of the definition of hallucination and the leaderboard's motivation, tasks and dataset can be found on the "About" page and [The Hallucinations Leaderboard blog post](https://huggingface.co/blog/leaderboards-on-the-hub-hallucinations).
|
11 |
|
|
|
15 |
If you have comments or suggestions on datasets and metrics, please [reach out to us in our discussion forum](https://huggingface.co/spaces/hallucinations-leaderboard/leaderboard/discussions).
|
16 |
|
17 |
The Hallucination Leaderboard includes a variety of tasks identified while working on the [awesome-hallucination-detection](https://github.com/EdinburghNLP/awesome-hallucination-detection) repository:
|
18 |
+
- **Closed-book Open-domain QA** -- [NQ Open](https://huggingface.co/datasets/nq_open) (8-shot and 64-shot), [TriviaQA](https://huggingface.co/datasets/trivia_qa) (8-shot and 64-shot), [TruthfulQA](https://huggingface.co/datasets/truthful_qa) ([MC1](https://huggingface.co/datasets/truthful_qa/viewer/multiple_choice), [MC2](https://huggingface.co/datasets/truthful_qa/viewer/multiple_choice), and [Generative](https://huggingface.co/datasets/truthful_qa/viewer/generation)), [PopQA](https://huggingface.co/datasets/akariasai/PopQA)
|
19 |
- **Summarisation** -- [XSum](https://huggingface.co/datasets/EdinburghNLP/xsum), [CNN/DM](https://huggingface.co/datasets/cnn_dailymail)
|
20 |
+
- **Reading Comprehension** -- [RACE](https://huggingface.co/datasets/EleutherAI/race), [SQuADv2](https://huggingface.co/datasets/rajpurkar/squad_v2), [NQ-Swap](https://huggingface.co/datasets/pminervini/NQ-Swap)
|
21 |
- **Instruction Following** -- [MemoTrap](https://huggingface.co/datasets/pminervini/inverse-scaling/viewer/memo-trap), [IFEval](https://huggingface.co/datasets/wis-k/instruction-following-eval)
|
22 |
- **Hallucination Detection** -- [FaithDial](https://huggingface.co/datasets/McGill-NLP/FaithDial), [True-False](https://huggingface.co/datasets/pminervini/true-false), [HaluEval](https://huggingface.co/datasets/pminervini/HaluEval) ([QA](https://huggingface.co/datasets/pminervini/HaluEval/viewer/qa_samples), [Summarisation](https://huggingface.co/datasets/pminervini/HaluEval/viewer/summarization_samples), and [Dialogue](https://huggingface.co/datasets/pminervini/HaluEval/viewer/dialogue_samples))
|
23 |
+
- **Fact-Checking -- [FEVER](https://huggingface.co/datasets/pminervini/hl-fever)
|
24 |
|
25 |
For more information about the leaderboard, check our [HuggingFace Blog article](https://huggingface.co/blog/leaderboards-on-the-hub-hallucinations).
|
26 |
"""
|