aryopg committed on
Commit 815e752
1 Parent(s): 3ff4cff

Only 3 columns can be displayed

Files changed (1)
  1. src/display/about.py +21 -21
src/display/about.py CHANGED
@@ -62,27 +62,27 @@ The total batch size we get for models which fit on one A100 node is 8 (8 GPUs *
 
  The tasks and few shots parameters are:
 
- | Name | Alias | k-shot | Metric |
- | --- | --- | --- | --- |
- | <a href="https://aclanthology.org/P19-1612/" target="_blank"> NQ Open </a> | `nq_open` | 64 | `exact_match` |
- | <a href="https://aclanthology.org/P19-1612/" target="_blank"> NQ Open 8 </a> | `nq8` | 8 | `exact_match` |
- | <a href="https://aclanthology.org/P17-1147/" target="_blank"> TriviaQA </a> | `triviaqa` | 64 | `exact_match` |
- | <a href="https://aclanthology.org/P17-1147/" target="_blank"> TriviaQA 8 </a> | `tqa8` | 8 | `exact_match` |
- | <a href="https://aclanthology.org/2022.acl-long.229/" target="_blank"> TruthfulQA MC1 </a> | `truthfulqa_mc1` | 0 | `acc` |
- | <a href="https://aclanthology.org/2022.acl-long.229/" target="_blank"> TruthfulQA MC2 </a> | `truthfulqa_mc2` | 0 | `acc` |
- | <a href="https://aclanthology.org/2023.emnlp-main.397/" target="_blank"> HaluEval QA </a> | `halueval_qa` | 0 | `em` |
- | <a href="https://aclanthology.org/2023.emnlp-main.397/" target="_blank"> HaluEval Summ </a> | `halueval_summarization` | 0 | `em` |
- | <a href="https://aclanthology.org/2023.emnlp-main.397/" target="_blank"> HaluEval Dial </a> | `halueval_dialogue` | 0 | `em` |
- | <a href="https://aclanthology.org/2020.acl-main.173/" target="_blank"> XSum </a> | `xsum` | 2 | `rougeLsum` |
- | <a href="https://arxiv.org/abs/1704.04368" target="_blank"> CNN/DM </a> | `cnndm` | 2 | `rougeLsum` |
- | <a href="https://github.com/inverse-scaling/prize/tree/main" target="_blank"> MemoTrap </a> | `trap` | 0 | `acc` |
- | <a href="https://arxiv.org/abs/2311.07911v1" target="_blank"> IFEval </a> | `ifeval` | 0 | `prompt_level_strict_acc` |
- | <a href="https://arxiv.org/abs/2303.08896" target="_blank"> SelfCheckGPT </a> | `selfcheckgpt` | 0 | - |
- | <a href="https://arxiv.org/abs/1803.05355" target="_blank"> FEVER </a> | `fever10` | 16 | `acc` |
- | <a href="https://aclanthology.org/D16-1264/" target="_blank"> SQuADv2 </a> | `squadv2` | 4 | `squad_v2` |
- | <a href="https://aclanthology.org/2023.findings-emnlp.68/" target="_blank"> TrueFalse </a> | `truefalse_cieacf` | 8 | `acc` |
- | <a href="https://aclanthology.org/2022.tacl-1.84/" target="_blank"> FaithDial </a> | `faithdial_hallu` | 8 | `acc` |
- | <a href="https://aclanthology.org/D17-1082/" target="_blank"> RACE </a> | `race` | 0 | `acc` |
+ | Name (Alias) | k-shot | Metric |
+ | --- | --- | --- |
+ | <a href="https://aclanthology.org/P19-1612/" target="_blank"> NQ Open </a> (`nq_open`) | 64 | `exact_match` |
+ | <a href="https://aclanthology.org/P19-1612/" target="_blank"> NQ Open 8 </a> (`nq8`) | 8 | `exact_match` |
+ | <a href="https://aclanthology.org/P17-1147/" target="_blank"> TriviaQA </a> (`triviaqa`) | 64 | `exact_match` |
+ | <a href="https://aclanthology.org/P17-1147/" target="_blank"> TriviaQA 8 </a> (`tqa8`) | 8 | `exact_match` |
+ | <a href="https://aclanthology.org/2022.acl-long.229/" target="_blank"> TruthfulQA MC1 </a> (`truthfulqa_mc1`) | 0 | `acc` |
+ | <a href="https://aclanthology.org/2022.acl-long.229/" target="_blank"> TruthfulQA MC2 </a> (`truthfulqa_mc2`) | 0 | `acc` |
+ | <a href="https://aclanthology.org/2023.emnlp-main.397/" target="_blank"> HaluEval QA </a> (`halueval_qa`) | 0 | `em` |
+ | <a href="https://aclanthology.org/2023.emnlp-main.397/" target="_blank"> HaluEval Summ </a> (`halueval_summarization`) | 0 | `em` |
+ | <a href="https://aclanthology.org/2023.emnlp-main.397/" target="_blank"> HaluEval Dial </a> (`halueval_dialogue`) | 0 | `em` |
+ | <a href="https://aclanthology.org/2020.acl-main.173/" target="_blank"> XSum </a> (`xsum`) | 2 | `rougeLsum` |
+ | <a href="https://arxiv.org/abs/1704.04368" target="_blank"> CNN/DM </a> (`cnndm`) | 2 | `rougeLsum` |
+ | <a href="https://github.com/inverse-scaling/prize/tree/main" target="_blank"> MemoTrap </a> (`trap`) | 0 | `acc` |
+ | <a href="https://arxiv.org/abs/2311.07911v1" target="_blank"> IFEval </a> (`ifeval`) | 0 | `prompt_level_strict_acc` |
+ | <a href="https://arxiv.org/abs/2303.08896" target="_blank"> SelfCheckGPT </a> (`selfcheckgpt`) | 0 | - |
+ | <a href="https://arxiv.org/abs/1803.05355" target="_blank"> FEVER </a> (`fever10`) | 16 | `acc` |
+ | <a href="https://aclanthology.org/D16-1264/" target="_blank"> SQuADv2 </a> (`squadv2`) | 4 | `squad_v2` |
+ | <a href="https://aclanthology.org/2023.findings-emnlp.68/" target="_blank"> TrueFalse </a> (`truefalse_cieacf`) | 8 | `acc` |
+ | <a href="https://aclanthology.org/2022.tacl-1.84/" target="_blank"> FaithDial </a> (`faithdial_hallu`) | 8 | `acc` |
+ | <a href="https://aclanthology.org/D17-1082/" target="_blank"> RACE </a> (`race`) | 0 | `acc` |
 
  For all these evaluations, a higher score is a better score.
 
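The change is mechanical: in every row, the Alias cell is folded into the Name cell in parentheses, so the table drops from four columns to three. A minimal Python sketch of that transformation follows. It assumes no cell contains a literal `|`. The helper names `merge_name_and_alias` and `merge_table` are hypothetical and are not part of `src/display/about.py`.

```python
def merge_name_and_alias(row: str) -> str:
    """Collapse a 4-column '| Name | Alias | k-shot | Metric |' row into 3 columns."""
    # Drop the outer pipes, then split into cells (assumes no cell contains '|').
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    if len(cells) != 4:
        return row  # not a row of the old 4-column table; leave it untouched
    if all(set(cell) == {"-"} for cell in cells):
        return "| --- | --- | --- |"  # header/body separator row
    name, alias, k_shot, metric = cells
    # Keep the Name cell as-is (including any <a> link) and append the alias.
    return f"| {name} ({alias}) | {k_shot} | {metric} |"


def merge_table(markdown: str) -> str:
    """Apply merge_name_and_alias to every table row of a Markdown snippet."""
    return "\n".join(
        merge_name_and_alias(line) if line.lstrip().startswith("|") else line
        for line in markdown.splitlines()
    )


if __name__ == "__main__":
    old = "\n".join([
        "| Name | Alias | k-shot | Metric |",
        "| --- | --- | --- | --- |",
        '| <a href="https://aclanthology.org/D17-1082/" target="_blank"> RACE </a> | `race` | 0 | `acc` |',
    ])
    print(merge_table(old))
```

Run on the three sample rows, it yields the same header, separator, and RACE row as the `+` lines above. Non-table lines pass through unchanged, so it can be applied to a larger Markdown snippet.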