AmeyaPrabhu committed
Commit e068ce3
1 Parent(s): 9852685

Update contamination_report.csv


## What are you reporting:
- [ ] Evaluation dataset(s) found in a pre-training corpus. (e.g. COPA found in ThePile)
- [x] Evaluation dataset(s) found in a pre-trained model. (e.g. FLAN T5 has been trained on ANLI)

**Evaluation dataset(s)**:
allenai/openbookqa
Anagrams 1
Anagrams 2
cimec/lambada
csebuetnlp/xlsum
Cycled Letters
EdinburghNLP/xsum
facebook/anli
ibragim-bad/arc_challenge
ibragim-bad/arc_easy
mandarjoshi/trivia_qa
natural_questions
piqa
quac
race
rajpurkar/squad_v2
Reversed Words
rmanluo/RoG-webqsp
Rowan/hellaswag
SAT Analogies
stanfordnlp/coqa
story_cloze
super_glue
Symbol Insertion
ucinlp/drop
wiki_lingua
winograd_wsc
winogrande
wmt/wmt16


**Contaminated model(s)**:
FLAN, GLaM, GPT-3, PaLM, PaLM 2

## Briefly describe your method to detect data contamination

- [x] Data-based approach
- [ ] Model-based approach

Description of your method, 3-4 sentences. Evidence of data contamination (Read below):

Exact string matching: GPT-3, FLAN, and GLaM report contamination based on 13-gram overlaps between evaluation examples and the pre-training data, while PaLM and PaLM 2 use 15-gram overlaps.
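The core of this data-based check can be sketched as follows. This is a minimal illustration, not the papers' actual pipelines (which operate at corpus scale with their own tokenization and normalization); the function names and whitespace tokenization here are assumptions for the example.

```python
# Toy sketch of a data-based contamination check via n-gram overlap.
# n=13 matches the GPT-3/FLAN/GLaM setting; PaLM and PaLM 2 use n=15.

def ngrams(tokens, n):
    """Return the set of all contiguous n-grams of a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def is_contaminated(example_text, pretraining_ngrams, n=13):
    """Flag an evaluation example if any of its n-grams also occurs in the
    pre-training corpus (represented here as a precomputed n-gram set)."""
    tokens = example_text.lower().split()  # crude whitespace tokenization
    return any(g in pretraining_ngrams for g in ngrams(tokens, n))

# Build the corpus-side n-gram set once, then scan each eval example.
corpus = "the quick brown fox " * 5
corpus_ngrams = ngrams(corpus.lower().split(), 13)
print(is_contaminated("the quick brown fox " * 4, corpus_ngrams))   # True
print(is_contaminated("completely unrelated text here", corpus_ngrams))  # False
```

An example shorter than n tokens produces no n-grams and is never flagged, which is why the papers pair this check with other filters.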

**Papers found using**: https://hitz-zentroa.github.io/lm-contamination/

## Citation

Is there a paper that reports the data contamination or describes the method used to detect data contamination?

URL: https://hitz-zentroa.github.io/lm-contamination/
Citation:
```
GPT3:

@article{brown2020language,
  title={Language models are few-shot learners},
  author={Brown, Tom and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared D and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and others},
  journal={Advances in neural information processing systems},
  volume={33},
  pages={1877--1901},
  year={2020}
}

FLAN:

@article{wei2021finetuned,
  title={Finetuned language models are zero-shot learners},
  author={Wei, Jason and Bosma, Maarten and Zhao, Vincent Y and Guu, Kelvin and Yu, Adams Wei and Lester, Brian and Du, Nan and Dai, Andrew M and Le, Quoc V},
  journal={arXiv preprint arXiv:2109.01652},
  year={2021}
}

GLaM:

@inproceedings{du2022glam,
  title={Glam: Efficient scaling of language models with mixture-of-experts},
  author={Du, Nan and Huang, Yanping and Dai, Andrew M and Tong, Simon and Lepikhin, Dmitry and Xu, Yuanzhong and Krikun, Maxim and Zhou, Yanqi and Yu, Adams Wei and Firat, Orhan and others},
  booktitle={International Conference on Machine Learning},
  pages={5547--5569},
  year={2022},
  organization={PMLR}
}

PaLM:

@article{chowdhery2023palm,
  title={Palm: Scaling language modeling with pathways},
  author={Chowdhery, Aakanksha and Narang, Sharan and Devlin, Jacob and Bosma, Maarten and Mishra, Gaurav and Roberts, Adam and Barham, Paul and Chung, Hyung Won and Sutton, Charles and Gehrmann, Sebastian and others},
  journal={Journal of Machine Learning Research},
  volume={24},
  number={240},
  pages={1--113},
  year={2023}
}

PaLM 2:

@article{anil2023palm,
  title={Palm 2 technical report},
  author={Anil, Rohan and Dai, Andrew M and Firat, Orhan and Johnson, Melvin and Lepikhin, Dmitry and Passos, Alexandre and Shakeri, Siamak and Taropa, Emanuel and Bailey, Paige and Chen, Zhifeng and others},
  journal={arXiv preprint arXiv:2305.10403},
  year={2023}
}
```

*Important!* If you wish to be listed as an author in the final report, please complete this information for all the authors of this Pull Request.
- Full name: Ameya Prabhu
- Institution: Tübingen AI Center, University of Tübingen
- Email: ameya@prabhu.be

Files changed (1)
  1. contamination_report.csv +107 -1
contamination_report.csv CHANGED
```diff
@@ -462,4 +462,110 @@ bigbio/mednli;;GPT-3.5;model;0.0;0.0;0.0;model-based;https://arxiv.org/pdf/2308.
 
 RadNLI;;GPT-4;model;0.0;0.0;0.0;model-based;https://arxiv.org/pdf/2308.08493;8
 RadNLI;;GPT-3.5;model;0.0;0.0;0.0;model-based;https://arxiv.org/pdf/2308.08493;8
-
+
+
+quac;;GPT-3;model;;99.0;;data-based;https://arxiv.org/abs/2005.14165;
+rajpurkar/squad_v2;;GPT-3;model;;94.0;;data-based;https://arxiv.org/abs/2005.14165;
+ucinlp/drop;;GPT-3;model;;93.0;;data-based;https://arxiv.org/abs/2005.14165;
+Symbol Insertion;;GPT-3;model;;86.0;;data-based;https://arxiv.org/abs/2005.14165;
+stanfordnlp/coqa;;GPT-3;model;;64.0;;data-based;https://arxiv.org/abs/2005.14165;
+super_glue;record;GPT-3;model;;61.0;;data-based;https://arxiv.org/abs/2005.14165;
+winograd_wsc;;GPT-3;model;;;60.0;data-based;https://arxiv.org/abs/2005.14165;
+super_glue;boolq;GPT-3;model;;60.0;;data-based;https://arxiv.org/abs/2005.14165;
+super_glue;multirc;GPT-3;model;;59.0;;data-based;https://arxiv.org/abs/2005.14165;
+race;high;GPT-3;model;;;45.0;data-based;https://arxiv.org/abs/2005.14165;
+cimec/lambada;;GPT-3;model;;;43.0;data-based;https://arxiv.org/abs/2005.14165;
+super_glue;wsc;GPT-3;model;;40.0;;data-based;https://arxiv.org/abs/2005.14165;
+piqa;;GPT-3;model;;29.0;;data-based;https://arxiv.org/abs/2005.14165;
+wmt/wmt16;en-de;GPT-3;model;;;25.0;data-based;https://arxiv.org/abs/2005.14165;
+wmt/wmt16;de-en;GPT-3;model;;;25.0;data-based;https://arxiv.org/abs/2005.14165;
+race;middle;GPT-3;model;;;25.0;data-based;https://arxiv.org/abs/2005.14165;
+rmanluo/RoG-webqsp;;GPT-3;model;;;21.0;data-based;https://arxiv.org/abs/2005.14165;
+wmt/wmt16;en-ro;GPT-3;model;;;21.0;data-based;https://arxiv.org/abs/2005.14165;
+wmt/wmt16;ro-en;GPT-3;model;;;21.0;data-based;https://arxiv.org/abs/2005.14165;
+facebook/anli;test_r1;GPT-3;model;;;20.0;data-based;https://arxiv.org/abs/2005.14165;
+facebook/anli;test_r2;GPT-3;model;;;18.0;data-based;https://arxiv.org/abs/2005.14165;
+mandarjoshi/trivia_qa;;GPT-3;model;;17.0;;data-based;https://arxiv.org/abs/2005.14165;
+facebook/anli;test_r3;GPT-3;model;;;16.0;data-based;https://arxiv.org/abs/2005.14165;
+wmt/wmt16;fr-en;GPT-3;model;;;14.0;data-based;https://arxiv.org/abs/2005.14165;
+wmt/wmt16;en-fr;GPT-3;model;;;14.0;data-based;https://arxiv.org/abs/2005.14165;
+super_glue;rte;GPT-3;model;;8.0;;data-based;https://arxiv.org/abs/2005.14165;
+super_glue;wic;GPT-3;model;;8.0;;data-based;https://arxiv.org/abs/2005.14165;
+super_glue;cb;GPT-3;model;;7.0;;data-based;https://arxiv.org/abs/2005.14165;
+Reversed Words;;GPT-3;model;;7.0;;data-based;https://arxiv.org/abs/2005.14165;
+Anagrams 2;;GPT-3;model;;7.0;;data-based;https://arxiv.org/abs/2005.14165;
+allenai/openbookqa;;GPT-3;model;;;6.0;data-based;https://arxiv.org/abs/2005.14165;
+ibragim-bad/arc_easy;;GPT-3;model;;;4.0;data-based;https://arxiv.org/abs/2005.14165;
+Anagrams 1;;GPT-3;model;;3.0;;data-based;https://arxiv.org/abs/2005.14165;
+ibragim-bad/arc_challenge;;GPT-3;model;;;3.0;data-based;https://arxiv.org/abs/2005.14165;
+super_glue;copa;GPT-3;model;;3.0;;data-based;https://arxiv.org/abs/2005.14165;
+Rowan/hellaswag;;GPT-3;model;;2.0;;data-based;https://arxiv.org/abs/2005.14165;
+natural_questions;;GPT-3;model;;;1.0;data-based;https://arxiv.org/abs/2005.14165;
+Cycled Letters;;GPT-3;model;;1.0;;data-based;https://arxiv.org/abs/2005.14165;
+SAT Analogies;;GPT-3;model;;1.0;;data-based;https://arxiv.org/abs/2005.14165;
+EdinburghNLP/xsum;;PaLM 2;model;;;42.0;data-based;https://arxiv.org/abs/2305.10403;
+csebuetnlp/xlsum;;PaLM 2;model;;;46.9;data-based;https://arxiv.org/abs/2305.10403;
+wiki_lingua;;PaLM 2;model;;;9.0;data-based;https://arxiv.org/abs/2305.10403;
+winograd_wsc;;PaLM;model;;;38.5;data-based;https://arxiv.org/abs/2204.02311;
+rmanluo/RoG-webqsp;;PaLM;model;;;26.7;data-based;https://arxiv.org/abs/2204.02311;
+super_glue;wsc;PaLM;model;;;36.8;data-based;https://arxiv.org/abs/2204.02311;
+mandarjoshi/trivia_qa;;PaLM;model;;19.9;;data-based;https://arxiv.org/abs/2204.02311;
+rajpurkar/squad_v2;;PaLM;model;;85.2;;data-based;https://arxiv.org/abs/2204.02311;
+super_glue;record;PaLM;model;;43.4;;data-based;https://arxiv.org/abs/2204.02311;
+cimec/lambada;;PaLM;model;;;29.3;data-based;https://arxiv.org/abs/2204.02311;
+super_glue;cb;PaLM;model;;48.2;;data-based;https://arxiv.org/abs/2204.02311;
+ibragim-bad/arc_easy;;PaLM;model;;;30.4;data-based;https://arxiv.org/abs/2204.02311;
+ibragim-bad/arc_challenge;;PaLM;model;;;24.7;data-based;https://arxiv.org/abs/2204.02311;
+winograd_wsc;;GLaM;model;;67.3;;data-based;https://arxiv.org/abs/2112.06905;
+winogrande;;GLaM;model;;;0.3;data-based;https://arxiv.org/abs/2112.06905;
+super_glue;wic;GLaM;model;;8.2;;data-based;https://arxiv.org/abs/2112.06905;
+super_glue;wsc;GLaM;model;;57.5;;data-based;https://arxiv.org/abs/2112.06905;
+mandarjoshi/trivia_qa;;GLaM;model;;18.8;;data-based;https://arxiv.org/abs/2112.06905;
+story_cloze;;GLaM;model;;100.0;;data-based;https://arxiv.org/abs/2112.06905;
+rajpurkar/squad_v2;;GLaM;model;;94.6;;data-based;https://arxiv.org/abs/2112.06905;
+super_glue;record;GLaM;model;;98.6;;data-based;https://arxiv.org/abs/2112.06905;
+super_glue;rte;GLaM;model;;54.9;;data-based;https://arxiv.org/abs/2112.06905;
+race;middle;GLaM;model;;58.4;;data-based;https://arxiv.org/abs/2112.06905;
+race;high;GLaM;model;;74.0;;data-based;https://arxiv.org/abs/2112.06905;
+quac;;GLaM;model;;99.9;;data-based;https://arxiv.org/abs/2112.06905;
+piqa;;GLaM;model;;49.8;;data-based;https://arxiv.org/abs/2112.06905;
+allenai/openbookqa;;GLaM;model;;20.0;;data-based;https://arxiv.org/abs/2112.06905;
+natural_questions;;GLaM;model;;3.9;;data-based;https://arxiv.org/abs/2112.06905;
+super_glue;multirc;GLaM;model;;68.8;;data-based;https://arxiv.org/abs/2112.06905;
+cimec/lambada;;GLaM;model;;;21.8;data-based;https://arxiv.org/abs/2112.06905;
+Rowan/hellaswag;;GLaM;model;;19.8;;data-based;https://arxiv.org/abs/2112.06905;
+stanfordnlp/coqa;;GLaM;model;;;75.0;data-based;https://arxiv.org/abs/2112.06905;
+super_glue;copa;GLaM;model;;3.0;;data-based;https://arxiv.org/abs/2112.06905;
+super_glue;cb;GLaM;model;;26.8;;data-based;https://arxiv.org/abs/2112.06905;
+super_glue;boolq;GLaM;model;;92.1;;data-based;https://arxiv.org/abs/2112.06905;
+ibragim-bad/arc_easy;;GLaM;model;;32.5;;data-based;https://arxiv.org/abs/2112.06905;
+ibragim-bad/arc_challenge;;GLaM;model;;31.8;;data-based;https://arxiv.org/abs/2112.06905;
+facebook/anli;dev_r3;GLaM;model;;40.7;;data-based;https://arxiv.org/abs/2112.06905;
+facebook/anli;dev_r2;GLaM;model;;96.8;;data-based;https://arxiv.org/abs/2112.06905;
+facebook/anli;dev_r1;GLaM;model;;96.2;;data-based;https://arxiv.org/abs/2112.06905;
+winogrande;;FLAN;model;;;0.2;data-based;https://arxiv.org/abs/2109.01652;
+mandarjoshi/trivia_qa;;FLAN;model;;22.8;;data-based;https://arxiv.org/abs/2109.01652;
+story_cloze;;FLAN;model;;0.4;;data-based;https://arxiv.org/abs/2109.01652;
+rajpurkar/squad_v2;;FLAN;model;;99.1;;data-based;https://arxiv.org/abs/2109.01652;
+wmt/wmt16;ro-en;FLAN;model;;;12.4;data-based;https://arxiv.org/abs/2109.01652;
+super_glue;record;FLAN;model;;68.0;;data-based;https://arxiv.org/abs/2109.01652;
+super_glue;rte;FLAN;model;;33.9;;data-based;https://arxiv.org/abs/2109.01652;
+piqa;;FLAN;model;;51.3;;data-based;https://arxiv.org/abs/2109.01652;
+allenai/openbookqa;;FLAN;model;;15.0;;data-based;https://arxiv.org/abs/2109.01652;
+natural_questions;;FLAN;model;;3.2;;data-based;https://arxiv.org/abs/2109.01652;
+super_glue;multirc;FLAN;model;;59.3;;data-based;https://arxiv.org/abs/2109.01652;
+Rowan/hellaswag;;FLAN;model;;34.5;;data-based;https://arxiv.org/abs/2109.01652;
+wmt/wmt16;fr-en;FLAN;model;;;25.3;data-based;https://arxiv.org/abs/2109.01652;
+wmt/wmt16;en-ro;FLAN;model;;;12.4;data-based;https://arxiv.org/abs/2109.01652;
+wmt/wmt16;en-fr;FLAN;model;;;25.3;data-based;https://arxiv.org/abs/2109.01652;
+wmt/wmt16;en-de;FLAN;model;;;14.3;data-based;https://arxiv.org/abs/2109.01652;
+wmt/wmt16;de-en;FLAN;model;;;14.3;data-based;https://arxiv.org/abs/2109.01652;
+ucinlp/drop;;FLAN;model;;99.4;;data-based;https://arxiv.org/abs/2109.01652;
+super_glue;copa;FLAN;model;;9.0;;data-based;https://arxiv.org/abs/2109.01652;
+super_glue;cb;FLAN;model;;5.4;;data-based;https://arxiv.org/abs/2109.01652;
+super_glue;boolq;FLAN;model;;23.1;;data-based;https://arxiv.org/abs/2109.01652;
+ibragim-bad/arc_easy;;FLAN;model;;20.2;;data-based;https://arxiv.org/abs/2109.01652;
+ibragim-bad/arc_challenge;;FLAN;model;;15.6;;data-based;https://arxiv.org/abs/2109.01652;
+facebook/anli;dev_r3;FLAN;model;;40.2;;data-based;https://arxiv.org/abs/2109.01652;
+facebook/anli;dev_r2;FLAN;model;;97.9;;data-based;https://arxiv.org/abs/2109.01652;
+facebook/anli;dev_r1;FLAN;model;;98.6;;data-based;https://arxiv.org/abs/2109.01652;
```