OSainz and AmeyaPrabhu committed
Commit c4acbf6 • 1 parent: 36aaa79

Added Contamination Info on Old Models: GPT3, FLAN, GLaM, PaLM, PaLM 2 (#13)


- Update contamination_report.csv (e068ce3ff3a9fc6a23ef66687ce1fb2ffcdc153d)
- Update contamination_report.csv (6febb2668817ca6bab3dee054bc1811a06d4e99a)
- Update contamination_report.csv (0f5ff265a43ef025e95dfd5217592d6f0226494e)
- Fix some format bugs (c714a687cee8b45c649fce9f2d9ef78fe558c144)
- Merge branch 'main' of https://huggingface.co/spaces/CONDA-Workshop/Data-Contamination-Report into pr/13 (b1e5746efc1087663f87cb8c7bcf32b598d959c6)
- Fix some arxiv links (403385af15e9561ffd666af1ba15a8638aa1552e)


Co-authored-by: Ameya Prabhu <AmeyaPrabhu@users.noreply.huggingface.co>

Files changed (1)
  1. contamination_report.csv +113 -1
contamination_report.csv CHANGED
@@ -479,9 +479,121 @@ bigbio/mednli;;GPT-3.5;model;0.0;0.0;0.0;model-based;https://arxiv.org/abs/2308.
 
  RadNLI;;GPT-4;model;0.0;0.0;0.0;model-based;https://arxiv.org/abs/2308.08493;8
  RadNLI;;GPT-3.5;model;0.0;0.0;0.0;model-based;https://arxiv.org/abs/2308.08493;8
+ RadNLI;;GPT-4;model;0.0;0.0;0.0;model-based;https://arxiv.org/abs/2308.08493;8
+ RadNLI;;GPT-3.5;model;0.0;0.0;0.0;model-based;https://arxiv.org/abs/2308.08493;8
 
 
  openai_humaneval;;EleutherAI/pile;corpus;;;12.2;data-based;https://arxiv.org/abs/2403.04811;12
  mbpp;;EleutherAI/pile;corpus;;;3.6;data-based;https://arxiv.org/abs/2403.04811;12
  openai_humaneval;;bigcode/the-stack;corpus;;;18.9;data-based;https://arxiv.org/abs/2403.04811;12
- mbpp;;bigcode/the-stack;corpus;;;20.8;data-based;https://arxiv.org/abs/2403.04811;12
+ mbpp;;bigcode/the-stack;corpus;;;20.8;data-based;https://arxiv.org/abs/2403.04811;12
+
+ quac;;GPT-3;model;;99.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ rajpurkar/squad_v2;;GPT-3;model;;94.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ ucinlp/drop;;GPT-3;model;;93.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ Symbol Insertion;;GPT-3;model;;86.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ stanfordnlp/coqa;;GPT-3;model;;64.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ super_glue;record;GPT-3;model;;61.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ winograd_wsc;;GPT-3;model;;;60.0;data-based;https://arxiv.org/abs/2005.14165;13
+ super_glue;boolq;GPT-3;model;;60.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ super_glue;multirc;GPT-3;model;;59.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ race;high;GPT-3;model;;;45.0;data-based;https://arxiv.org/abs/2005.14165;13
+ cimec/lambada;;GPT-3;model;;;43.0;data-based;https://arxiv.org/abs/2005.14165;13
+ super_glue;wsc;GPT-3;model;;40.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ piqa;;GPT-3;model;;29.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ wmt/wmt16;en-de;GPT-3;model;;;25.0;data-based;https://arxiv.org/abs/2005.14165;13
+ wmt/wmt16;de-en;GPT-3;model;;;25.0;data-based;https://arxiv.org/abs/2005.14165;13
+ race;middle;GPT-3;model;;;25.0;data-based;https://arxiv.org/abs/2005.14165;13
+ rmanluo/RoG-webqsp;;GPT-3;model;;;21.0;data-based;https://arxiv.org/abs/2005.14165;13
+ wmt/wmt16;en-ro;GPT-3;model;;;21.0;data-based;https://arxiv.org/abs/2005.14165;13
+ wmt/wmt16;ro-en;GPT-3;model;;;21.0;data-based;https://arxiv.org/abs/2005.14165;13
+ facebook/anli;test_r1;GPT-3;model;;;20.0;data-based;https://arxiv.org/abs/2005.14165;13
+ facebook/anli;test_r2;GPT-3;model;;;18.0;data-based;https://arxiv.org/abs/2005.14165;13
+ mandarjoshi/trivia_qa;;GPT-3;model;;17.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ facebook/anli;test_r3;GPT-3;model;;;16.0;data-based;https://arxiv.org/abs/2005.14165;13
+ wmt/wmt16;fr-en;GPT-3;model;;;14.0;data-based;https://arxiv.org/abs/2005.14165;13
+ wmt/wmt16;en-fr;GPT-3;model;;;14.0;data-based;https://arxiv.org/abs/2005.14165;13
+ super_glue;rte;GPT-3;model;;8.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ super_glue;wic;GPT-3;model;;8.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ super_glue;cb;GPT-3;model;;7.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ Reversed Words;;GPT-3;model;;7.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ Anagrams 2;;GPT-3;model;;7.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ allenai/openbookqa;;GPT-3;model;;;6.0;data-based;https://arxiv.org/abs/2005.14165;13
+ ibragim-bad/arc_easy;;GPT-3;model;;;4.0;data-based;https://arxiv.org/abs/2005.14165;13
+ Anagrams 1;;GPT-3;model;;3.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ ibragim-bad/arc_challenge;;GPT-3;model;;;3.0;data-based;https://arxiv.org/abs/2005.14165;13
+ super_glue;copa;GPT-3;model;;3.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ Rowan/hellaswag;;GPT-3;model;;2.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ natural_questions;;GPT-3;model;;;1.0;data-based;https://arxiv.org/abs/2005.14165;13
+ Cycled Letters;;GPT-3;model;;1.0;;data-based;https://arxiv.org/abs/2005.14165;13
+ SAT Analogies;;GPT-3;model;;1.0;;data-based;https://arxiv.org/abs/2005.14165;13
+
+ EdinburghNLP/xsum;;PaLM 2;model;;;42.0;data-based;https://arxiv.org/abs/2305.10403;13
+ csebuetnlp/xlsum;;PaLM 2;model;;;46.9;data-based;https://arxiv.org/abs/2305.10403;13
+ wiki_lingua;;PaLM 2;model;;;9.0;data-based;https://arxiv.org/abs/2305.10403;13
+
+ winograd_wsc;;PaLM;model;;;38.5;data-based;https://arxiv.org/abs/2204.02311;13
+ rmanluo/RoG-webqsp;;PaLM;model;;;26.7;data-based;https://arxiv.org/abs/2204.02311;13
+ super_glue;wsc;PaLM;model;;;36.8;data-based;https://arxiv.org/abs/2204.02311;13
+ mandarjoshi/trivia_qa;;PaLM;model;;19.9;;data-based;https://arxiv.org/abs/2204.02311;13
+ rajpurkar/squad_v2;;PaLM;model;;85.2;;data-based;https://arxiv.org/abs/2204.02311;13
+ super_glue;record;PaLM;model;;43.4;;data-based;https://arxiv.org/abs/2204.02311;13
+ cimec/lambada;;PaLM;model;;;29.3;data-based;https://arxiv.org/abs/2204.02311;13
+ super_glue;cb;PaLM;model;;48.2;;data-based;https://arxiv.org/abs/2204.02311;13
+ ibragim-bad/arc_easy;;PaLM;model;;;30.4;data-based;https://arxiv.org/abs/2204.02311;13
+ ibragim-bad/arc_challenge;;PaLM;model;;;24.7;data-based;https://arxiv.org/abs/2204.02311;13
+
+ winograd_wsc;;GLaM;model;;67.3;;data-based;https://arxiv.org/abs/2112.06905;13
+ winogrande;;GLaM;model;;;0.3;data-based;https://arxiv.org/abs/2112.06905;13
+ super_glue;wic;GLaM;model;;8.2;;data-based;https://arxiv.org/abs/2112.06905;13
+ super_glue;wsc;GLaM;model;;57.5;;data-based;https://arxiv.org/abs/2112.06905;13
+ mandarjoshi/trivia_qa;;GLaM;model;;18.8;;data-based;https://arxiv.org/abs/2112.06905;13
+ story_cloze;;GLaM;model;;100.0;;data-based;https://arxiv.org/abs/2112.06905;13
+ rajpurkar/squad_v2;;GLaM;model;;94.6;;data-based;https://arxiv.org/abs/2112.06905;13
+ super_glue;record;GLaM;model;;98.6;;data-based;https://arxiv.org/abs/2112.06905;13
+ super_glue;rte;GLaM;model;;54.9;;data-based;https://arxiv.org/abs/2112.06905;13
+ race;middle;GLaM;model;;58.4;;data-based;https://arxiv.org/abs/2112.06905;13
+ race;high;GLaM;model;;74.0;;data-based;https://arxiv.org/abs/2112.06905;13
+ quac;;GLaM;model;;99.9;;data-based;https://arxiv.org/abs/2112.06905;13
+ piqa;;GLaM;model;;49.8;;data-based;https://arxiv.org/abs/2112.06905;13
+ allenai/openbookqa;;GLaM;model;;20.0;;data-based;https://arxiv.org/abs/2112.06905;13
+ natural_questions;;GLaM;model;;3.9;;data-based;https://arxiv.org/abs/2112.06905;13
+ super_glue;multirc;GLaM;model;;68.8;;data-based;https://arxiv.org/abs/2112.06905;13
+ cimec/lambada;;GLaM;model;;;21.8;data-based;https://arxiv.org/abs/2112.06905;13
+ Rowan/hellaswag;;GLaM;model;;19.8;;data-based;https://arxiv.org/abs/2112.06905;13
+ stanfordnlp/coqa;;GLaM;model;;;75.0;data-based;https://arxiv.org/abs/2112.06905;13
+ super_glue;copa;GLaM;model;;3.0;;data-based;https://arxiv.org/abs/2112.06905;13
+ super_glue;cb;GLaM;model;;26.8;;data-based;https://arxiv.org/abs/2112.06905;13
+ super_glue;boolq;GLaM;model;;92.1;;data-based;https://arxiv.org/abs/2112.06905;13
+ ibragim-bad/arc_easy;;GLaM;model;;32.5;;data-based;https://arxiv.org/abs/2112.06905;13
+ ibragim-bad/arc_challenge;;GLaM;model;;31.8;;data-based;https://arxiv.org/abs/2112.06905;13
+ facebook/anli;dev_r3;GLaM;model;;40.7;;data-based;https://arxiv.org/abs/2112.06905;13
+ facebook/anli;dev_r2;GLaM;model;;96.8;;data-based;https://arxiv.org/abs/2112.06905;13
+ facebook/anli;dev_r1;GLaM;model;;96.2;;data-based;https://arxiv.org/abs/2112.06905;13
+
+ winogrande;;FLAN;model;;;0.2;data-based;https://arxiv.org/abs/2109.01652;13
+ mandarjoshi/trivia_qa;;FLAN;model;;22.8;;data-based;https://arxiv.org/abs/2109.01652;13
+ story_cloze;;FLAN;model;;0.4;;data-based;https://arxiv.org/abs/2109.01652;13
+ rajpurkar/squad_v2;;FLAN;model;;99.1;;data-based;https://arxiv.org/abs/2109.01652;13
+ wmt/wmt16;ro-en;FLAN;model;;;12.4;data-based;https://arxiv.org/abs/2109.01652;13
+ super_glue;record;FLAN;model;;68.0;;data-based;https://arxiv.org/abs/2109.01652;13
+ super_glue;rte;FLAN;model;;33.9;;data-based;https://arxiv.org/abs/2109.01652;13
+ piqa;;FLAN;model;;51.3;;data-based;https://arxiv.org/abs/2109.01652;13
+ allenai/openbookqa;;FLAN;model;;15.0;;data-based;https://arxiv.org/abs/2109.01652;13
+ natural_questions;;FLAN;model;;3.2;;data-based;https://arxiv.org/abs/2109.01652;13
+ super_glue;multirc;FLAN;model;;59.3;;data-based;https://arxiv.org/abs/2109.01652;13
+ Rowan/hellaswag;;FLAN;model;;34.5;;data-based;https://arxiv.org/abs/2109.01652;13
+ wmt/wmt16;fr-en;FLAN;model;;;25.3;data-based;https://arxiv.org/abs/2109.01652;13
+ wmt/wmt16;en-ro;FLAN;model;;;12.4;data-based;https://arxiv.org/abs/2109.01652;13
+ wmt/wmt16;en-fr;FLAN;model;;;25.3;data-based;https://arxiv.org/abs/2109.01652;13
+ wmt/wmt16;en-de;FLAN;model;;;14.3;data-based;https://arxiv.org/abs/2109.01652;13
+ wmt/wmt16;de-en;FLAN;model;;;14.3;data-based;https://arxiv.org/abs/2109.01652;13
+ ucinlp/drop;;FLAN;model;;99.4;;data-based;https://arxiv.org/abs/2109.01652;13
+ super_glue;copa;FLAN;model;;9.0;;data-based;https://arxiv.org/abs/2109.01652;13
+ super_glue;cb;FLAN;model;;5.4;;data-based;https://arxiv.org/abs/2109.01652;13
+ super_glue;boolq;FLAN;model;;23.1;;data-based;https://arxiv.org/abs/2109.01652;13
+ ibragim-bad/arc_easy;;FLAN;model;;20.2;;data-based;https://arxiv.org/abs/2109.01652;13
+ ibragim-bad/arc_challenge;;FLAN;model;;15.6;;data-based;https://arxiv.org/abs/2109.01652;13
+ facebook/anli;dev_r3;FLAN;model;;40.2;;data-based;https://arxiv.org/abs/2109.01652;13
+ facebook/anli;dev_r2;FLAN;model;;97.9;;data-based;https://arxiv.org/abs/2109.01652;13
+ facebook/anli;dev_r1;FLAN;model;;98.6;;data-based;https://arxiv.org/abs/2109.01652;13