Data-Contamination-Database / contamination_report.csv
Evaluation Dataset;Contaminated Source;Model or corpus;Train Split;Development Split;Test Split;Approach;Citation;PR Link
conll2003;google/gemma-7b;model;1.0;1.0;1.0;model-based;https://hitz-zentroa.github.io/lm-contamination/blog/;
conll2003;EleutherAI/the_pile_deduplicated;corpus;1.0;1.0;1.0;data-based;https://aclanthology.org/2023.findings-emnlp.722/;www.google.com
Test;lololol;corpus;1.0;1.0;1.0;data-based;https://arxiv.org/abs/2310.03668;