Mistral 7B Arc Easy Contamination based on "Proving Test Set Contamination in Black Box Language Models"
# What are you reporting:
- [ ] Evaluation dataset(s) found in a pre-training corpus. (e.g. COPA found in ThePile)
- [x] Evaluation dataset(s) found in a pre-trained model. (e.g. FLAN T5 has been trained on ANLI)
Contaminated Evaluation Dataset(s):
- ibragim-bad/arc_easy
Contaminated Model:
- Mistral 7B
Approach:
- [ ] Data-based approach
- [x] Model-based approach
**Description of your method, 3-4 sentences. Evidence of data contamination:**
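The authors run a statistical test on the model's log-probabilities, comparing the log-probability of the dataset under its original (canonical) ordering with its log-probability under random permutations of the examples. If the dataset is exchangeable, an uncontaminated model should show no preference for any particular ordering, so a significant preference for the canonical ordering is evidence of contamination. Specifically, the sharded version of the test checks whether the log-probability under the canonical ordering X is higher than the average log-probability under random permutations.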
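To make the idea concrete, here is a minimal Python sketch of an ordering-based test of this kind. It is an illustration under stated assumptions, not the paper's implementation: `log_prob_fn` is a hypothetical callable that returns the model's total log-probability for a list of examples in the given order, the sharded function returns a one-sided t-statistic rather than a p-value, and `make_memorizing_scorer` and `order_invariant_scorer` are toy stand-ins for a real language model.

```python
import math
import random
import statistics
from typing import Callable, List, Sequence

# Hypothetical scorer type: ordered examples -> total log-probability.
LogProbFn = Callable[[Sequence[str]], float]


def permutation_p_value(examples: Sequence[str], log_prob_fn: LogProbFn,
                        n_perms: int = 99, seed: int = 0) -> float:
    """One-sided permutation p-value: how often does a random ordering
    score at least as high as the canonical ordering?"""
    rng = random.Random(seed)
    canonical = log_prob_fn(list(examples))
    hits = 0
    for _ in range(n_perms):
        shuffled = list(examples)
        rng.shuffle(shuffled)
        if log_prob_fn(shuffled) >= canonical:
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return (1 + hits) / (1 + n_perms)


def sharded_t_statistic(examples: Sequence[str], log_prob_fn: LogProbFn,
                        n_shards: int = 3, n_perms: int = 20,
                        seed: int = 0) -> float:
    """Per shard, take canonical log-prob minus the mean permuted log-prob,
    then return the t-statistic of those differences across shards."""
    if n_shards < 2:
        raise ValueError("need at least two shards to estimate variance")
    shard_size = len(examples) // n_shards
    if shard_size < 2:
        raise ValueError("each shard needs at least two examples")
    rng = random.Random(seed)
    diffs: List[float] = []
    for i in range(n_shards):
        shard = list(examples[i * shard_size:(i + 1) * shard_size])
        canonical = log_prob_fn(shard)
        permuted = []
        for _ in range(n_perms):
            shuffled = shard[:]
            rng.shuffle(shuffled)
            permuted.append(log_prob_fn(shuffled))
        diffs.append(canonical - statistics.fmean(permuted))
    mean = statistics.fmean(diffs)
    sd = statistics.stdev(diffs)
    if sd == 0:
        # Degenerate case: every shard gave the same difference.
        return 0.0 if mean == 0 else math.copysign(math.inf, mean)
    return mean / (sd / math.sqrt(len(diffs)))


def make_memorizing_scorer(memorized: Sequence[str]) -> LogProbFn:
    """Toy 'contaminated' scorer: rewards adjacent pairs seen in training order."""
    seen_pairs = set(zip(memorized, memorized[1:]))

    def score(seq: Sequence[str]) -> float:
        return float(sum((a, b) in seen_pairs for a, b in zip(seq, seq[1:])))

    return score


def order_invariant_scorer(seq: Sequence[str]) -> float:
    """Toy 'clean' scorer: the score ignores example order entirely."""
    return float(-sum(len(e) for e in seq))
```

With a real model, `log_prob_fn` would concatenate the examples in the given order and sum the token log-probabilities. A large positive t-statistic, or a small permutation p-value, suggests the model prefers the canonical ordering.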
**Citation:**
Is there a paper that reports the data contamination or describes the method used to detect data contamination? Yes
**url**: https://arxiv.org/abs/2310.17623
```
@article{oren2023proving,
title={Proving test set contamination in black box language models},
author={Oren, Yonatan and Meister, Nicole and Chatterji, Niladri and Ladhak, Faisal and Hashimoto, Tatsunori B},
journal={arXiv preprint arXiv:2310.17623},
year={2023}
}
```
Important! If you wish to be listed as an author in the final report, please complete this information for all the authors of this Pull Request.
Full name: Ameya Prabhu
Institution: Tübingen AI Center, University of Tübingen
Email: ameya@prabhu.be
- contamination_report.csv +2 -0
    @@ -597,3 +597,5 @@ ibragim-bad/arc_challenge;;FLAN;model;;15.6;;data-based;https://arxiv.org/abs/21
     facebook/anli;dev_r3;FLAN;model;;40.2;;data-based;https://arxiv.org/abs/2109.01652;13
     facebook/anli;dev_r2;FLAN;model;;97.9;;data-based;https://arxiv.org/abs/2109.01652;13
     facebook/anli;dev_r1;FLAN;model;;98.6;;data-based;https://arxiv.org/abs/2109.01652;13
    +
    +ibragim-bad/arc_easy;;Mistral 7B;model;;;100.0;model-based;https://arxiv.org/abs/2310.17623;14