killawhale2 committed on
Commit
0dfb32e
1 Parent(s): 12acc01

Update README.md

Files changed (1)
  1. README.md +20 -0
README.md CHANGED
@@ -56,6 +56,26 @@ Using the datasets mentioned above, we applied SFT and iterative DPO training, a

  [2] Yu, L., Jiang, W., Shi, H., Yu, J., Liu, Z., Zhang, Y., Kwok, J.T., Li, Z., Weller, A. and Liu, W., 2023. Metamath: Bootstrap your own mathematical questions for large language models. arXiv preprint arXiv:2309.12284.

+ # **Data Contamination Test Results**
+
+ Recently, some models on the Open LLM Leaderboard have been found to have data contamination issues.
+ We made every effort to exclude benchmark-related datasets from training.
+ To further verify the integrity of our model, we ran the data contamination test from [3], which the Hugging Face team also uses [4, 5].
+
+ Our results, with all `result < 0.1, %:` values well below 0.9, indicate that our model is free from contamination.
+
+ *The data contamination test results for HellaSwag and Winogrande will be added once [3] supports them.*
+
+ | Model | ARC | MMLU | TruthfulQA | GSM8K |
+ |------------------------------|-------|-------|-------|-------|
+ | **SOLAR-10.7B-Instruct-v1.0** | result < 0.1, %: 0.06 | result < 0.1, %: 0.15 | result < 0.1, %: 0.28 | result < 0.1, %: 0.70 |
+
+ [3] https://github.com/swj0419/detect-pretrain-code-contamination
+
+ [4] https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/474#657f2245365456e362412a06
+
+ [5] https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard/discussions/265#657b6debf81f6b44b8966230
+
  # **Evaluation Results**

  | Model | H6 | Model Size |
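As a rough illustration of how the added table supports the README's claim, here is a minimal sketch that compares each reported `result < 0.1, %:` value against the 0.9 cut-off the README mentions. Treating a value at or above 0.9 as a contamination flag is an assumption made here, since the README only says the values are "well below 0.9". The helpers `flag_contamination` and `margins` are hypothetical and are not part of the tool in [3]. The scores are copied directly from the table.

```python
# Sketch only: re-checks the README table against the 0.9 cut-off it cites.
# Assumption: a score at or above the cut-off counts as "flagged".
# flag_contamination and margins are hypothetical helpers, not part of [3].
from __future__ import annotations

CUTOFF = 0.9

# `result < 0.1, %:` values for SOLAR-10.7B-Instruct-v1.0, copied from the table.
REPORTED = {
    "ARC": 0.06,
    "MMLU": 0.15,
    "TruthfulQA": 0.28,
    "GSM8K": 0.70,
}


def flag_contamination(scores: dict[str, float], cutoff: float = CUTOFF) -> list[str]:
    """Return benchmark names whose score is at or above the cut-off, sorted by name."""
    return sorted(name for name, score in scores.items() if score >= cutoff)


def margins(scores: dict[str, float], cutoff: float = CUTOFF) -> dict[str, float]:
    """Distance of each score below the cut-off, rounded to 2 places (negative = above)."""
    return {name: round(cutoff - score, 2) for name, score in scores.items()}


if __name__ == "__main__":
    for name, gap in margins(REPORTED).items():
        print(f"{name}: {REPORTED[name]:.2f} (margin to {CUTOFF}: {gap:.2f})")
    print("flagged:", flag_contamination(REPORTED) or "none")
```

Running it prints one line per benchmark followed by `flagged: none`. The largest value (GSM8K, 0.70) still sits 0.2 below the cut-off.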