llm_contamination_detector/data/code_eval_board.csv
Current evaluations (num_z = 50)
0c9031b
112 Bytes
T,Models,ARC,HellaSwag,MMLU,TruthfulQA,Winogrande,GSM8K
🟢,roneneldan/TinyStories-3M,0.06,0.1,0.13,0.2,0.01,0