truthfulqa

#1
by pankajmathur - opened

Just curious: does the training dataset include the TruthfulQA dataset, which is used for evaluation on the Open LLM Leaderboard?

If so, is it still fair to use TruthfulQA as a metric to evaluate this model? Compared with other 7B (or maybe even 65B/70B) models, its TruthfulQA score is very high.

Screenshot 2023-08-26 at 1.23.19 AM.png

Hi, just reaching out again for any updates on the original question?

Tiger Research org

Thanks for your interest and comments. We did our due diligence to avoid data contamination or leakage. Our practice is not to use the test/validation subsets of any open datasets. However, we did use some secondary or merged data mixes in our earlier experiments. We will examine these further to check for any contamination. Meanwhile, our base model performs reasonably well, so users are welcome to keep that in perspective.
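
For anyone who wants to run a quick check of this kind themselves, one common heuristic is word-level n-gram overlap between benchmark questions and training text. The sketch below is only an illustration under stated assumptions: it is not the procedure Tiger Research used, the function names `ngrams` and `contaminated_items` are hypothetical, and the example strings are made-up placeholders rather than real TruthfulQA or training data.

```python
import re

def ngrams(text, n=8):
    """Return the set of word-level n-grams in lowercased text (punctuation ignored)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    # Texts shorter than n words yield an empty set and are therefore never flagged.
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contaminated_items(benchmark_items, training_docs, n=8):
    """Return the benchmark items that share at least one n-gram with any training doc."""
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    return [item for item in benchmark_items if ngrams(item, n) & train_grams]

# Made-up placeholder data, for illustration only.
training_docs = [
    "Scraped page: which planet in the toy example is closest to the star? Answer below.",
]
benchmark_items = [
    "Which planet in the toy example is closest to the star?",
    "How many moons does the second placeholder world have in total?",
]

flagged = contaminated_items(benchmark_items, training_docs, n=5)
print(flagged)
```

An exact n-gram match only catches verbatim or near-verbatim copies. Paraphrased or translated benchmark items would need a fuzzier comparison, so an empty result does not prove a dataset is clean.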

yechen changed discussion status to closed
