[FLAG] Suspiciously High TruthfulQA for TigerResearch/tigerbot-7b-sft-v1
TigerResearch/tigerbot-7b-sft-v1 shows a TruthfulQA score of 58.18, which is in outlier range: it is higher not only than all comparable 7b models, but suspiciously also higher than any 13b, 34b, or 70b model in this range. Please see the screenshot from the LB:
I have reached out to the authors and opened a discussion asking for details; however, I haven't received any response from them so far:
=> https://huggingface.co/TigerResearch/tigerbot-7b-sft-v1/discussions/1
@clefourrier: let us know what the next steps should be for this model on the LB.
Hi! Thank you for this issue, it's very complete!
Let's give them a week to investigate their secondary data, and if they have not done so by then, I'll flag their model.
It's been a week. Since they don't seem to have actually examined their secondary data for contamination, I'll flag it and let users decide whether to use it or not.
Agreed, thanks for keeping tabs on this one.