ARC 77.73, HellaSwag 91.88, TOP under 22B - Three new HF Records!

#4
by LordTwave - opened

Congrats on making it to the top - three records set! (VERY impressive for a 22B.)

Your model:
HellaSwag - 91.88%
ARC - 77.73%
Avg - 77.74%

Next highest:
https://huggingface.co/Sao10K/WinterGoddess-1.4x-70B-L2
HellaSwag - 90.11%
Avg - 73.23%

HellaSwag Diff
(yours)91.88% - (theirs)90.11% = (in favor of yours)1.77 percentage points

Your smaller model beats this larger model on HellaSwag by 1.77 percentage points.

Average Diff
(yours)77.74% - (theirs)73.23% = (in favor of yours)4.51 percentage points

Your smaller model beats the other model in Average by 4.51 percentage points.

Next highest ARC:
https://huggingface.co/abacusai/Smaug-72B-v0.1
ARC - 76.02%
(Their average is higher, above 80%, but theirs is a larger LLM, so that would not be out of line.)

ARC Diff
(yours)77.73% - (theirs)76.02% = (in favor of yours)1.71 percentage points
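
For anyone who wants to double-check the subtraction, here is a minimal sketch that recomputes the three gaps from the scores quoted in this post. The dictionary layout and the `gap` helper are my own illustration, not anything from the leaderboard; `Decimal` is used only so the two-decimal scores subtract exactly.

```python
from decimal import Decimal

# Scores as quoted in this post (percent). The dict layout is illustrative only.
scores = {
    "this model": {
        "HellaSwag": Decimal("91.88"),
        "ARC": Decimal("77.73"),
        "Avg": Decimal("77.74"),
    },
    "WinterGoddess-1.4x-70B-L2": {
        "HellaSwag": Decimal("90.11"),
        "Avg": Decimal("73.23"),
    },
    "Smaug-72B-v0.1": {
        "ARC": Decimal("76.02"),
    },
}


def gap(metric: str, other: str) -> Decimal:
    """Lead of this model over `other` on `metric`, in percentage points."""
    return scores["this model"][metric] - scores[other][metric]


diffs = {
    "HellaSwag": gap("HellaSwag", "WinterGoddess-1.4x-70B-L2"),
    "Avg": gap("Avg", "WinterGoddess-1.4x-70B-L2"),
    "ARC": gap("ARC", "Smaug-72B-v0.1"),
}

for metric, points in diffs.items():
    print(f"{metric}: +{points} percentage points")
```

The gaps are absolute differences between two percentages, so they are percentage points rather than relative percent improvements.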

Congratulations on making top HF HellaSwag and ARC! Thank you for your contributions!
(Note: Although these differences look "small", they matter a lot for the token-by-token fidelity of outputs.)

LordTwave changed discussion title from 91.88 - A new HF HellaSwag Record! to ARC 77.73, HellaSwag 91.88 - A new HF Record!
LordTwave changed discussion title from ARC 77.73, HellaSwag 91.88 - A new HF Record! to ARC 77.73, HellaSwag 91.88, TOP 21B - Three new HF Records!
LordTwave changed discussion title from ARC 77.73, HellaSwag 91.88, TOP 21B - Three new HF Records! to ARC 77.73, HellaSwag 91.88, TOP under 22B - Three new HF Records!

https://huggingface.co/saltlux/luxia-21.4b-alignment-v1.0/discussions/5

Could someone run a contamination test on this model? I'm itching to know, and I don't like the seemingly unsubstantiated slander going on in that thread.
