Feature request: Run 100B+ models automatically

#434
by ChuckMcSneed - opened

Goliath-120B (https://huggingface.co/alpindale/goliath-120b) was submitted for evaluation almost a month ago, and there are still no results. According to @clefourrier, there is not enough memory to run it. Please fix this.

Open LLM Leaderboard org

Hi @ChuckMcSneed ,
As mentioned in the other discussion, our backend currently cannot evaluate models this big (they don't fit on one A100 node). We will add this feature to our roadmap.

Thank you for opening this issue! We'll keep track of it!
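For a rough sense of scale (a back-of-the-envelope estimate, not a figure from the leaderboard team): the 16-bit weights of a ~120B-parameter model alone take roughly 240 GB, before any activation or KV-cache overhead during evaluation.

```python
# Back-of-the-envelope weight memory for a ~120B-parameter model in bf16/fp16.
params = 120e9          # ~120B parameters
bytes_per_param = 2     # 16-bit weights
print(f"~{params * bytes_per_param / 1e9:.0f} GB of weights")  # ~240 GB
```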

clefourrier changed discussion title from Goliath-120B evaluation to Feature request: Run 100B+ models automatically

I noticed falcon-180b got removed from the leaderboard while these finetunes are still on it:
OpenBuddy/openbuddy-falcon-180b-v13-preview0
OpenBuddy/openbuddy-falcon-180b-v12-preview0
Do they need to be retested?

Open LLM Leaderboard org

Hi! Falcon-180B is still on the leaderboard if you select the little toggle to "Show gated/deleted/private models"

We would also be very happy to see this feature added. We submitted DiscoResearch/DiscoLM-120b some time ago and didn't know why it was stuck at "pending".

Maybe you could add a manual job to run >70B models on demand on a 4×A100 instance? (A minimal loading sketch along these lines is included after this comment.)

Thank you for your work on the leaderboard!
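For reference, here is a minimal sketch of what the 4×A100 suggestion above could look like when loading a >100B model with transformers/accelerate. This is not the leaderboard's actual harness; the model name and memory limits are placeholders.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DiscoResearch/DiscoLM-120b"  # placeholder: any ~120B checkpoint

# Leave headroom on each 80 GB GPU for activations and the KV cache.
max_memory = {i: "75GiB" for i in range(4)}

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",      # accelerate shards the layers across the 4 GPUs
    max_memory=max_memory,
)
```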

Can you eval bnb 4-bit quantizations of large models? It would be beneficial to have some kind of indication. I was thinking about QuIP# 2-bit quantizing a Goliath model so I can fit it on my GPU locally, but that will take about 1.5 weeks to compute. If it's not better, I don't feel like doing it.
So if I quantize Goliath to 4-bit with bnb and submit it for eval, will it work?
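For context, a minimal sketch of loading a large model with bitsandbytes 4-bit quantization in transformers. Whether the leaderboard backend accepts such a submission is exactly the open question here, so this only shows the loading side; roughly, 120B parameters at ~0.5 bytes each is about 60 GB of weights.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 quantization with bf16 compute, the usual bitsandbytes 4-bit setup.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "alpindale/goliath-120b",        # the model discussed above
    quantization_config=bnb_config,
    device_map="auto",
)
```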

I've submitted TheBloke/DiscoLM-120b-GPTQ now. Hope that works.

@clefourrier How is this progressing? Are you still trying to implement it, or is it just too expensive?

@clefourrier Why was MegaDolphin-120b successfully tested while all the other 120b models have failed?

Open LLM Leaderboard org

Hi @ChuckMcSneed ,
Can you link the request file? I suspect it was submitted quantized, which would just barely fit on our GPUs.
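One way to check the submitted precision yourself, assuming the public open-llm-leaderboard/requests dataset and its usual JSON fields (the repo name, file layout, and field names here are assumptions, not confirmed in this thread):

```python
import json
from huggingface_hub import HfApi, hf_hub_download

repo = "open-llm-leaderboard/requests"  # assumed requests dataset

api = HfApi()
files = [f for f in api.list_repo_files(repo, repo_type="dataset")
         if "MegaDolphin-120b" in f]

for name in files:
    path = hf_hub_download(repo, name, repo_type="dataset")
    with open(path) as fh:
        request = json.load(fh)
    # "precision" and "status" are the fields the request files are assumed to carry.
    print(name, request.get("precision"), request.get("status"))
```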

Open LLM Leaderboard org

Interesting!
It might be because we went from A100s to H100s, and they don't seem to manage memory in exactly the same way, which could have allowed a slightly bigger model to fit (but just barely).
Other idea: @SaylorTwift did you launch MegaDolphin manually?

If not, we could try relaunching some of the bigger models (like goliath) and see what happens

Are 100B+ models supported now?
I submitted softwareweaver/Twilight-Miqu-146B and its status has said RUNNING since yesterday.

Open LLM Leaderboard org

Hi! They should be supported in most cases but might still fail; the system is not super robust yet.
However, just FYI, since a 70B takes at minimum 10h to evaluate, don't expect a 146B to finish instantaneously XD

Thanks @clefourrier
Someone on the LocalLLaMA forum said 100B+ models are not supported, and this thread did not have a definite answer.
