FAQ
Please feel free to ask all your questions here
The updating of the leaderboard is a little slow.
I submitted a model and it still doesn't show in the pending evaluations (nothing changes or moves).
@MohamedRashad
There are some heavy models currently being evaluated in parallel, and that is what is blocking the leaderboard; we expect to see more marked Finished (more than 14) by tomorrow.
I checked, and it seems that all the models in the requests dataset are under the PENDING toggle in the "Submit here" tab, so apologies, but I fail to understand what you meant.
I found the model I submitted now :)
Everything is working great ^^
I know this might seem obvious to many users here, but some (myself included) still think the current leaderboard is the final evaluation.
Please make it clear to users that the ranking is not final; the evaluation is still ongoing.
Also, could you provide an estimated timeline for when the evaluation will be complete?
Dear @soufianechami, leaderboards by their nature are never in a final state; models come in every day, get submitted, and are then evaluated. To stay up to date, you will need (it is a must) to check the leaderboard every once in a while.
I'm curious whether there will be a section for embedding models?
Hugging Face has a leaderboard for embedding models (https://huggingface.co/spaces/mteb/leaderboard), but the scores and rankings are all based on English, Chinese, French, and Polish.
It's hard to know which of the models may work well for Arabic, e.g. for building the retrieval part of a RAG system.
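For anyone experimenting in the meantime, here is a minimal sketch of checking a multilingual embedding model on an Arabic retrieval-style similarity task; the sentence-transformers usage is standard, but the model choice is just an illustrative assumption, not a leaderboard recommendation:

```python
# Minimal sketch: cosine similarity of Arabic sentences with a multilingual
# embedding model. The model name is an illustrative assumption only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

query = "ما هي عاصمة مصر؟"  # "What is the capital of Egypt?"
passages = [
    "القاهرة هي عاصمة جمهورية مصر العربية.",  # relevant passage
    "الرياض مدينة كبيرة في السعودية.",  # irrelevant passage
]

q_emb = model.encode(query, convert_to_tensor=True)
p_emb = model.encode(passages, convert_to_tensor=True)
# The relevant passage should get the higher similarity score.
print(util.cos_sim(q_emb, p_emb))
```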
Hi, thanks for compiling this resource!
Could you provide the exact lighteval command/config used for the evaluations? For example, in ./examples/tasks/OALL.txt from the official lighteval repo, (almost) all tasks are evaluated 5-shot with |5|1, whereas in the leaderboard everything is 0-shot.
Hello,
Yes, just change them all to |0|0; this is our setting.
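For illustration, assuming the usual lighteval task-spec format suite|task|few_shot|truncate_few_shot, the change from the repo's example file to the leaderboard setting would look like this (the task name here is a plausible placeholder, not a verbatim quote from OALL.txt):

```
# In the tasks file, replace the 5-shot spec ...
community|acva:Algeria|5|1
# ... with the 0-shot spec used by the leaderboard:
community|acva:Algeria|0|0
```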
Hi,
Which dataset source is used in the ACVA benchmark?
This one: https://huggingface.co/datasets/FreedomIntelligence/ACVA-Arabic-Cultural-Value-Alignment/viewer/default/validation
Or this one: https://huggingface.co/datasets/OALL/ACVA
Also, for AlGhafa benchmark, which dataset is used?
There are multiple datasets here: https://gitlab.com/tiiuae/alghafa/-/tree/main?ref_type=heads
Also, among the OALL datasets I can find:
https://huggingface.co/datasets/OALL/AlGhafa-Arabic-LLM-Benchmark-Translated
And:
https://huggingface.co/datasets/OALL/AlGhafa-Arabic-LLM-Benchmark-Native
So which one is used? And how is the final metric calculated over the benchmark datasets?
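(For readers wondering what such an aggregation can look like: one common convention for leaderboards, not confirmed here as OALL's actual method, is an unweighted macro-average over per-subset scores, sketched below.)

```python
# Hypothetical sketch of an unweighted macro-average over benchmark subsets;
# whether OALL aggregates this way is an assumption, not confirmed above.
def macro_average(subset_scores: dict[str, float]) -> float:
    return sum(subset_scores.values()) / len(subset_scores)

scores = {"acva": 0.62, "alghafa_native": 0.55, "alghafa_translated": 0.58}
print(round(macro_average(scores), 4))  # 0.5833
```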
Hi
@alielfilali01
,
I submitted the fine-tuned adapter airev-ai/Amal-70b-v2.3.2 (base model: airev-ai/Amal-70b-v2) with bfloat16 precision a couple of hours ago, but the status shows as FAILED in the requests card. Both the adapter and the base model are public, have a model card, and even have a valid license attached. I am unsure why the submission failed. Any assistance in this matter would be greatly appreciated.
Thank you.
Hi
@ManojShack
Thanks for submitting your model to the leaderboard.
Regarding your concern: when attempting to evaluate your models, we ran into errors because they are missing a config.json. Please ensure the config is included and submit again.
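As a quick self-check before resubmitting, something like this minimal sketch (using the huggingface_hub client; the repo id is the one mentioned above) will confirm the file is visible on the Hub:

```python
# Minimal sketch: verify that config.json is present in a public model repo.
from huggingface_hub import HfApi

files = HfApi().list_repo_files("airev-ai/Amal-70b-v2.3.2")
print("config.json present:", "config.json" in files)
```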
I am writing to request the inclusion of Stability AI's Arabic Stable LM 1.6B, Aya 8B, Jais 13B, and Noon 7B in the Open Arabic LLM Leaderboard.
The models are publicly available at:
https://huggingface.co/stabilityai/ar-stablelm-2-base
https://huggingface.co/CohereForAI/aya-expanse-8b
https://huggingface.co/inceptionai/jais-13b
https://huggingface.co/Naseej/noon-7b
I believe their inclusion would benefit the community by providing a robust benchmark for Arabic language models.
Hey
@Jumana25
Please feel free to submit whatever models you want to see on the leaderboard.
Thanks
Hey, I tried to run your evaluation code offline, but I found that the asas-ai/AraTrust-categorized dataset does not exist on Hugging Face. Can you fix this problem?