FAQ
Please feel free to ask all your questions here
The updating of the leaderboard is a little slow.
I submitted a model and it still doesn't show in the pending evaluations (nothing changes or moves).
@MohamedRashad
There are some heavy models currently being evaluated in parallel, and that is what is blocking the leaderboard; we expect to see more marked Finished (more than 14) by tomorrow.
I checked, and it seems that all the models in the requests dataset are under the PENDING toggle in the "Submit here" tab, so apologies, but I fail to understand what you meant.
I found the model I submitted now :)
Everything is working great ^^
I know this might seem obvious to many users here, but some (myself included) still think the current leaderboard is the final evaluation.
Please make it clear to users that the ranking is not final; the evaluation is still ongoing.
Also, could you provide an estimated timeline for when the evaluation will be complete?
Dear @soufianechami, leaderboards by their nature are never in a final state; models come in every day, get submitted, and are then evaluated. To stay up to date, you will need (it is a must) to check the leaderboard every once in a while.
I'm curious whether there will be a section for embedding models?
Hugging Face has a leaderboard for embedding models (https://huggingface.co/spaces/mteb/leaderboard), but the scores and rankings are all based on English, Chinese, French, and Polish.
It's hard to know which of the models may work well for Arabic, e.g. for building the retrieval part of a RAG system.
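For anyone experimenting in the meantime, here is a minimal sketch of checking a multilingual embedding model on an Arabic retrieval-style similarity task; the sentence-transformers usage is standard, but the model choice is just an illustrative assumption, not a leaderboard recommendation:

```python
# Minimal sketch: cosine similarity of Arabic sentences with a multilingual
# embedding model. The model name is an illustrative assumption only.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

query = "ما هي عاصمة مصر؟"  # "What is the capital of Egypt?"
passages = [
    "القاهرة هي عاصمة جمهورية مصر العربية.",  # relevant passage
    "الرياض مدينة كبيرة في السعودية.",  # irrelevant passage
]

q_emb = model.encode(query, convert_to_tensor=True)
p_emb = model.encode(passages, convert_to_tensor=True)
# The relevant passage should get the higher similarity score.
print(util.cos_sim(q_emb, p_emb))
```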
Hi, thanks for compiling this resource!
Could you provide the exact lighteval command/config used for the evaluations? For example, in ./examples/tasks/OALL.txt from the official lighteval repo, (almost) all tasks are evaluated 5-shot with |5|1, whereas in the leaderboard everything is 0-shot.
Hello,
Yes, just change them all to |0|0; this is our setting.
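For illustration, assuming the usual lighteval task-spec format suite|task|few_shot|truncate_few_shot, the change from the repo's example file to the leaderboard setting would look like this (the task name here is a plausible placeholder, not a verbatim quote from OALL.txt):

```
# In the tasks file, replace the 5-shot spec ...
community|acva:Algeria|5|1
# ... with the 0-shot spec used by the leaderboard:
community|acva:Algeria|0|0
```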
Hi,
Which dataset source is used in the ACVA benchmark?
This one: https://huggingface.co/datasets/FreedomIntelligence/ACVA-Arabic-Cultural-Value-Alignment/viewer/default/validation
Or this one: https://huggingface.co/datasets/OALL/ACVA
Also, for AlGhafa benchmark, which dataset is used?
There are multiple datasets here: https://gitlab.com/tiiuae/alghafa/-/tree/main?ref_type=heads
Also, among the OALL datasets I can find:
https://huggingface.co/datasets/OALL/AlGhafa-Arabic-LLM-Benchmark-Translated
And:
https://huggingface.co/datasets/OALL/AlGhafa-Arabic-LLM-Benchmark-Native
So which one is used? And how is the final metric calculated over the benchmark datasets?
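(For readers wondering what such an aggregation can look like: one common convention for leaderboards, not confirmed here as OALL's actual method, is an unweighted macro-average over per-subset scores, sketched below.)

```python
# Hypothetical sketch of an unweighted macro-average over benchmark subsets;
# whether OALL aggregates this way is an assumption, not confirmed above.
def macro_average(subset_scores: dict[str, float]) -> float:
    return sum(subset_scores.values()) / len(subset_scores)

scores = {"acva": 0.62, "alghafa_native": 0.55, "alghafa_translated": 0.58}
print(round(macro_average(scores), 4))  # 0.5833
```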
Hi
@alielfilali01
,
I submitted the fine-tuned adapter airev-ai/Amal-70b-v2.3.2 (base model: airev-ai/Amal-70b-v2) with bfloat16 precision a couple of hours ago, but the status shows as FAILED in the requests card. Both the adapter and the base model are public, have a model card, and even have a valid license attached. I am unsure why the submission failed. Any assistance in this matter would be greatly appreciated.
Thank you.
Hi
@ManojShack
Thanks for submitting your model to the leaderboard.
Regarding your concern: when attempting to evaluate your models, we ran into errors because they are missing a config.json. Please ensure the config is included and submit again.
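As a quick self-check before resubmitting, something like this minimal sketch (using the huggingface_hub client; the repo id is the one mentioned above) will confirm the file is visible on the Hub:

```python
# Minimal sketch: verify that config.json is present in a public model repo.
from huggingface_hub import HfApi

files = HfApi().list_repo_files("airev-ai/Amal-70b-v2.3.2")
print("config.json present:", "config.json" in files)
```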
I am writing to request the inclusion of Stability AI's Arabic Stable LM 1.6B, Aya 8B, Jais 13B, and Noon 7B in the Open Arabic LLM Leaderboard.
The models are publicly available at:
https://huggingface.co/stabilityai/ar-stablelm-2-base
https://huggingface.co/CohereForAI/aya-expanse-8b
https://huggingface.co/inceptionai/jais-13b
https://huggingface.co/Naseej/noon-7b
I believe their inclusion would benefit the community by providing a robust benchmark for Arabic language models.
Hey
@Jumana25
Please feel free to submit whatever models you want to see on the leaderboard.
Thanks
Hey, I tried to run your evaluation code offline, but I found that the asas-ai/AraTrust-categorized dataset does not exist on Hugging Face. Can you fix this problem?