flan-t5 evals failing #898
by pszemraj - opened
Hi! I've submitted most of the Google FLAN-T5 models for evaluation on the leaderboard after seeing that flan-t5-small
worked and already has results on the current version of the leaderboard. Most of the subsequent requests failed, except for google/flan-t5-xl,
which worked in both float16 and bf16. I have no idea why the intermediate models would fail.
Here are links to the relevant request files:
- https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/google/flan-t5-large_eval_request_False_float16_Original.json
- https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/google/flan-t5-base_eval_request_False_float16_Original.json
- https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/google/flan-t5-xxl_eval_request_False_float16_Original.json
- https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/google/flan-ul2_eval_request_False_bfloat16_Original.json
- this one (flan-ul2) might be pushing the limits of a single node for eval, but it would be awesome if there's a way to run it, given the surprisingly decent performance of flan-t5-xl
Let me know how we can move forward on this; I think that at least the base/large models should be doable. Thanks in advance!
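For anyone else trying to check on their requests, here is a minimal sketch of how one could pull down one of the request files above and inspect it locally with `huggingface_hub`. It assumes the JSON includes a status-like field (the exact schema may differ; inspect the raw file on the Hub if it doesn't):

```python
# Sketch: download one of the linked request files from the requests dataset
# and print its contents to see what state the evaluation is in.
import json
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="open-llm-leaderboard/requests",
    repo_type="dataset",
    # one of the files linked above
    filename="google/flan-t5-large_eval_request_False_float16_Original.json",
)

with open(path) as f:
    request = json.load(f)

# "status" is assumed here; fall back to dumping the whole dict if absent.
print(request.get("status", request))
```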
alozowski changed discussion status to closed