Request for Enhanced Model Submission Tracking on HuggingFaceH4/open_llm_leaderboard
Hi,
Would it be helpful to have a clear and direct way to follow up on models submitted to HuggingFaceH4/open_llm_leaderboard, such as receiving notifications in our HF accounts? In my case, I submitted three models for evaluation last week. As of today, I can only see the last one on the requests page: https://huggingface.co/datasets/open-llm-leaderboard/requests. I have no information about the current status of the other two.
Thank you.
Hi!
We can't sync with users' HF accounts, as we don't store who submits which model.
However, I think you'll find the most relevant information, including how to properly report issues, in the FAQ on the About page of the leaderboard.
Side note: if your models do not appear in the request dataset, you have not managed to submit them properly.
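For reference, you can also check your submissions programmatically. Here is a minimal sketch (the file layout and status values reflect the requests dataset as it is today and may change):

```python
import json

from huggingface_hub import HfApi, hf_hub_download

REPO_ID = "open-llm-leaderboard/requests"
USER = "Menouar"  # replace with your HF username

# Request files are stored as <user>/<model>_eval_request_*.json in the dataset.
api = HfApi()
for path in api.list_repo_files(REPO_ID, repo_type="dataset"):
    if path.startswith(f"{USER}/") and path.endswith(".json"):
        local = hf_hub_download(REPO_ID, path, repo_type="dataset")
        with open(local) as f:
            request = json.load(f)
        # Each request file carries a "status" field, e.g. PENDING, RUNNING,
        # FINISHED, or FAILED.
        print(path, "->", request.get("status"))
```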
OK, thank you for your response. Overall, though, the information on tracking submissions and evaluations on the leaderboard could be improved; the current process is not intuitive.
Thanks.
Hi,
Can you expand on what supplementary information you would need from the FAQ?
Yes, you already mentioned that you can't sync with users' HF accounts, as you don't store who submits which model. In my opinion, syncing HF accounts with the leaderboard would be helpful: users would then find all the status information about their submitted models directly in their HF accounts.
In my case, it was not trivial to find out that two of my submitted models had failed. I had to go to the open-llm-leaderboard/requests page and check there. These two models failed, and there is no clear explanation of why:
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Menouar/saqr-7b-instruct_eval_request_False_float16_Adapter.json
https://huggingface.co/datasets/open-llm-leaderboard/requests/blob/main/Menouar/fennec-7b-alpha_eval_request_False_float16_Adapter.json
So, if I understood correctly, now that I have reported that my models failed, will you relaunch them automatically?
Thank you.
Hi!
We don't plan on syncing with users' accounts at the moment, as:
- it would force users of the leaderboard to have an account when submitting models, which is not required at the moment
- it would require us to ask users to send auth tokens when submitting models, which would also add complexity to submissions.
I'm keeping this suggestion in mind for the longer term, though.
Regarding your specific problem, thanks for pointing me to the request files.
I just checked the job logs: your models failed because you tried to load a state dict (your adapter weights) whose size does not match your base model.
Here is the error:

```
Traceback (most recent call last):
  ...
    model = PeftModel.from_pretrained(base, adapter_weights)
  File "lib/python3.10/site-packages/peft/peft_model.py", line 271, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "lib/python3.10/site-packages/peft/peft_model.py", line 561, in load_adapter
    load_result = set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
  File "lib/python3.10/site-packages/peft/utils/save_and_load.py", line 126, in set_peft_model_state_dict
    load_result = model.load_state_dict(peft_model_state_dict, strict=False)
  File "lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
    size mismatch for base_model.model.transformer.word_embeddings.weight: copying a param with shape torch.Size([65026, 4544]) from checkpoint, the shape in current model is torch.Size([65024, 4544]).
    size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([65026, 4544]) from checkpoint, the shape in current model is torch.Size([65024, 4544]).
```
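The two-token difference (65026 vs. 65024) suggests tokens were added to the tokenizer during fine-tuning. If it helps, here is a rough sketch of how to inspect the shapes stored in an adapter checkpoint without loading the full model; it assumes the base is tiiuae/falcon-7b and that the adapter was saved as adapter_model.safetensors (older PEFT versions save adapter_model.bin instead):

```python
from huggingface_hub import hf_hub_download
from safetensors import safe_open
from transformers import AutoConfig

adapter_id = "Menouar/saqr-7b-instruct"  # one of the failing adapters
base_id = "tiiuae/falcon-7b"             # assumption: the declared base model

# Vocabulary size the base checkpoint actually ships with (65024 for falcon-7b).
base_vocab = AutoConfig.from_pretrained(base_id).vocab_size
print("base vocab size:", base_vocab)

# Tensor shapes recorded in the adapter checkpoint, read without loading weights.
path = hf_hub_download(adapter_id, "adapter_model.safetensors")
with safe_open(path, framework="pt") as f:
    for name in f.keys():
        if "word_embeddings" in name or "lm_head" in name:
            print(name, f.get_slice(name).get_shape())  # expect [65026, 4544] here
```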
Did you make sure your model could be loaded with the AutoClasses, as requested in the submit form?
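A minimal local check along those lines might look like the sketch below. The base model id and the resize step are assumptions on my side (they presume the mismatch comes from tokens added to the tokenizer during fine-tuning); adjust them to your actual setup:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

adapter_id = "Menouar/saqr-7b-instruct"  # one of the failing adapters
base_id = "tiiuae/falcon-7b"             # assumption: the declared base model

# Load the fine-tuned tokenizer and the base model with the AutoClasses.
tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.float16)

# If tokens were added during fine-tuning, the base embeddings must be resized
# to the fine-tuned vocabulary before the adapter weights can be loaded.
if len(tokenizer) != base.get_input_embeddings().weight.shape[0]:
    base.resize_token_embeddings(len(tokenizer))

# This is the call that fails in the evaluation job; if the sizes now match,
# it should succeed locally as well.
model = PeftModel.from_pretrained(base, adapter_id)
```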
I'll relaunch your models once you've fixed the above issue.
Ok, thanks for this clarification. I will come back to you once I have checked all these things.
Thanks.
Ok, closing for now, feel free to reopen once it's good on your side :)