Update safetensors to have embedding layer
Fixes https://github.com/huggingface/transformers/issues/34759
Proposed solution:
The safetensors file was missing the embedding layer. I loaded the model from the existing weights file and re-saved it in the safetensors format.
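For reference, a rough sketch of that conversion (the output directory below is a placeholder, not the exact path I used):

from transformers import AutoModelForCausalLM

# Load from the original pytorch_model.bin checkpoint (disable safetensors so the .bin weights are used)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/MobileLLM-125M", trust_remote_code=True, use_safetensors=False
)

# Re-save the same weights in the safetensors format
model.save_pretrained("MobileLLM-125M-safetensors", safe_serialization=True)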
You can test the updated safetensors with the following script:
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/MobileLLM-125M", use_fast=False)

# Model loaded from the current safetensors on the Hub (embedding layer missing)
mobilellm_old = AutoModelForCausalLM.from_pretrained("facebook/MobileLLM-125M", trust_remote_code=True, use_safetensors=True)

# Model loaded from the regenerated safetensors in my local checkout
mobilellm = AutoModelForCausalLM.from_pretrained("/Users/mayankagarwal/Documents/OSS/codebases/MobileLLM-125M", trust_remote_code=True, use_safetensors=True)

inputs = tokenizer("Hello world!", return_tensors="pt")

output_old = mobilellm_old.generate(**inputs)
decoded_old = tokenizer.decode(output_old[0], skip_special_tokens=True)
print("Old decoded output:", decoded_old)

output = mobilellm.generate(**inputs)
decoded = tokenizer.decode(output[0], skip_special_tokens=True)
print("Updated decoded output:", decoded)
Here's a screenshot of the output
@zechunliu Please do take a look!
Thank you so much for raising this issue! It's a pity I just noticed this. I have removed the safetensors, and it should work now. The original pytorch_model.bin is correct. Let me know if you spot any other issues!
Hey @zechunliu, the safetensors file actually did contain the embedding params, they were just named lm_head. To my knowledge the safetensors format doesn't like weight sharing, so the base model, which references this matrix as both lm_head and embed_tokens, was arbitrarily choosing to drop embed_tokens.

Would love to still have the original safetensors variant available, either with a small code change in model loading to re-tie the embeddings, or by renaming the safetensors model to something like model_no_embed.safetensors. I have some training pipelines that rely on it, and I'm very grateful to you for releasing these weights in the first place!
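For what it's worth, a minimal sketch of the re-tying option, assuming the custom MobileLLM class exposes the standard get_input_embeddings()/get_output_embeddings() helpers:

from transformers import AutoModelForCausalLM

# Load from the old safetensors; embed_tokens.weight is reported as a missing key and left
# randomly initialized, while lm_head.weight holds the real (shared) matrix.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/MobileLLM-125M", trust_remote_code=True, use_safetensors=True
)

# Re-tie in the opposite direction of the usual tie_weights(): point the input embedding
# at the lm_head matrix, since lm_head is the copy that survived serialization.
model.get_input_embeddings().weight = model.get_output_embeddings().weight

# Sanity check: both modules now share the same storage
assert model.get_input_embeddings().weight.data_ptr() == model.get_output_embeddings().weight.data_ptr()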
(on a slightly unrelated note - are these models trained in BF16 or FP16? It seems these tensors are in fp16, while the larger models are in bf16. I can't quite tell from your GitHub repo nor paper which was intended either!)
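For reference, a quick way to check the stored dtypes (this reads pytorch_model.bin from a local clone of the repo; the path is a placeholder):

import torch

# Inspect the dtypes actually stored in the released checkpoint
state_dict = torch.load("pytorch_model.bin", map_location="cpu")
print({tensor.dtype for tensor in state_dict.values()})  # e.g. {torch.float16}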