Add TF weights

#1
by joaogante HF staff - opened

Model converted by the transformers pt_to_tf CLI.

All converted model outputs and hidden layers were validated against their PyTorch counterparts. Maximum crossload output difference=7.744e-04; maximum converted output difference=7.744e-04.
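The validation boils down to comparing the two frameworks' outputs element-wise and reporting the largest absolute deviation. A minimal sketch of that check, using dummy arrays in place of real model outputs (the function name and the tolerance value are illustrative, not the CLI's actual internals):

```python
import numpy as np

def max_abs_difference(pt_out, tf_out):
    """Maximum element-wise absolute difference between two output tensors."""
    return float(np.max(np.abs(np.asarray(pt_out) - np.asarray(tf_out))))

# Dummy stand-ins for PyTorch and converted-TF logits:
pt_logits = np.array([0.1234, -1.5678, 2.3456])
tf_logits = pt_logits + np.array([3.0e-4, -7.7e-4, 1.0e-5])

diff = max_abs_difference(pt_logits, tf_logits)
print(f"Maximum output difference={diff:.3e}")
# Tolerance chosen for illustration only:
assert diff < 5e-3, "outputs diverge beyond tolerance"
```

The same comparison is run per hidden layer, which is how a layer-level mismatch can be reported even when the final outputs agree.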

cc @patrickvonplaten [HF maintainer(s) for this repo]

Related PR: https://github.com/huggingface/transformers/pull/17554

The error on the internal hidden layers was slightly above the desired level (<1e-5), but the output layers were fine. cc @sayakpaul @nielsr

> The error on the internal hidden layers was slightly above the desired level (<1e-5), but the output layers were fine

Those were probably caused by num_batches_tracked, as used in PyTorch's BatchNorm layers. There's relevant information here: https://github.com/huggingface/transformers/pull/17554.

Cc: @amyeroberts
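For context on why num_batches_tracked could matter: PyTorch's BatchNorm only consults it when momentum is None, in which case the running statistics use a cumulative moving average weighted by 1/num_batches_tracked. A hedged pure-Python re-implementation of that update rule (illustrative only, not the transformers or torch source):

```python
import numpy as np

class RunningMean:
    """Sketch of how PyTorch BatchNorm updates running_mean.

    With momentum set, the update is an exponential moving average and
    num_batches_tracked plays no role; with momentum=None, PyTorch falls
    back to a cumulative average with factor 1/num_batches_tracked.
    """

    def __init__(self, momentum=None):
        self.momentum = momentum
        self.running_mean = 0.0
        self.num_batches_tracked = 0

    def update(self, batch):
        self.num_batches_tracked += 1
        factor = (self.momentum if self.momentum is not None
                  else 1.0 / self.num_batches_tracked)
        batch_mean = float(np.mean(batch))
        self.running_mean = (1 - factor) * self.running_mean + factor * batch_mean
        return self.running_mean
```

So a converted model only needs num_batches_tracked to reproduce the statistics if the original BatchNorm layers ran with momentum unset, which is the point debated below.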

It's probably not the case -- we have many models where these differences in the internal layers exist, yet the output layers have the correct values. We haven't figured out why, but it seems to be no cause for alarm. Models whose required weights fail to load have very large errors everywhere.

Nevertheless, I reported it above in case we need to revisit the models with this mismatch :)

Thanks for adding!

nielsr changed pull request status to merged

I'm not sure that accounts for the differences here. As mentioned in the PR (https://github.com/huggingface/transformers/pull/17554#issuecomment-1149672281), num_batches_tracked matters only when momentum isn't set, and my understanding is that momentum was set for all batch norm layers.
