Add TF weights
Model converted by the transformers' pt_to_tf CLI. All converted model outputs and hidden layers were validated against their PyTorch counterparts.
Maximum crossload output difference=3.612e-05; Maximum crossload hidden layer difference=1.945e-03;
Maximum conversion output difference=3.612e-05; Maximum conversion hidden layer difference=1.945e-03;
List of maximum output differences above the threshold (1e-19):
logits: 2.098e-05
cls_logits: 3.612e-05
distillation_logits: 1.717e-05
List of maximum hidden layer differences above the threshold (1e-19):
hidden_states[0]: 3.052e-05
hidden_states[1]: 3.910e-05
hidden_states[2]: 4.959e-05
hidden_states[3]: 7.435e-05
hidden_states[4]: 8.395e-05
hidden_states[5]: 1.141e-04
hidden_states[6]: 2.332e-04
hidden_states[7]: 6.638e-04
hidden_states[8]: 1.472e-03
hidden_states[9]: 1.801e-03
hidden_states[10]: 1.801e-03
hidden_states[11]: 1.945e-03
hidden_states[12]: 1.907e-03
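
For readers who want to reproduce figures like the ones above, the following is a minimal sketch of how a per-tensor maximum absolute difference could be computed and filtered against a threshold. It is not the pt_to_tf implementation: the function names `max_abs_diff` and `differences_above_threshold` are hypothetical, and tensors are assumed to be already flattened into plain sequences of floats keyed by output name.

```python
from typing import Dict, Sequence


def max_abs_diff(reference: Sequence[float], converted: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two flattened tensors."""
    if len(reference) != len(converted):
        raise ValueError("tensors must have the same number of elements")
    # default=0.0 covers the empty-tensor case
    return max((abs(r - c) for r, c in zip(reference, converted)), default=0.0)


def differences_above_threshold(
    pt_outputs: Dict[str, Sequence[float]],
    tf_outputs: Dict[str, Sequence[float]],
    threshold: float,
) -> Dict[str, float]:
    """Hypothetical helper: map each output name to its maximum difference,
    keeping only the entries that exceed the threshold."""
    diffs = {
        name: max_abs_diff(pt_outputs[name], tf_outputs[name])
        for name in pt_outputs
    }
    return {name: diff for name, diff in diffs.items() if diff > threshold}


# Illustrative values only, not taken from the model in this PR.
pt = {"logits": [0.5, 1.0], "cls_logits": [2.0, 3.0]}
tf = {"logits": [0.5, 1.0], "cls_logits": [2.0, 3.25]}
report = differences_above_threshold(pt, tf, threshold=1e-5)
for name, diff in report.items():
    print(f"{name}: {diff:.3e}")
```

Under this reading, the overall maximum reported at the top is the largest entry of the corresponding list, which matches the numbers here: 3.612e-05 is the cls_logits value and 1.945e-03 is the hidden_states[11] value.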