Add TF weights
Model converted by the transformers' pt_to_tf CLI. All converted model outputs and hidden layers were validated against their PyTorch counterparts.
Maximum crossload output difference=7.688e-04; Maximum crossload hidden layer difference=2.106e-03;
Maximum conversion output difference=7.688e-04; Maximum conversion hidden layer difference=2.106e-03;
CAUTION: The maximum admissible error was manually increased to 0.009!
List of maximum output differences above the threshold (1e-10):
past_key_values[0][0]: 1.192e-06
past_key_values[0][1]: 3.576e-07
past_key_values[0][2]: 3.998e-04
past_key_values[0][3]: 2.315e-04
past_key_values[1][0]: 2.861e-06
past_key_values[1][1]: 1.192e-06
past_key_values[1][2]: 4.759e-04
past_key_values[1][3]: 3.055e-04
past_key_values[2][0]: 4.292e-06
past_key_values[2][1]: 1.215e-06
past_key_values[2][2]: 7.688e-04
past_key_values[2][3]: 2.686e-04
past_key_values[3][0]: 4.292e-06
past_key_values[3][1]: 3.278e-06
past_key_values[3][2]: 4.139e-04
past_key_values[3][3]: 2.567e-04
List of maximum hidden layer differences above the threshold (1e-10):
last_hidden_state: 2.670e-04
decoder_hidden_states[1]: 1.621e-05
decoder_hidden_states[2]: 2.098e-05
decoder_hidden_states[3]: 2.003e-05
decoder_hidden_states[4]: 2.670e-04
encoder_last_hidden_state: 1.171e-03
encoder_hidden_states[0]: 1.383e-05
encoder_hidden_states[1]: 1.383e-05
encoder_hidden_states[2]: 2.337e-05
encoder_hidden_states[3]: 2.106e-03
encoder_hidden_states[4]: 1.171e-03
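
For readers who want to reproduce this kind of check by hand, here is a minimal sketch of how a per-output maximum absolute difference could be computed and compared against the two thresholds quoted above (1e-10 for reporting, 0.009 as the raised admissible error). It is not the pt_to_tf CLI's actual code: the helper names `flatten`, `max_abs_diff`, and `compare_outputs` are hypothetical, and outputs are assumed to be plain nested lists of floats keyed by output name rather than framework tensors.

```python
from typing import Dict, Iterator, Tuple


def flatten(values) -> Iterator[float]:
    """Yield every number in an arbitrarily nested list/tuple structure."""
    if isinstance(values, (int, float)):
        yield float(values)
    else:
        for item in values:
            yield from flatten(item)


def max_abs_diff(a, b) -> float:
    """Largest element-wise absolute difference between two nested structures."""
    flat_a, flat_b = list(flatten(a)), list(flatten(b))
    if len(flat_a) != len(flat_b):
        raise ValueError("outputs have different numbers of elements")
    return max((abs(x - y) for x, y in zip(flat_a, flat_b)), default=0.0)


def compare_outputs(
    pt_outputs: Dict[str, object],
    tf_outputs: Dict[str, object],
    report_threshold: float = 1e-10,  # threshold used for the lists in the report
    max_error: float = 0.009,         # manually raised admissible error from the report
) -> Tuple[Dict[str, float], float, bool]:
    """Return (differences above report_threshold, worst difference, pass/fail)."""
    diffs = {
        name: max_abs_diff(pt_outputs[name], tf_outputs[name])
        for name in pt_outputs
    }
    reported = {name: d for name, d in diffs.items() if d > report_threshold}
    worst = max(diffs.values(), default=0.0)
    return reported, worst, worst <= max_error
```

Calling `compare_outputs(pt, tf)` on two dictionaries of outputs returns the entries that would appear in a list like the ones above, the overall maximum, and whether that maximum stays within the admissible error.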
Thanks :)