Typo in Model card

#1 opened by quaeast

In the 9th line of the usage Python snippet, the separator between the two sentences is </s></s>. Should it be </s><s>?

inputs = tokenizer(["</s></s>".join(input_pair) for input_pair in input_pairs], return_tensors="pt")
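For reference, that line joins each pair into a single string before tokenization, so the separator is inserted literally (a quick check, assuming input_pairs is a list of (premise, hypothesis) tuples):

input_pairs = [("I like this pizza.", "The sentence is positive.")]
print(["</s></s>".join(input_pair) for input_pair in input_pairs])
# ['I like this pizza.</s></s>The sentence is positive.']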

Hi! Sorry we forgot to answer 🙏🏻

Actually, for NLI tasks it isn't as you say. You can verify this by passing the list of (premise, hypothesis) tuples directly to the tokenizer and then checking the token IDs:

# `tokenizer` is the one loaded earlier in the model-card snippet, e.g.:
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained(...)  # this model's checkpoint

input_pairs = [("I like this pizza.", "The sentence is positive."), ("I like this pizza.", "The sentence is negative.")]
inputs = tokenizer(input_pairs, return_tensors="pt")
print(inputs)
# Output:
# {'input_ids': tensor([[   0, 1049, 2070, 2027, 10737, 1016,    2,    2, 2000, 6255,
#           2007, 3897, 1016,    2],
#         [   0, 1049, 2070, 2027, 10737, 1016,    2,    2, 2000, 6255,
#           2007, 5001, 1016,    2]]),
#  'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
#         [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]])}
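To make the separators visible, you can map the ids back to tokens (a quick check, reusing the tokenizer and inputs from the snippet above):

print(tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist()))
# Schematically: ['<s>', ..., '</s>', '</s>', ..., '</s>']
# i.e. the tokenizer itself inserts </s></s> between premise and hypothesis.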

If you check the vocab.txt file of the model, you'll see that token id 2 is </s>.
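You can also confirm this directly from the tokenizer, without opening vocab.txt (same tokenizer object as above):

print(tokenizer.convert_ids_to_tokens(2))           # '</s>'
print(tokenizer.sep_token, tokenizer.sep_token_id)  # '</s>' 2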

Cheers!
