Tokenizer and Model Config Mismatch
#10, opened by keremturgutlu
The config and the tokenizer have different special token ids, which can be a problem for fine-tuning.
from transformers import AutoConfig, AutoTokenizer

pretrained_config = AutoConfig.from_pretrained("tiiuae/falcon-7b", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-7b")
(pretrained_config.eos_token_id, tokenizer.eos_token_id,
pretrained_config.bos_token_id, tokenizer.bos_token_id)
>>
(2, 11, 1, None)
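The same comparison can be wrapped in a small helper that reports every special token id on which the two objects disagree. This is only a sketch: `find_special_token_mismatches` and `SPECIAL_TOKEN_ATTRS` are hypothetical names, not part of transformers, and the `SimpleNamespace` objects are stand-ins that mirror the values reported above so the snippet runs without downloading anything.

```python
from types import SimpleNamespace

# Attribute names checked on both objects; extend as needed.
SPECIAL_TOKEN_ATTRS = ("bos_token_id", "eos_token_id", "pad_token_id", "unk_token_id")


def find_special_token_mismatches(config, tokenizer, attrs=SPECIAL_TOKEN_ATTRS):
    """Return {attr: (config_value, tokenizer_value)} for every attr that differs."""
    mismatches = {}
    for attr in attrs:
        # A missing attribute is treated the same as an explicit None.
        config_value = getattr(config, attr, None)
        tokenizer_value = getattr(tokenizer, attr, None)
        if config_value != tokenizer_value:
            mismatches[attr] = (config_value, tokenizer_value)
    return mismatches


# Stand-ins mirroring the values reported above (assumption: with transformers
# installed you would pass the real config and tokenizer objects instead).
config = SimpleNamespace(bos_token_id=1, eos_token_id=2)
tokenizer = SimpleNamespace(bos_token_id=None, eos_token_id=11)

print(find_special_token_mismatches(config, tokenizer))
# {'bos_token_id': (1, None), 'eos_token_id': (2, 11)}
```

An empty dict means the config and tokenizer agree on all of the checked ids.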
Yes, this is really ridiculous.
I agree, and I actually don't understand which one we are supposed to choose.
@tiiuae Please avoid uploading a wrong model (wrong tokenizer); this will mislead a lot of people.
FalconLLM changed discussion status to closed
@FalconLLM
Please fix the issue, or at least post an explanation; otherwise your behaviour might be against the Hugging Face community rules.
Users might get confused by your uploaded model, and that is not good for you either.
@lucasjin
they fixed config.json
"bos_token_id": 11,
"eos_token_id": 11,
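For anyone still working from an older copy of the config, one possible workaround is to copy the tokenizer's ids onto the config before fine-tuning. The sketch below assumes the tokenizer is the source of truth, which the thread does not confirm. `sync_special_token_ids` is a hypothetical helper, not a transformers API, and the `SimpleNamespace` objects stand in for the real config and tokenizer.

```python
from types import SimpleNamespace


def sync_special_token_ids(config, tokenizer,
                           attrs=("bos_token_id", "eos_token_id", "pad_token_id")):
    """Copy each special token id the tokenizer defines onto the config, in place.

    Returns a dict of the attributes that were changed and their new values.
    """
    changed = {}
    for attr in attrs:
        tokenizer_value = getattr(tokenizer, attr, None)
        if tokenizer_value is None:
            # The tokenizer does not define this token; leave the config as it is.
            continue
        if getattr(config, attr, None) != tokenizer_value:
            setattr(config, attr, tokenizer_value)
            changed[attr] = tokenizer_value
    return changed


# Stand-ins with the values from the original report (assumption: real
# config and tokenizer objects expose the same attribute names).
config = SimpleNamespace(bos_token_id=1, eos_token_id=2)
tokenizer = SimpleNamespace(bos_token_id=None, eos_token_id=11)

print(sync_special_token_ids(config, tokenizer))
# {'eos_token_id': 11}
```

Because the tokenizer in the original report has no bos token, this sketch leaves the config's `bos_token_id` untouched; whether that id should also be overridden is a separate decision.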
@dimaischenko OK, but this still confuses me. Why is bos 11? Very strange.
And bos is the same as eos... Very strange.