Fix `model_max_length` in `tokenizer_config.json`
#7 · opened by bryant1410
The current value of `model_max_length` in `tokenizer_config.json` (basically infinity) is inconsistent with `max_position_embeddings` in `config.json`. It's also inconsistent with the value used by bge-base-en.

The same issue exists in bge-small-en, but I thought it would be better to have any discussion here first, before sending a PR for that one as well.
Thanks!
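For reference, a minimal sketch of how the mismatch shows up (assuming this discussion is on the `BAAI/bge-large-en` repository):

```python
# Sketch: compare the tokenizer's max length with the model's position limit.
from transformers import AutoConfig, AutoTokenizer

repo_id = "BAAI/bge-large-en"  # assumed repo id for this discussion

config = AutoConfig.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Before the fix, model_max_length is a huge sentinel value ("basically
# infinity"), while the encoder only supports max_position_embeddings tokens.
print("tokenizer.model_max_length:    ", tokenizer.model_max_length)
print("config.max_position_embeddings:", config.max_position_embeddings)
```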
The `tokenizer_config.json` is generated automatically by the Hugging Face `transformers` package. To avoid confusion for users, it's better to fix this.
Shitao changed pull request status to merged
Yeah, and it's not only about confusion: I forgot to mention that tokenization wouldn't respect the max length otherwise.
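A hypothetical illustration of that point: with `model_max_length` left at the huge sentinel value, `truncation=True` has no effective limit to truncate to (repo id assumed as above).

```python
# Hypothetical illustration: when model_max_length is effectively infinite,
# truncation=True has no effective cap, so over-long inputs pass through.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("BAAI/bge-large-en")  # assumed repo id

encoded = tokenizer("word " * 2000, truncation=True)

# With the old config this length exceeds the model's 512 positions;
# with model_max_length set to 512 it is capped at 512.
print(len(encoded["input_ids"]))
```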