Context size in the README does not seem to be correct.

#2 · opened by mahiatlinux

Hi there. Very nice line of models!

The README mentions that this model has a 128K context, but the config file appears to show only 32K. Please correct me if I am wrong on this.

README: [screenshot showing the 128K context length claim]

config.json: [screenshot showing the 32K value]
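For reference, a quick way to read the configured value without downloading the weights, as a minimal sketch. The repo id here is taken from the discussion linked below and is an assumption; substitute the model in question:

```python
# Sketch: read the configured context window from config.json via
# transformers. The repo id is an assumption, not necessarily this model.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Qwen/Qwen2.5-32B")
print(config.max_position_embeddings)  # prints 32768 for a 32K config
```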

Qwen org · edited Oct 10

Hi there!

Thanks for the report. In this case, the README/model card is accurate.

Please also note this comment: https://huggingface.co/Qwen/Qwen2.5-32B/discussions/1#66f11ee85a65f26be709470b

For base models, the context length was evaluated with perplexity (PPL). Even though the model was trained with a 32K context, perplexity did not degrade at 128K. However, this does not mean that the model can generate sequences of that length.
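For illustration, here is a minimal sketch of what a perplexity-based length check could look like. This is not Qwen's actual evaluation code; the repo id and the input file are hypothetical, and running 128K-token inputs needs substantial GPU memory:

```python
# Sketch: measure perplexity at increasing context lengths.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-32B"  # assumption: substitute the model under discussion
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
model.eval()

# Assumes a document long enough to cover the largest window.
ids = tokenizer(open("long_text.txt").read(), return_tensors="pt").input_ids
ids = ids.to(model.device)

for ctx_len in (32_768, 65_536, 131_072):
    chunk = ids[:, :ctx_len]
    with torch.no_grad():
        # Passing labels=input_ids makes the model return the mean
        # next-token cross-entropy over the window.
        loss = model(chunk, labels=chunk).loss
    print(f"context {ctx_len:>7}: ppl = {torch.exp(loss).item():.2f}")
```

Low, flat perplexity at 128K shows the model still predicts well over a long window, which is a weaker claim than being able to generate coherent output of that length.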

Thank you very much for the clarification!

mahiatlinux changed discussion status to closed
