Update README, clarify training data.
#9
by
Stealcase
- opened
The previous sentence implies the models were only trained on "Norwegian data". Possible interpretations for this sentence are:
- data is exclusively produced by Norwegians
- data is exclusively produced in Norway
- data that is inherently in the Norwegian language
None of these interpretations are correct.
Hi, thanks for opening a pull request! However, the base models were not trained on translated Norwegian; your correction is, in fact, incorrect :) You can find more details about the pretraining corpus in this section.
You're right, it seems it is only ´Instruction-tuned NorMistral-7b-warm´ which is trained on machine translated data.
I mistakenly thought https://huggingface.co/norallm/normistral-7b-warm-instruct/ was a describing the common training process for all the models since all the models were being summarized on the page.
Stealcase
changed pull request status to
closed