Update README, clarify training data.

by Stealcase - opened May 19, 2024

base: refs/heads/main

←

from: refs/pr/9

Stealcase

May 19, 2024

The previous sentence implies the models were only trained on "Norwegian data". Possible interpretations for this sentence are:

- data is exclusively produced by Norwegians
- data is exclusively produced in Norway
- data that is inherently in the Norwegian language

None of these interpretations are correct.

Update README, clarify training data. (commit ff1a5bfc)

davda54

Norwegian Large Language Models org May 20, 2024

Hi, thanks for opening a pull request! However, the base models were not trained on translated Norwegian; your correction is, in fact, incorrect :) You can find more details about the pretraining corpus in this section.

Stealcase

Jun 21, 2024

You're right, it seems that only "Instruction-tuned NorMistral-7b-warm" was trained on machine-translated data.
I mistakenly thought https://huggingface.co/norallm/normistral-7b-warm-instruct/ was describing the common training process for all the models, since all the models were summarized on that page.

Stealcase changed pull request status to closed Jun 21, 2024
