NorBLOOM-7b-scratch is a large Norwegian language model pretrained from scratch on a total of 260 billion subword tokens (six repetitions of open Norwegian texts).

This model is part of the NORA.LLM family, developed in collaboration between [the Language Technology Group at the University of Oslo](https://huggingface.co/ltg), [the High Performance Language Technologies (HPLT) project](https://hplt-project.org/), [the National Library of Norway](https://huggingface.co/NbAiLab), and [the University of Turku](https://huggingface.co/TurkuNLP). All the models are pre-trained on the same dataset and with the same tokenizer. NorBLOOM-7b-scratch has around 7 billion parameters and is based on [the BLOOM architecture](https://arxiv.org/abs/2211.05100).

The NORA.LLM language model family currently includes:

- [**NorMistral-7b-warm**](https://huggingface.co/norallm/normistral-7b-warm) -- an LLM initialized from [Mistral-7b-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) and continuously pretrained on Norwegian data;
- [**NorMistral-7b-scratch**](https://huggingface.co/norallm/normistral-7b-scratch) -- a Mistral-based LLM pretrained from scratch on Norwegian data;
- [**NorBLOOM-7b-scratch**](https://huggingface.co/norallm/NorBLOOM-7b-scratch) -- a BLOOM-based LLM pretrained from scratch on Norwegian data.