javi8979 committed
Commit f613fd5
Parent: d174b12

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -37,7 +37,7 @@ This is the model card of Plume (**P**arallel **L**ang**u**age **M**od**e**l) wi
 
 ## Summary
 
-Plume is the first LLM trained for Neural Machine Translation with only parallel Catalan-Centric data from scratch. It is a language model with the same architecture as Gemma 2B. The model is trained for general translation tasks at sentence level. For more information about training, architecture and interpretability of the model check out the paper; "Investigating the translation capabilities of Large Language Models trained on parallel data only". The preprint is available on [arXiv]().
+Plume is the first LLM trained for Neural Machine Translation with only parallel Catalan-Centric data from scratch. It is a language model with the same architecture as Gemma 2B. The model is trained for general translation tasks at sentence level. For more information about training, architecture and interpretability of the model check out the paper; "Investigating the translation capabilities of Large Language Models trained on parallel data only". The preprint is available on [arXiv](https://arxiv.org/abs/2406.09140).
 
 - **Developed by:** The Language Technologies Unit from Barcelona Supercomputing Center (BSC).
 - **Languages:** Spanish, French, Italian, Portuguese, Galician, German, English, and Basque.
@@ -47,7 +47,7 @@ Plume is the first LLM trained for Neural Machine Translation with only parallel
 
 In recent years, Large Language Models (LLMs) have demonstrated exceptional proficiency across a broad spectrum of Natural Language Processing (NLP) tasks, including Machine Translation. However, previous methodologies predominantly relied on iterative processes such as instruction fine-tuning or continual pre-training, leaving unexplored the challenges of training LLMs solely on parallel data. In this work, we introduce Plume (**P**arallel **L**ang**u**age **M**od**e**l), a collection of three 2B LLMs featuring varying vocabulary sizes (32k, 128k, and 256k) trained exclusively on Catalan-centric parallel examples. These models perform comparable to previous encoder-decoder architectures on 16 supervised translation directions and 56 zero-shot ones.
 
-For more details regarding the model architecture, the dataset and model interpretability take a look at the paper which is available on [arXiv](https://arxiv.org/abs/2406.09140).
+For more details regarding the model architecture, the dataset and model interpretability take a look at the [paper](https://arxiv.org/abs/2406.09140).
 
 ## Intended Uses and Limitations
 
@@ -96,11 +96,11 @@ For training, the learning rate is warmed up from 1e-7 to a maximum of 3e-4 over
 | Warmup Steps | 2000 |
 
 
-More training details are specified in the [paper](). Code for training the model and running other experiments can be found in our [GitHub repository](https://github.com/projecte-aina/Plume).
+More training details are specified in the [paper](https://arxiv.org/abs/2406.09140). Code for training the model and running other experiments can be found in our [GitHub repository](https://github.com/projecte-aina/Plume).
 
 ## Evaluation
 
-Below are the evaluation results on Flores-200 and NTREX for supervised MT directions. For more details about model evaluation check out the [paper]().
+Below are the evaluation results on Flores-200 and NTREX for supervised MT directions. For more details about model evaluation check out the [paper](https://arxiv.org/abs/2406.09140).
 
 | Model | FLORES BLEU | FLORES COMET | NTREX BLEU | NTREX COMET |
 |----------------------|-------------|--------------|------------|-------------|
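
The third hunk (`@@ -96,11 +96,11 @@`) sits in the README's training section, which states that the learning rate warms up from 1e-7 to a peak of 3e-4 over the first 2000 steps. Below is a minimal sketch of that schedule, assuming a linear ramp; the excerpt gives only the endpoints and the step count, not the ramp shape or the decay that follows.

```python
def warmup_lr(step: int,
              init_lr: float = 1e-7,
              peak_lr: float = 3e-4,
              warmup_steps: int = 2000) -> float:
    """Learning rate during warmup, assuming a linear ramp from init_lr
    to peak_lr (the ramp shape is an assumption; the README excerpt only
    gives the endpoints and the number of warmup steps)."""
    if step >= warmup_steps:
        return peak_lr  # the post-warmup decay is not described in this excerpt
    return init_lr + (peak_lr - init_lr) * step / warmup_steps


# warmup_lr(0) == 1e-7, warmup_lr(1000) ~= 1.5e-4, warmup_lr(2000) == 3e-4
```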
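The evaluation lines touched by the last hunk report BLEU and COMET on FLORES-200 and NTREX. As a hedged illustration, the sketch below scores one translation direction with the sacrebleu library; the hypotheses and references are made-up placeholders, and the exact evaluation setup used in the paper (tokenization, COMET checkpoint, decoding settings) is not given in this excerpt.

```python
# pip install sacrebleu
import sacrebleu

# Placeholder system outputs and references for a single direction;
# the actual evaluation uses the FLORES-200 and NTREX test sets.
hypotheses = ["The cat sleeps on the sofa.", "We will arrive tomorrow morning."]
references = ["The cat is sleeping on the sofa.", "We arrive tomorrow morning."]

bleu = sacrebleu.corpus_bleu(hypotheses, [references])
print(f"BLEU: {bleu.score:.2f}")
```

COMET is omitted here because it additionally needs the source sentences and a neural checkpoint from the unbabel-comet package, and the checkpoint used for the reported scores is not named in this excerpt.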