---
language:
- en
library_name: nemo
datasets:
- the_pile
tags:
- text generation
- pytorch
- causal-lm
license: cc-by-4.0
---
# NeMo Megatron-T5 3B

<style>
img {
  display: inline;
}
</style>
|[![Model architecture](https://img.shields.io/badge/Model%20Arch-Transformer%20Encoder--Decoder-green)](#model-architecture)|[![Model size](https://img.shields.io/badge/Params-3B-green)](#model-architecture)|[![Language](https://img.shields.io/badge/Language-en--US-lightgrey#model-badge)](#datasets)|

## Model Description

NeMo Megatron-T5 3B is a transformer-based masked language model. [T5](https://arxiv.org/abs/1910.10683) [1] is a class of encoder-decoder models trained with a span-based masked language modeling objective. We follow the [T5v1.1](https://huggingface.co/docs/transformers/model_doc/t5v1.1) approach of pre-training with only the masked language modeling objective. The checkpoint uses Tensor Parallelism (TP) of 2 and Pipeline Parallelism (PP) of 1, and should fit on a single NVIDIA GPU.

This model was trained with [NeMo Megatron](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/intro.html).
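To illustrate the span-based masked language modeling objective, here is a minimal sketch of how a T5-style (input, target) pair is formed from masked spans. This is a simplified illustration only: the actual pre-training pipeline samples spans randomly and operates on subword IDs, and real T5 targets also end with a final sentinel.

```python
def span_corrupt(tokens, spans):
    """Build a T5-style (input, target) pair by replacing token spans with sentinels.

    `spans` is a list of (start, end) index pairs (end exclusive) to mask out.
    Each masked span is replaced by a numbered sentinel in the input, and the
    target lists each sentinel followed by the tokens it replaced.
    """
    inp, tgt = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp += tokens[cursor:start] + [sentinel]
        tgt += [sentinel] + tokens[start:end]
        cursor = end
    inp += tokens[cursor:]  # keep any unmasked tail
    return " ".join(inp), " ".join(tgt)

tokens = "Thank you for inviting me to your party last week".split()
print(span_corrupt(tokens, [(1, 3), (6, 7)]))
# ('Thank <extra_id_0> inviting me to <extra_id_1> party last week',
#  '<extra_id_0> you for <extra_id_1> your')
```

The model is trained to reconstruct only the masked-out spans (the target), which is cheaper than regenerating the full input sequence.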
## Getting started

### Step 1: Install NeMo and dependencies

You will need to install NVIDIA Apex and NeMo:

```bash
git clone https://github.com/ericharper/apex.git
cd apex
git checkout nm_v1.11.0
pip install -v --disable-pip-version-check --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" --global-option="--fast_layer_norm" --global-option="--distributed_adam" --global-option="--deprecated_fused_adam" ./
```

```bash
pip install nemo_toolkit['nlp']==1.11.0
```

Alternatively, you can use the NeMo Megatron training Docker container, which has all dependencies pre-installed: [https://developer.nvidia.com/nemo-megatron-open-beta?nvid=nv-int-tblg-249896](https://developer.nvidia.com/nemo-megatron-open-beta?nvid=nv-int-tblg-249896)
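A quick way to confirm that the pinned NeMo version was picked up is to query installed package versions with the standard library (a generic sketch; the package name `nemo_toolkit` matches the pip install above):

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package):
    """Return the installed version string for a package, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

# After Step 1 you would expect installed_version("nemo_toolkit") to be "1.11.0".
print(installed_version("nemo_toolkit"))
```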
### Step 2: Run inference

**Note.** The model was trained with Tensor Parallelism (TP) of 2 and Pipeline Parallelism (PP) of 1, but it should be possible to run inference with a tensor parallel size of 1 on most NVIDIA GPUs.

```bash
git clone https://github.com/NVIDIA/NeMo.git
cd NeMo/examples/nlp/language_modeling
git checkout v1.11.0
python megatron_t5_eval.py \
    --model_file /raid/Data/NMT/Models/t5_3b/megatron_t5-tp2--val_los-1.09-step-999999-consumed-samples-2159846144.0.nemo \
    --prompt '<mask> was the first person to set foot on the moon. When he did, he uttered the phrase <mask> for man, one <mask> for mankind which is still a popular quote today.' \
    --tensor_model_parallel_size 2
```

The script automatically replaces each `<mask>` token with the appropriate sentinel token used during pre-training and attempts to fill them in autoregressively with greedy decoding.
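The sentinel substitution the script performs can be sketched as follows. This is a simplified illustration, not the actual NeMo implementation; the real script also normalizes spacing via the tokenizer, as visible in the `masked_input` field of the response below.

```python
def mask_to_sentinels(prompt, mask="<mask>"):
    """Replace each <mask> occurrence with sequentially numbered T5 sentinel tokens."""
    parts = prompt.split(mask)
    out = parts[0]
    for i, part in enumerate(parts[1:]):
        out += f"<extra_id_{i}>" + part
    return out

print(mask_to_sentinels("<mask> was the first person to set foot on the moon."))
# <extra_id_0> was the first person to set foot on the moon.
```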
*Expected response*:

```
{
  'prompt': '<mask> was the first person to set foot on the moon. When he did, he uttered the phrase <mask> for man, one <mask> for mankind which is still a popular quote today.',
  'completion': {
    'text': '[CLS] <extra_id_0> Neil Armstrong <extra_id_1> one small step <extra_id_2> giant leap',
    'tokens': [(101, '[CLS]', -2.9802276912960224e-06), (28996, '<extra_id_0>', -0.1492447555065155), (6003, 'Neil', -0.0015669699059799314), (8800, 'Armstrong', -0.013404252007603645), (28997, '<extra_id_1>', -0.9019092917442322), (1141, 'one', -0.7962003350257874), (1353, 'small', -0.006306509021669626), (2585, 'step', -1.9073468138230965e-06), (28998, '<extra_id_2>', -0.0026884861290454865), (4994, 'giant', -0.1679367572069168), (13660, 'leap', -5.960462772236497e-07)]
  },
  'masked_input': '<extra_id_0> was the first person to set foot on the moon . When he did , he uttered the phrase <extra_id_1> for man , one <extra_id_2> for mankind which is still a popular quote today .'
}
```
- `prompt`: The raw prompt provided as input.
- `completion`:
  - `text`: The final text generated by the model, including all special/sentinel tokens other than `</s>`.
  - `tokens`: Each generated subword along with its log-probability.
- `masked_input`: The original raw prompt with each `<mask>` replaced by the appropriate sentinel token.
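To recover which text the model filled in for each masked span, the `text` field can be split on its sentinel tokens. This is a hypothetical post-processing helper, not part of the NeMo script:

```python
import re

def parse_completion(text):
    """Pair each sentinel token in a completion string with the span it fills."""
    # Split on sentinel tokens like <extra_id_0>, keeping them in the result.
    pieces = re.split(r"(<extra_id_\d+>)", text)
    fills = {}
    current = None
    for piece in pieces:
        if re.fullmatch(r"<extra_id_\d+>", piece):
            current = piece
        elif current is not None:
            fills[current] = piece.strip()
    return fills

completion = '[CLS] <extra_id_0> Neil Armstrong <extra_id_1> one small step <extra_id_2> giant leap'
print(parse_completion(completion))
# {'<extra_id_0>': 'Neil Armstrong', '<extra_id_1>': 'one small step', '<extra_id_2>': 'giant leap'}
```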
## Training Data

The model was trained on ["The Pile" dataset prepared by Eleuther.AI](https://pile.eleuther.ai/) [4].

## Evaluation results

*Fine-tuned performance* on downstream *validation* sets for different tasks:

| MNLI-M | MNLI-MM | SST-2 | STS-B (Spearman) |
| ------ | ------- | ----- | ---------------- |
| 90.62  | 90.61   | 97.2  | 91.5             |
## Limitations

The model was trained on data originally crawled from the Internet. This data contains toxic language and societal biases, so the model may amplify those biases and return toxic responses, especially when given toxic prompts.
## References

[1] [Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer](https://arxiv.org/abs/1910.10683)

[2] [Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism](https://arxiv.org/pdf/1909.08053.pdf)

[3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)

[4] [The Pile: An 800GB Dataset of Diverse Text for Language Modeling](https://arxiv.org/abs/2101.00027)
## License

Use of this model is covered by the [CC-BY-4.0 license](https://creativecommons.org/licenses/by/4.0/). By downloading the public release version of the model, you accept the terms and conditions of the CC-BY-4.0 license.