Update README.md
## Model Description

Megatron-GPT 1.3B is a transformer-based language model. GPT refers to a class of transformer decoder-only models similar to GPT-2 and GPT-3, while 1.3B refers to the total trainable parameter count (1.3 billion) [1, 2]. It has Tensor Parallelism (TP) of 1 and Pipeline Parallelism (PP) of 1, and should fit on a single NVIDIA GPU.

This model was trained with [NeMo Megatron](https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/nlp/nemo_megatron/intro.html).
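
As a rough sanity check on the 1.3B figure, the parameter count of a GPT-style decoder-only model can be estimated from its depth and width. The sketch below is illustrative only: the layer count, hidden size, and vocabulary size are assumed, typical GPT-3-style 1.3B values rather than figures taken from this card; the authoritative numbers are in the checkpoint's model config.

```python
# Back-of-the-envelope parameter count for a GPT-style decoder-only transformer.
# NOTE: n_layers, d_model, and vocab are ASSUMED values (typical for a 1.3B,
# GPT-3-style configuration); they are not read from the released checkpoint.
n_layers = 24       # transformer decoder layers (assumed)
d_model = 2048      # hidden size (assumed)
vocab = 51200       # padded BPE vocabulary size (assumed)

attn_params = 4 * d_model**2    # Q, K, V and output projections per layer
mlp_params = 8 * d_model**2     # 4x feed-forward expansion: up and down projections
embed_params = vocab * d_model  # token embedding table (positions add a little more)

total = n_layers * (attn_params + mlp_params) + embed_params
print(f"~{total / 1e9:.2f}B trainable parameters")  # ~1.31B
```

With TP of 1 and PP of 1 the model is neither sharded across GPUs nor split into pipeline stages, so all ~1.3B weights sit on one device, roughly 2.6 GB in FP16, which is why a single modern NVIDIA GPU is enough for inference.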
## Training Data

The model was trained on ["The Pile" dataset prepared by EleutherAI](https://pile.eleuther.ai/) [4].
## Evaluation results

*Zero-shot performance.* Evaluated using the [LM Evaluation Test Suite from AI21](https://github.com/AI21Labs/lm-evaluation).

| ARC-Challenge | ARC-Easy | RACE-middle | RACE-high | Winogrande | RTE | BoolQA | HellaSwag | PiQA |
| ------------- | -------- | ----------- | --------- | ---------- | --- | ------ | --------- | ---- |
| 0.3012 | 0.4596 | 0.459 | 0.3797 | 0.5343 | 0.5451 | 0.5979 | 0.4443 | 0.6834 |
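
Scores like these are typically produced by likelihood-based multiple-choice scoring: each candidate answer is appended to the question and the model's log-probability of that continuation picks the prediction. The sketch below illustrates the idea only; it uses GPT-2 from Hugging Face `transformers` as a stand-in model (the checkpoint above ships as a `.nemo` file), and the question and choices are invented. The AI21 suite linked above handles the real per-task prompts, normalization, and aggregation.

```python
# Minimal sketch of zero-shot multiple-choice scoring via continuation log-likelihood.
# GPT-2 is a stand-in model here; the example question/choices are invented.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def continuation_logprob(context: str, continuation: str) -> float:
    """Sum of log-probabilities of `continuation` tokens given `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    # Log-probability of each token given everything before it.
    logprobs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_logprobs = logprobs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the positions that belong to the continuation (assumes the context
    # tokenization is a prefix of the full tokenization, which holds here because
    # each continuation starts with a space).
    n_ctx = ctx_ids.shape[1]
    return token_logprobs[0, n_ctx - 1:].sum().item()

question = "Which gas do plants absorb from the atmosphere? Answer:"
choices = [" carbon dioxide", " oxygen", " nitrogen", " helium"]
scores = [continuation_logprob(question, c) for c in choices]
print(choices[max(range(len(choices)), key=lambda i: scores[i])])
```

Real harnesses usually also length-normalize the scores before taking the argmax, so the numbers in the table should be reproduced with the linked suite rather than with this sketch.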
## References

[3] [NVIDIA NeMo Toolkit](https://github.com/NVIDIA/NeMo)

[4] [The Pile: An 800GB Dataset of Diverse Text for Language Modeling](https://arxiv.org/abs/2101.00027)
## Licence

License to use this model is covered by the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license. By downloading the public and release version of the model, you accept the terms and conditions of the [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/) license.