nicholasKluge committed
Commit 39282c7 (1 parent: 444939e)

Update README.md

Files changed (1): README.md (+31 −13)
README.md CHANGED
@@ -60,15 +60,16 @@ Also, these models were trained by leveraging [scaling laws](https://arxiv.org/a
 
 This repository has the [source code](https://github.com/Nkluge-correa/Aira) used to train this model. The main libraries used are:
 
- - Transformers
- - PyTorch
- - Datasets
- - Tokenizers
- - Accelerate codecarbon sentencepiece
+ - [Transformers](https://github.com/huggingface/transformers)
+ - [PyTorch](https://github.com/pytorch/pytorch)
+ - [Datasets](https://github.com/huggingface/datasets)
+ - [Tokenizers](https://github.com/huggingface/tokenizers)
+ - [Accelerate](https://github.com/huggingface/accelerate)
+ - [Codecarbon](https://github.com/mlco2/codecarbon)
 
-
- ## Training Set-up
+ ## Training Set-up
 
+ These are the main arguments used in the training of this model:
+
 | Arguments                     | Value                                |
 |-------------------------------|--------------------------------------|
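The table above opens the list of training arguments (truncated in this hunk). As a rough sketch of how the libraries in the new list fit together (illustrative only: the repository ID comes from this model card, while the tracker usage and the placeholder training step are assumptions):

```python
# Illustrative sketch: load the model with Transformers and meter energy
# consumption with CodeCarbon, two of the libraries listed above.
from codecarbon import EmissionsTracker
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nicholasKluge/TeenyTinyLlama-162m"  # repository ID from this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

tracker = EmissionsTracker()  # writes estimated CO2 emissions to emissions.csv
tracker.start()
# ... training (or generation) steps would run here ...
tracker.stop()
```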
@@ -143,9 +144,15 @@ for i, completion in enumerate(completions):
 
 ## Limitations
 
- 🤥 Generative AI models, like LLMs used for text generation/conversation or GANs for image generation, can produce content that can be mistaken for truth but is, in fact, misleading or entirely false, given the model's tendency to output hallucinations. Such models can generate deceptive visuals, human-like textual content, music, or combined media that might seem genuine at first glance.
+ - **Hallucinations:** This model can produce content that can be mistaken for truth but is, in fact, misleading or entirely false, i.e., hallucinations.
 
- 🤬 Machine learning systems can inherit social and historical stereotypes from the data used to train them. Given these biases, models can be prone to produce toxic content, that is, text, images, videos, or comments, that is harmful, offensive, or detrimental to individuals, groups, or communities. Also, models that automate decision-making can have biases against certain groups, affecting people based on sensitive attributes in an unjust manner.
+ - **Biases and Toxicity:** This model inherits the social and historical stereotypes from the data used to train it. Given these biases, the model can produce toxic content, i.e., text that is harmful, offensive, or detrimental to individuals, groups, or communities.
+
+ - **Unreliable Code:** The model may produce incorrect code snippets and statements. These code generations should not be treated as suggestions or accurate solutions.
+
+ - **Language Limitations:** The model is primarily designed to understand standard Brazilian Portuguese. Other languages might challenge its comprehension, leading to potential misinterpretations or errors in its responses.
+
+ - **Repetition and Verbosity:** The model may get stuck in repetition loops (especially if the repetition penalty during generation is set too low) or produce verbose responses unrelated to the prompt it was given.
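The repetition loops described above can usually be damped at generation time. A minimal illustrative example (the Portuguese prompt and the parameter values are assumptions, not tuned recommendations):

```python
# Illustrative only: repetition_penalty > 1.0 in Transformers' generate()
# discourages the repetition loops described in the limitations above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "nicholasKluge/TeenyTinyLlama-162m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Portuguese prompt: "The capital of Brazil is"
inputs = tokenizer("A capital do Brasil é", return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=50,
    do_sample=True,
    repetition_penalty=1.2,  # values at or near 1.0 leave loops more likely
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```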
 
 ## Evaluations
 
@@ -160,7 +167,7 @@ for i, completion in enumerate(completions):
 
 | Models                                                                              | Average | [ARC](https://arxiv.org/abs/1803.05457) | [Hellaswag](https://arxiv.org/abs/1905.07830) | [MMLU](https://arxiv.org/abs/2009.03300) | [TruthfulQA](https://arxiv.org/abs/2109.07958) |
 |-------------------------------------------------------------------------------------|---------|-----------------------------------------|-----------------------------------------------|------------------------------------------|------------------------------------------------|
- | [Teeny Tiny Llama 162m](https://huggingface.co/nicholasKluge/Teeny-tiny-llama-162m) | 31.16 | 26.15 | 29.29 | 28.11 | 41.12 |
+ | [TeenyTinyLlama-162m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-162m) | 31.16 | 26.15 | 29.29 | 28.11 | 41.12 |
 | [Pythia-160m](https://huggingface.co/EleutherAI/pythia-160m-deduped) | 31.16 | 24.06 | 31.39 | 24.86 | 44.34 |
 | [OPT-125m](https://huggingface.co/facebook/opt-125m) | 30.80 | 22.87 | 31.47 | 26.02 | 42.87 |
 | [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22 | 22.48 | 29.62 | 27.36 | 41.44 |
@@ -168,15 +175,26 @@ for i, completion in enumerate(completions):
 
 * Evaluations on benchmarks were performed using the [Language Model Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) (by [EleutherAI](https://www.eleuther.ai/)). Thanks to [Laiviet](https://github.com/laiviet/lm-evaluation-harness) for translating some of the tasks in the LM-Evaluation-Harness.
 
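For reference, scores of this kind can be reproduced through the harness's Python API. A hedged sketch (the v0.4-style `simple_evaluate` call and the task names are assumptions about recent harness releases, not the exact command used for this card; the Portuguese translations live in the fork credited above):

```python
# Sketch: score the model on the four benchmarks in the table above using
# EleutherAI's lm-evaluation-harness. Task names follow recent releases and
# may differ in the translated fork mentioned in this card.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=nicholasKluge/TeenyTinyLlama-162m",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],
)
print(results["results"])  # per-task accuracy and normalized accuracy
```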
+ ## Fine Tuning
+
+ | Models                                                                              | Average | [ARC](https://arxiv.org/abs/1803.05457) | [Hellaswag](https://arxiv.org/abs/1905.07830) | [MMLU](https://arxiv.org/abs/2009.03300) | [TruthfulQA](https://arxiv.org/abs/2109.07958) |
+ |-------------------------------------------------------------------------------------|---------|-----------------------------------------|-----------------------------------------------|------------------------------------------|------------------------------------------------|
+ | [TeenyTinyLlama-162m](https://huggingface.co/nicholasKluge/TeenyTinyLlama-162m) | 31.16 | 26.15 | 29.29 | 28.11 | 41.12 |
+ | [Pythia-160m](https://huggingface.co/EleutherAI/pythia-160m-deduped) | 31.16 | 24.06 | 31.39 | 24.86 | 44.34 |
+ | [OPT-125m](https://huggingface.co/facebook/opt-125m) | 30.80 | 22.87 | 31.47 | 26.02 | 42.87 |
+ | [Gpt2-portuguese-small](https://huggingface.co/pierreguillou/gpt2-small-portuguese) | 30.22 | 22.48 | 29.62 | 27.36 | 41.44 |
+ | [Gpt2-small](https://huggingface.co/gpt2) | 29.97 | 21.48 | 31.60 | 25.79 | 40.65 |
+
 ## Cite as 🤗
 
 ```latex
 
 @misc{nicholas22llama,
 doi = {10.5281/zenodo.6989727},
- url = {https://huggingface.co/nicholasKluge/Teeny-tiny-llama-162m},
+ url = {https://huggingface.co/nicholasKluge/TeenyTinyLlama-162m},
 author = {Nicholas Kluge Corrêa},
- title = {Teeny-tiny-llama},
+ title = {Teeny Tiny Llama},
 year = {2023},
 publisher = {HuggingFace},
 journal = {HuggingFace repository},
@@ -186,4 +204,4 @@ for i, completion in enumerate(completions):
 
 ## License
 
- The Teeny-tiny-llama-162m is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.
+ The TeenyTinyLlama-162m is licensed under the Apache License, Version 2.0. See the [LICENSE](LICENSE) file for more details.