add model

- README.md +24 -108
- config.json +2 -1
- tf_model.h5 +2 -2
README.md
CHANGED
@@ -1,131 +1,47 @@
---
- language: en
- inference: false
tags:
- - text-generation
- - opt

- license: other
- commercial: false
---

- # OPT : Open Pre-trained Transformer Language Models

- OPT was first introduced in [Open Pre-trained Transformer Language Models](https://arxiv.org/abs/2205.01068) and first released in [metaseq's repository](https://github.com/facebookresearch/metaseq) on May 3rd 2022 by Meta AI.

- **Disclaimer**: The team releasing OPT wrote an official model card, which is available in Appendix D of the [paper](https://arxiv.org/pdf/2205.01068.pdf).
- Content from **this** model card has been written by the Hugging Face team.

- ## Intro

- To quote the first two paragraphs of the [official paper](https://arxiv.org/abs/2205.01068):

- > Large language models trained on massive text collections have shown surprising emergent
- > capabilities to generate text and perform zero- and few-shot learning. While in some cases the public
- > can interact with these models through paid APIs, full model access is currently limited to only a
- > few highly resourced labs. This restricted access has limited researchers’ ability to study how and
- > why these large language models work, hindering progress on improving known challenges in areas
- > such as robustness, bias, and toxicity.

- > We present Open Pretrained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M
- > to 175B parameters, which we aim to fully and responsibly share with interested researchers. We train the OPT models to roughly match
- > the performance and sizes of the GPT-3 class of models, while also applying the latest best practices in data
- > collection and efficient training. Our aim in developing this suite of OPT models is to enable reproducible and responsible research at scale, and
- > to bring more voices to the table in studying the impact of these LLMs. Definitions of risk, harm, bias, and toxicity, etc., should be articulated by the
- > collective research community as a whole, which is only possible when models are available for study.

## Model description

- OPT was predominantly pretrained with English text, but a small amount of non-English data is still present within the training corpus via CommonCrawl. The model was pretrained using a causal language modeling (CLM) objective.
- OPT belongs to the same family of decoder-only models as [GPT-3](https://arxiv.org/abs/2005.14165). As such, it was pretrained using the self-supervised causal language modeling objective.

- For evaluation, OPT follows [GPT-3](https://arxiv.org/abs/2005.14165) by using their prompts and overall experimental setup. For more details, please read
- the [official paper](https://arxiv.org/abs/2205.01068).
## Intended uses & limitations

- The pretrained-only model can be used for prompting for evaluation of downstream tasks as well as text generation.
- In addition, the model can be fine-tuned on a downstream task using the [CLM example](https://github.com/huggingface/transformers/tree/main/examples/pytorch/language-modeling). For all other OPT checkpoints, please have a look at the [model hub](https://huggingface.co/models?filter=opt).

- ### How to use

- You can use this model directly with a pipeline for text generation.

- ```python
- >>> from transformers import pipeline

- >>> generator = pipeline('text-generation', model="facebook/opt-125m")
- >>> generator("Hello, I'm am conscious and")
- [{'generated_text': "Hello, I'm am conscious and conscious :) :) Anyway��極��極��極��極��極��極��極��極��極"}]
- ```

- By default, generation is deterministic. In order to use the top-k sampling, please set `do_sample` to `True`.

- ```python
- >>> from transformers import pipeline, set_seed

- >>> generator = pipeline('text-generation', model="facebook/opt-125m", do_sample=True)
- >>> generator("Hello, I'm am conscious and")
- [{'generated_text': "Hello, I'm am conscious and active observer!! HmmregorCLASSIFIEDドラゴン覚醒ドラゴンドラゴン覚醒覚醒ドラゴン"}]
- ```
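The removed card above only links to the Transformers causal-language-modeling example for fine-tuning. A minimal sketch of that workflow follows; the dataset (`wikitext-2-raw-v1`), sequence length, batch size and output directory are illustrative assumptions, not settings from the card or the paper.

```python
# Minimal causal-LM fine-tuning sketch in the spirit of the linked CLM example.
# The dataset, max_length, batch size and output_dir are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

raw = load_dataset("wikitext", "wikitext-2-raw-v1")  # stand-in corpus

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = raw.map(tokenize, batched=True, remove_columns=["text"])
tokenized = tokenized.filter(lambda ex: len(ex["input_ids"]) > 1)  # drop empty lines

# mlm=False gives plain causal-LM labels (the model shifts them internally).
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

args = TrainingArguments(output_dir="opt-125m-finetuned",
                         per_device_train_batch_size=4, num_train_epochs=1)
trainer = Trainer(model=model, args=args, train_dataset=tokenized["train"],
                  eval_dataset=tokenized["validation"], data_collator=collator)
trainer.train()
```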
- ### Limitations and bias

- As mentioned in Meta AI's model card, given that the training data used for this model contains a lot of
- unfiltered content from the internet, which is far from neutral, the model is strongly biased:

- > Like other large language models for which the diversity (or lack thereof) of training
- > data induces downstream impact on the quality of our model, OPT-175B has limitations in terms
- > of bias and safety. OPT-175B can also have quality issues in terms of generation diversity and
- > hallucination. In general, OPT-175B is not immune from the plethora of issues that plague modern
- > large language models.

- This bias will also affect all fine-tuned versions of this model.

- ## Training data

- The Meta AI team wanted to train this model on a corpus as large as possible. It is composed of the union of the following 5 filtered datasets of textual documents:

- - BookCorpus, which consists of more than 10K unpublished books,
- - CC-Stories, which contains a subset of CommonCrawl data filtered to match the
-   story-like style of Winograd schemas,
- - The Pile, from which *Pile-CC, OpenWebText2, USPTO, Project Gutenberg, OpenSubtitles, Wikipedia, DM Mathematics and HackerNews* were included,
- - the Pushshift.io Reddit dataset that was developed in Baumgartner et al. (2020) and processed in
-   Roller et al. (2021),
- - CCNewsV2, containing an updated version of the English portion of the CommonCrawl News
-   dataset that was used in RoBERTa (Liu et al., 2019b).

- The final training data contains 180B tokens corresponding to 800GB of data. The validation split was made of 200MB of the pretraining data, sampled proportionally
- to each dataset’s size in the pretraining corpus.

- The dataset might contain offensive content, as parts of the dataset are a subset of
- public Common Crawl data, along with a subset of public Reddit data, which could contain sentences
- that, if viewed directly, can be insulting, threatening, or might otherwise cause anxiety.

- ### Collection process

- The dataset was collected from the internet, and went through classic data processing algorithms and
- re-formatting practices, including removing repetitive/non-informative text like *Chapter One* or
- *This ebook by Project Gutenberg.*

- ## Training procedure

- ### Preprocessing

- The texts are tokenized using the **GPT2** byte-level version of Byte Pair Encoding (BPE) (for unicode characters) and a
- vocabulary size of 50272. The inputs are sequences of 2048 consecutive tokens.

- The 175B model was trained on 992 *80GB A100 GPUs*. The training duration was roughly ~33 days of continuous training.
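The preprocessing details above (GPT2-style byte-level BPE, 50272-entry vocabulary, 2048-token sequences) can be checked directly from the released checkpoint. A small sketch follows; the sample sentence is illustrative and not from the card.

```python
# Sketch: inspect the tokenizer/config values the preprocessing section describes.
from transformers import AutoConfig, AutoTokenizer

checkpoint = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)   # GPT2-style byte-level BPE
config = AutoConfig.from_pretrained(checkpoint)

print(config.vocab_size)               # 50272, as stated above
print(config.max_position_embeddings)  # length of the 2048-token training sequences

# Byte-level BPE handles arbitrary unicode text; the sentence is just an example.
enc = tokenizer("OPT is a decoder-only transformer.", truncation=True, max_length=2048)
print(enc["input_ids"])
print(tokenizer.decode(enc["input_ids"]))
```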
- ### BibTeX entry and citation info

- ```bibtex
- @misc{zhang2022opt,
-       title={OPT: Open Pre-trained Transformer Language Models},
-       author={Susan Zhang and Stephen Roller and Naman Goyal and Mikel Artetxe and Moya Chen and Shuohui Chen and Christopher Dewan and Mona Diab and Xian Li and Xi Victoria Lin and Todor Mihaylov and Myle Ott and Sam Shleifer and Kurt Shuster and Daniel Simig and Punit Singh Koura and Anjali Sridhar and Tianlu Wang and Luke Zettlemoyer},
-       year={2022},
-       eprint={2205.01068},
-       archivePrefix={arXiv},
-       primaryClass={cs.CL}
- }
- ```
---
tags:
+ - generated_from_keras_callback
+ model-index:
+ - name: opt-125m
+   results: []
---

+ <!-- This model card has been generated automatically according to the information Keras had access to. You should
+ probably proofread and complete it, then remove this comment. -->

+ # opt-125m

+ This model was trained from scratch on an unknown dataset.
+ It achieves the following results on the evaluation set:

## Model description

+ More information needed

## Intended uses & limitations

+ More information needed

+ ## Training and evaluation data

+ More information needed

+ ## Training procedure

+ ### Training hyperparameters

+ The following hyperparameters were used during training:
+ - optimizer: None
+ - training_precision: float32

+ ### Training results

+ ### Framework versions

+ - Transformers 4.20.0.dev0
+ - TensorFlow 2.9.1
+ - Datasets 2.2.2
+ - Tokenizers 0.12.1
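The replacement card lists framework versions but no usage snippet, so here is a minimal TensorFlow generation sketch matching the `tf_model.h5` weights added in this commit. The repository id is a placeholder (the commit page does not show where the repo lives), and the prompt simply reuses the one from the removed card.

```python
# Minimal TF usage sketch for the checkpoint described by the new card.
# "your-username/opt-125m" is a placeholder repo id, not taken from this commit.
from transformers import AutoTokenizer, TFAutoModelForCausalLM

repo_id = "your-username/opt-125m"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = TFAutoModelForCausalLM.from_pretrained(repo_id)  # loads tf_model.h5

inputs = tokenizer("Hello, I'm am conscious and", return_tensors="tf")
output_ids = model.generate(**inputs, do_sample=True, top_k=50, max_new_tokens=20)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```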
config.json
CHANGED
@@ -1,4 +1,5 @@
{
+ "_name_or_path": "facebook/opt-125m",
"activation_dropout": 0.0,
"activation_function": "relu",
"architectures": [
@@ -19,7 +20,7 @@
"num_hidden_layers": 12,
"pad_token_id": 1,
"prefix": "</s>",
- "torch_dtype": "
+ "torch_dtype": "float32",
"transformers_version": "4.20.0.dev0",
"use_cache": true,
"vocab_size": 50272,
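The two `config.json` changes in this commit (`_name_or_path` recording the source checkpoint and `torch_dtype` set to `float32`) can be read back with `AutoConfig`. A small sketch with a placeholder repository id:

```python
# Sketch: read back the config fields this commit touches.
# The repo id is a placeholder; the commit page does not name the destination repository.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("your-username/opt-125m")  # placeholder
print(config.torch_dtype)   # float32 after this commit
print(config.vocab_size)    # 50272, unchanged by the commit

# Note: from_pretrained ignores config.torch_dtype unless you opt in explicitly,
# e.g. AutoModelForCausalLM.from_pretrained(..., torch_dtype="auto").
```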
tf_model.h5
CHANGED
@@ -1,3 +1,3 @@
version https://git-lfs.github.com/spec/v1
- oid sha256:
- size
+ oid sha256:96ccfd6e25b876e8c469eacb2139e745c69a8757ca7d7ed2e8ea0b83a1da6764
+ size 501162056
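`tf_model.h5` is tracked with Git LFS, so the diff above only swaps the pointer file (SHA-256 and byte size); the actual ~500 MB weight file lives in LFS storage. A sketch of fetching it with `huggingface_hub`, again with a placeholder repository id:

```python
# Sketch: download the real tf_model.h5 artifact referenced by the LFS pointer above.
# The repo id is a placeholder for wherever this commit was pushed.
import os
from huggingface_hub import hf_hub_download

path = hf_hub_download(repo_id="your-username/opt-125m", filename="tf_model.h5")
print(path)
print(os.path.getsize(path))  # should match the pointer's "size 501162056"
```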