ybelkada committed
Commit e347481
1 Parent(s): 8440fd6

add first files

README.md ADDED
@@ -0,0 +1,183 @@
---
language:
- en
- fr
- ro
- de
datasets:
- c4
tags:
- summarization
- translation

license: apache-2.0
inference: false
---

# Model Card for T5 11B

![model image](https://camo.githubusercontent.com/623b4dea0b653f2ad3f36c71ebfe749a677ac0a1/68747470733a2f2f6d69726f2e6d656469756d2e636f6d2f6d61782f343030362f312a44304a31674e51663876727255704b657944387750412e706e67)

# Table of Contents

1. [Model Details](#model-details)
2. [Uses](#uses)
3. [Bias, Risks, and Limitations](#bias-risks-and-limitations)
4. [Training Details](#training-details)
5. [Evaluation](#evaluation)
6. [Environmental Impact](#environmental-impact)
7. [Citation](#citation)
8. [Model Card Authors](#model-card-authors)
9. [How to Get Started with the Model](#how-to-get-started-with-the-model)

# Model Details

## Model Description

The developers of the Text-To-Text Transfer Transformer (T5) [write](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html):

> With T5, we propose reframing all NLP tasks into a unified text-to-text-format where the input and output are always text strings, in contrast to BERT-style models that can only output either a class label or a span of the input. Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task.

T5-11B is the checkpoint with 11 billion parameters.

- **Developed by:** Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. See the [associated paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) and [GitHub repo](https://github.com/google-research/text-to-text-transfer-transformer#released-model-checkpoints)
- **Model type:** Language model
- **Language(s) (NLP):** English, French, Romanian, German
- **License:** Apache 2.0
- **Related Models:** [All T5 Checkpoints](https://huggingface.co/models?search=t5)
- **Resources for more information:**
  - [Research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf)
  - [Google's T5 Blog Post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html)
  - [GitHub Repo](https://github.com/google-research/text-to-text-transfer-transformer)
  - [Hugging Face T5 Docs](https://huggingface.co/docs/transformers/model_doc/t5)

# Uses

## Direct Use and Downstream Use

In their [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html), the developers write:

> Our text-to-text framework allows us to use the same model, loss function, and hyperparameters on any NLP task, including machine translation, document summarization, question answering, and classification tasks (e.g., sentiment analysis). We can even apply T5 to regression tasks by training it to predict the string representation of a number instead of the number itself.

See the [blog post](https://ai.googleblog.com/2020/02/exploring-transfer-learning-with-t5.html) and [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.
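
For illustration, here is a minimal sketch of direct use with the `transformers` library, steering the task through a text prefix (the prefixes below are the ones listed under `task_specific_params` in this repository's `config.json`). Note that loading the full checkpoint needs roughly 45 GB of memory; for quick experimentation a smaller checkpoint such as `t5-small` can be substituted:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-11b")
model = T5ForConditionalGeneration.from_pretrained("t5-11b")

# T5 selects the task through a text prefix; no task-specific head is needed.
inputs = tokenizer(
    "translate English to German: The house is wonderful.", return_tensors="pt"
)
outputs = model.generate(**inputs, max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```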

## Out-of-Scope Use

More information needed.

# Bias, Risks, and Limitations

More information needed.

## Recommendations

More information needed.

# Training Details

## Training Data

The model is pre-trained on the [Colossal Clean Crawled Corpus (C4)](https://www.tensorflow.org/datasets/catalog/c4), which was developed and released in the context of the same [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) as T5.

The model was pre-trained on a **multi-task mixture of unsupervised (1.) and supervised tasks (2.)**.
The following datasets were used for (1.) and (2.):

1. **Datasets used for Unsupervised denoising objective** (this objective's input/target format is sketched below, after the dataset lists):

   - [C4](https://huggingface.co/datasets/c4)
   - [Wiki-DPR](https://huggingface.co/datasets/wiki_dpr)

2. **Datasets used for Supervised text-to-text language modeling objective**:

   - Sentence acceptability judgment
     - CoLA [Warstadt et al., 2018](https://arxiv.org/abs/1805.12471)
   - Sentiment analysis
     - SST-2 [Socher et al., 2013](https://nlp.stanford.edu/~socherr/EMNLP2013_RNTN.pdf)
   - Paraphrasing/sentence similarity
     - MRPC [Dolan and Brockett, 2005](https://aclanthology.org/I05-5002)
     - STS-B [Cer et al., 2017](https://arxiv.org/abs/1708.00055)
     - QQP [Iyer et al., 2017](https://quoradata.quora.com/First-Quora-Dataset-Release-Question-Pairs)
   - Natural language inference
     - MNLI [Williams et al., 2017](https://arxiv.org/abs/1704.05426)
     - QNLI [Rajpurkar et al., 2016](https://arxiv.org/abs/1606.05250)
     - RTE [Dagan et al., 2005](https://link.springer.com/chapter/10.1007/11736790_9)
     - CB [De Marneffe et al., 2019](https://semanticsarchive.net/Archive/Tg3ZGI2M/Marneffe.pdf)
   - Sentence completion
     - COPA [Roemmele et al., 2011](https://www.researchgate.net/publication/221251392_Choice_of_Plausible_Alternatives_An_Evaluation_of_Commonsense_Causal_Reasoning)
   - Word sense disambiguation
     - WIC [Pilehvar and Camacho-Collados, 2018](https://arxiv.org/abs/1808.09121)
   - Question answering
     - MultiRC [Khashabi et al., 2018](https://aclanthology.org/N18-1023)
     - ReCoRD [Zhang et al., 2018](https://arxiv.org/abs/1810.12885)
     - BoolQ [Clark et al., 2019](https://arxiv.org/abs/1905.10044)

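The unsupervised denoising objective can be made concrete with a small sketch. The sentence below is the illustrative example from the research paper; the `<extra_id_n>` sentinels are the names this repository's tokenizer assigns to the paper's sentinel placeholders:

```python
# Span corruption (the unsupervised denoising objective), sketched: random
# spans of the input are replaced with sentinel tokens, and the target
# reconstructs the dropped spans in order.
original = "Thank you for inviting me to your party last week."

model_input = "Thank you <extra_id_0> me to your party <extra_id_1> week."
target = "<extra_id_0> for inviting <extra_id_1> last <extra_id_2>"
```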

## Training Procedure

In their [abstract](https://jmlr.org/papers/volume21/20-074/20-074.pdf), the model developers write:

> In this paper, we explore the landscape of transfer learning techniques for NLP by introducing a unified framework that converts every language problem into a text-to-text format. Our systematic study compares pre-training objectives, architectures, unlabeled datasets, transfer approaches, and other factors on dozens of language understanding tasks.

The T5 framework introduced in the paper involves a training procedure that brings together the approaches studied there. See the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for further details.

# Evaluation

## Testing Data, Factors & Metrics

The developers evaluated the model on 24 tasks; see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf) for full details.

## Results

For full results for T5-11B, see the [research paper](https://jmlr.org/papers/volume21/20-074/20-074.pdf), Table 14.

# Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** Google Cloud TPU Pods
- **Hours used:** More information needed
- **Cloud Provider:** GCP
- **Compute Region:** More information needed
- **Carbon Emitted:** More information needed

# Citation

**BibTeX:**

```bibtex
@article{2020t5,
  author  = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
  title   = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
  journal = {Journal of Machine Learning Research},
  year    = {2020},
  volume  = {21},
  number  = {140},
  pages   = {1-67},
  url     = {http://jmlr.org/papers/v21/20-074.html}
}
```

**APA:**
- Raffel, C., Shazeer, N., Roberts, A., Lee, K., Narang, S., Matena, M., ... & Liu, P. J. (2020). Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res., 21(140), 1-67.

# Model Card Authors

This model card was written by the team at Hugging Face.

# How to Get Started with the Model

## Disclaimer

**Before `transformers` v3.5.0**, due to its immense size, `t5-11b` required some special treatment.
If you're using transformers `<= v3.4.0`, `t5-11b` should be loaded with the flag `use_cdn` set to `False`, as follows:

```python
import transformers

t5 = transformers.T5ForConditionalGeneration.from_pretrained('t5-11b', use_cdn=False)
```

Secondly, a single GPU will most likely not have enough memory even to load the model, since the weights alone amount to over 40 GB.
- Model parallelism can be used to overcome this problem, as explained in this [PR](https://github.com/huggingface/transformers/pull/3578).
- DeepSpeed's ZeRO-Offload is another approach, as explained in this [post](https://github.com/huggingface/transformers/issues/9996).
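
On more recent `transformers` versions that support sharded checkpoints, a further option is to let `accelerate` place the shards automatically. A minimal sketch, assuming the `accelerate` package is installed and the combined GPU/CPU memory is sufficient:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-11b")
# device_map="auto" (requires accelerate) spreads the 15 weight shards across
# the available GPUs and offloads whatever does not fit to CPU RAM.
model = T5ForConditionalGeneration.from_pretrained("t5-11b", device_map="auto")
```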

See the [Hugging Face T5](https://huggingface.co/docs/transformers/model_doc/t5#transformers.T5Model) docs and a [Colab Notebook](https://colab.research.google.com/github/google-research/text-to-text-transfer-transformer/blob/main/notebooks/t5-trivia.ipynb) created by the model developers for more context.

config.json ADDED
@@ -0,0 +1,51 @@
{
  "architectures": [
    "T5WithLMHeadModel"
  ],
  "d_ff": 65536,
  "d_kv": 128,
  "d_model": 1024,
  "decoder_start_token_id": 0,
  "dropout_rate": 0.1,
  "eos_token_id": 1,
  "initializer_factor": 1.0,
  "is_encoder_decoder": true,
  "layer_norm_epsilon": 1e-06,
  "model_type": "t5",
  "n_positions": 512,
  "num_heads": 128,
  "num_layers": 24,
  "output_past": true,
  "pad_token_id": 0,
  "relative_attention_num_buckets": 32,
  "task_specific_params": {
    "summarization": {
      "early_stopping": true,
      "length_penalty": 2.0,
      "max_length": 200,
      "min_length": 30,
      "no_repeat_ngram_size": 3,
      "num_beams": 4,
      "prefix": "summarize: "
    },
    "translation_en_to_de": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
      "prefix": "translate English to German: "
    },
    "translation_en_to_fr": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
      "prefix": "translate English to French: "
    },
    "translation_en_to_ro": {
      "early_stopping": true,
      "max_length": 300,
      "num_beams": 4,
      "prefix": "translate English to Romanian: "
    }
  },
  "vocab_size": 32128
}
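
As a sanity check on the name "T5 11B", a back-of-the-envelope parameter count can be derived from the hyperparameters above. A sketch that ignores layer norms and relative-attention biases (both comparatively tiny):

```python
# Hyperparameters copied from the config above.
d_model, d_ff, num_heads, d_kv, num_layers, vocab = 1024, 65536, 128, 128, 24, 32128

inner = num_heads * d_kv                     # 16384, the attention's inner width
attn = 4 * d_model * inner                   # q, k, v, o projection matrices
ffn = 2 * d_model * d_ff                     # wi and wo
encoder = num_layers * (attn + ffn)          # self-attention + FFN per block
decoder = num_layers * (2 * attn + ffn)      # cross-attention adds one more attn
total = encoder + decoder + vocab * d_model  # plus the shared embedding

print(f"{total / 1e9:.2f}B parameters")      # ~11.31B
```

At 4 bytes per fp32 weight this comes to roughly 45 GB, consistent with the `total_size` of 45229301760 bytes recorded in the shard index below.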
pytorch_model.bin.index.json ADDED
@@ -0,0 +1,517 @@
{
  "metadata": {
    "total_size": 45229301760
  },
  "weight_map": {
    "decoder.block.0.layer.0.SelfAttention.k.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.0.SelfAttention.o.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.0.SelfAttention.q.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.0.SelfAttention.v.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.0.layer_norm.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.1.EncDecAttention.k.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.1.EncDecAttention.o.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.0.layer.1.EncDecAttention.q.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.1.EncDecAttention.relative_attention_bias.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.0.layer.1.EncDecAttention.v.weight": "pytorch_model_00006-of-00015.bin",
    "decoder.block.0.layer.1.layer_norm.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.0.layer.2.DenseReluDense.wi.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.0.layer.2.DenseReluDense.wo.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.0.layer.2.layer_norm.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.0.SelfAttention.k.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.0.SelfAttention.o.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.0.SelfAttention.q.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.0.SelfAttention.v.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.0.layer_norm.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.1.EncDecAttention.k.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.1.EncDecAttention.o.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.1.EncDecAttention.q.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.1.EncDecAttention.v.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.1.layer_norm.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.2.DenseReluDense.wi.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.2.DenseReluDense.wo.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.1.layer.2.layer_norm.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.10.layer.0.SelfAttention.k.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.0.SelfAttention.o.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.0.SelfAttention.q.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.0.SelfAttention.v.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.0.layer_norm.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.1.EncDecAttention.k.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.1.EncDecAttention.o.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.1.EncDecAttention.q.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.1.EncDecAttention.v.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.1.layer_norm.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.2.DenseReluDense.wi.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.2.DenseReluDense.wo.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.10.layer.2.layer_norm.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.11.layer.0.SelfAttention.k.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.0.SelfAttention.o.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.0.SelfAttention.q.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.0.SelfAttention.v.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.0.layer_norm.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.1.EncDecAttention.k.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.1.EncDecAttention.o.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.1.EncDecAttention.q.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.1.EncDecAttention.v.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.1.layer_norm.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.2.DenseReluDense.wi.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.2.DenseReluDense.wo.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.11.layer.2.layer_norm.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.0.SelfAttention.k.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.0.SelfAttention.o.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.0.SelfAttention.q.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.0.SelfAttention.v.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.0.layer_norm.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.1.EncDecAttention.k.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.1.EncDecAttention.o.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.1.EncDecAttention.q.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.1.EncDecAttention.v.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.1.layer_norm.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.2.DenseReluDense.wi.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.2.DenseReluDense.wo.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.12.layer.2.layer_norm.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.0.SelfAttention.k.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.0.SelfAttention.o.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.0.SelfAttention.q.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.0.SelfAttention.v.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.0.layer_norm.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.1.EncDecAttention.k.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.1.EncDecAttention.o.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.13.layer.1.EncDecAttention.q.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.1.EncDecAttention.v.weight": "pytorch_model_00011-of-00015.bin",
    "decoder.block.13.layer.1.layer_norm.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.13.layer.2.DenseReluDense.wi.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.13.layer.2.DenseReluDense.wo.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.13.layer.2.layer_norm.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.0.SelfAttention.k.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.0.SelfAttention.o.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.0.SelfAttention.q.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.0.SelfAttention.v.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.0.layer_norm.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.1.EncDecAttention.k.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.1.EncDecAttention.o.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.1.EncDecAttention.q.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.1.EncDecAttention.v.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.1.layer_norm.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.2.DenseReluDense.wi.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.2.DenseReluDense.wo.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.14.layer.2.layer_norm.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.0.SelfAttention.k.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.0.SelfAttention.o.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.0.SelfAttention.q.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.0.SelfAttention.v.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.0.layer_norm.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.1.EncDecAttention.k.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.1.EncDecAttention.o.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.1.EncDecAttention.q.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.1.EncDecAttention.v.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.1.layer_norm.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.2.DenseReluDense.wi.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.2.DenseReluDense.wo.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.15.layer.2.layer_norm.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.16.layer.0.SelfAttention.k.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.16.layer.0.SelfAttention.o.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.0.SelfAttention.q.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.16.layer.0.SelfAttention.v.weight": "pytorch_model_00012-of-00015.bin",
    "decoder.block.16.layer.0.layer_norm.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.1.EncDecAttention.k.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.1.EncDecAttention.o.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.1.EncDecAttention.q.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.1.EncDecAttention.v.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.1.layer_norm.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.2.DenseReluDense.wi.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.2.DenseReluDense.wo.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.16.layer.2.layer_norm.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.0.SelfAttention.k.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.0.SelfAttention.o.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.0.SelfAttention.q.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.0.SelfAttention.v.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.0.layer_norm.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.1.EncDecAttention.k.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.1.EncDecAttention.o.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.1.EncDecAttention.q.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.1.EncDecAttention.v.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.1.layer_norm.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.2.DenseReluDense.wi.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.2.DenseReluDense.wo.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.17.layer.2.layer_norm.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.0.SelfAttention.k.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.0.SelfAttention.o.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.0.SelfAttention.q.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.0.SelfAttention.v.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.0.layer_norm.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.1.EncDecAttention.k.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.1.EncDecAttention.o.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.1.EncDecAttention.q.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.1.EncDecAttention.v.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.1.layer_norm.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.2.DenseReluDense.wi.weight": "pytorch_model_00013-of-00015.bin",
    "decoder.block.18.layer.2.DenseReluDense.wo.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.18.layer.2.layer_norm.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.0.SelfAttention.k.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.0.SelfAttention.o.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.0.SelfAttention.q.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.0.SelfAttention.v.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.0.layer_norm.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.1.EncDecAttention.k.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.1.EncDecAttention.o.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.1.EncDecAttention.q.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.1.EncDecAttention.v.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.1.layer_norm.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.2.DenseReluDense.wi.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.2.DenseReluDense.wo.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.19.layer.2.layer_norm.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.2.layer.0.SelfAttention.k.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.0.SelfAttention.o.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.0.SelfAttention.q.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.0.SelfAttention.v.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.0.layer_norm.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.1.EncDecAttention.k.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.1.EncDecAttention.o.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.1.EncDecAttention.q.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.1.EncDecAttention.v.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.1.layer_norm.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.2.DenseReluDense.wi.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.2.DenseReluDense.wo.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.2.layer.2.layer_norm.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.20.layer.0.SelfAttention.k.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.0.SelfAttention.o.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.0.SelfAttention.q.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.0.SelfAttention.v.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.0.layer_norm.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.1.EncDecAttention.k.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.1.EncDecAttention.o.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.1.EncDecAttention.q.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.1.EncDecAttention.v.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.1.layer_norm.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.2.DenseReluDense.wi.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.2.DenseReluDense.wo.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.20.layer.2.layer_norm.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.21.layer.0.SelfAttention.k.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.21.layer.0.SelfAttention.o.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.21.layer.0.SelfAttention.q.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.21.layer.0.SelfAttention.v.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.21.layer.0.layer_norm.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.21.layer.1.EncDecAttention.k.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.21.layer.1.EncDecAttention.o.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.21.layer.1.EncDecAttention.q.weight": "pytorch_model_00014-of-00015.bin",
    "decoder.block.21.layer.1.EncDecAttention.v.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.21.layer.1.layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.21.layer.2.DenseReluDense.wi.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.21.layer.2.DenseReluDense.wo.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.21.layer.2.layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.0.SelfAttention.k.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.0.SelfAttention.o.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.0.SelfAttention.q.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.0.SelfAttention.v.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.0.layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.1.EncDecAttention.k.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.1.EncDecAttention.o.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.1.EncDecAttention.q.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.1.EncDecAttention.v.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.1.layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.2.DenseReluDense.wi.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.2.DenseReluDense.wo.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.22.layer.2.layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.0.SelfAttention.k.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.0.SelfAttention.o.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.0.SelfAttention.q.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.0.SelfAttention.v.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.0.layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.1.EncDecAttention.k.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.1.EncDecAttention.o.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.1.EncDecAttention.q.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.1.EncDecAttention.v.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.1.layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.2.DenseReluDense.wi.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.2.DenseReluDense.wo.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.23.layer.2.layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "decoder.block.3.layer.0.SelfAttention.k.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.3.layer.0.SelfAttention.o.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.0.SelfAttention.q.weight": "pytorch_model_00007-of-00015.bin",
    "decoder.block.3.layer.0.SelfAttention.v.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.0.layer_norm.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.1.EncDecAttention.k.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.1.EncDecAttention.o.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.1.EncDecAttention.q.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.1.EncDecAttention.v.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.1.layer_norm.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.2.DenseReluDense.wi.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.2.DenseReluDense.wo.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.3.layer.2.layer_norm.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.0.SelfAttention.k.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.0.SelfAttention.o.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.0.SelfAttention.q.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.0.SelfAttention.v.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.0.layer_norm.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.1.EncDecAttention.k.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.1.EncDecAttention.o.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.1.EncDecAttention.q.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.1.EncDecAttention.v.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.1.layer_norm.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.2.DenseReluDense.wi.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.2.DenseReluDense.wo.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.4.layer.2.layer_norm.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.0.SelfAttention.k.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.0.SelfAttention.o.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.0.SelfAttention.q.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.0.SelfAttention.v.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.0.layer_norm.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.1.EncDecAttention.k.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.1.EncDecAttention.o.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.1.EncDecAttention.q.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.1.EncDecAttention.v.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.1.layer_norm.weight": "pytorch_model_00008-of-00015.bin",
    "decoder.block.5.layer.2.DenseReluDense.wi.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.5.layer.2.DenseReluDense.wo.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.5.layer.2.layer_norm.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.0.SelfAttention.k.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.0.SelfAttention.o.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.0.SelfAttention.q.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.0.SelfAttention.v.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.0.layer_norm.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.1.EncDecAttention.k.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.1.EncDecAttention.o.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.1.EncDecAttention.q.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.1.EncDecAttention.v.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.1.layer_norm.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.2.DenseReluDense.wi.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.2.DenseReluDense.wo.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.6.layer.2.layer_norm.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.0.SelfAttention.k.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.0.SelfAttention.o.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.0.SelfAttention.q.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.0.SelfAttention.v.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.0.layer_norm.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.1.EncDecAttention.k.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.1.EncDecAttention.o.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.1.EncDecAttention.q.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.1.EncDecAttention.v.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.1.layer_norm.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.2.DenseReluDense.wi.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.2.DenseReluDense.wo.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.7.layer.2.layer_norm.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.8.layer.0.SelfAttention.k.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.8.layer.0.SelfAttention.o.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.8.layer.0.SelfAttention.q.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.8.layer.0.SelfAttention.v.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.8.layer.0.layer_norm.weight": "pytorch_model_00009-of-00015.bin",
    "decoder.block.8.layer.1.EncDecAttention.k.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.8.layer.1.EncDecAttention.o.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.8.layer.1.EncDecAttention.q.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.8.layer.1.EncDecAttention.v.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.8.layer.1.layer_norm.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.8.layer.2.DenseReluDense.wi.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.8.layer.2.DenseReluDense.wo.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.8.layer.2.layer_norm.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.0.SelfAttention.k.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.0.SelfAttention.o.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.0.SelfAttention.q.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.0.SelfAttention.v.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.0.layer_norm.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.1.EncDecAttention.k.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.1.EncDecAttention.o.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.1.EncDecAttention.q.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.1.EncDecAttention.v.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.1.layer_norm.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.2.DenseReluDense.wi.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.2.DenseReluDense.wo.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.block.9.layer.2.layer_norm.weight": "pytorch_model_00010-of-00015.bin",
    "decoder.final_layer_norm.weight": "pytorch_model_00015-of-00015.bin",
    "encoder.block.0.layer.0.SelfAttention.k.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.0.layer.0.SelfAttention.o.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.0.layer.0.SelfAttention.q.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.0.layer.0.SelfAttention.relative_attention_bias.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.0.layer.0.SelfAttention.v.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.0.layer.0.layer_norm.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.0.layer.1.DenseReluDense.wi.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.0.layer.1.DenseReluDense.wo.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.0.layer.1.layer_norm.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.1.layer.0.SelfAttention.k.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.1.layer.0.SelfAttention.o.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.1.layer.0.SelfAttention.q.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.1.layer.0.SelfAttention.v.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.1.layer.0.layer_norm.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.1.layer.1.DenseReluDense.wi.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.1.layer.1.DenseReluDense.wo.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.1.layer.1.layer_norm.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.10.layer.0.SelfAttention.k.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.10.layer.0.SelfAttention.o.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.10.layer.0.SelfAttention.q.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.10.layer.0.SelfAttention.v.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.10.layer.0.layer_norm.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.10.layer.1.DenseReluDense.wi.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.10.layer.1.DenseReluDense.wo.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.10.layer.1.layer_norm.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.11.layer.0.SelfAttention.k.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.11.layer.0.SelfAttention.o.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.11.layer.0.SelfAttention.q.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.11.layer.0.SelfAttention.v.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.11.layer.0.layer_norm.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.11.layer.1.DenseReluDense.wi.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.11.layer.1.DenseReluDense.wo.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.11.layer.1.layer_norm.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.12.layer.0.SelfAttention.k.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.12.layer.0.SelfAttention.o.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.12.layer.0.SelfAttention.q.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.12.layer.0.SelfAttention.v.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.12.layer.0.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.12.layer.1.DenseReluDense.wi.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.12.layer.1.DenseReluDense.wo.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.12.layer.1.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.13.layer.0.SelfAttention.k.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.13.layer.0.SelfAttention.o.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.13.layer.0.SelfAttention.q.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.13.layer.0.SelfAttention.v.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.13.layer.0.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.13.layer.1.DenseReluDense.wi.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.13.layer.1.DenseReluDense.wo.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.13.layer.1.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.14.layer.0.SelfAttention.k.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.14.layer.0.SelfAttention.o.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.14.layer.0.SelfAttention.q.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.14.layer.0.SelfAttention.v.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.14.layer.0.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.14.layer.1.DenseReluDense.wi.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.14.layer.1.DenseReluDense.wo.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.14.layer.1.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.15.layer.0.SelfAttention.k.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.15.layer.0.SelfAttention.o.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.15.layer.0.SelfAttention.q.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.15.layer.0.SelfAttention.v.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.15.layer.0.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.15.layer.1.DenseReluDense.wi.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.15.layer.1.DenseReluDense.wo.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.15.layer.1.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.16.layer.0.SelfAttention.k.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.16.layer.0.SelfAttention.o.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.16.layer.0.SelfAttention.q.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.16.layer.0.SelfAttention.v.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.16.layer.0.layer_norm.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.16.layer.1.DenseReluDense.wi.weight": "pytorch_model_00004-of-00015.bin",
    "encoder.block.16.layer.1.DenseReluDense.wo.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.16.layer.1.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.17.layer.0.SelfAttention.k.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.17.layer.0.SelfAttention.o.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.17.layer.0.SelfAttention.q.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.17.layer.0.SelfAttention.v.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.17.layer.0.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.17.layer.1.DenseReluDense.wi.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.17.layer.1.DenseReluDense.wo.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.17.layer.1.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.18.layer.0.SelfAttention.k.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.18.layer.0.SelfAttention.o.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.18.layer.0.SelfAttention.q.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.18.layer.0.SelfAttention.v.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.18.layer.0.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.18.layer.1.DenseReluDense.wi.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.18.layer.1.DenseReluDense.wo.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.18.layer.1.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.19.layer.0.SelfAttention.k.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.19.layer.0.SelfAttention.o.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.19.layer.0.SelfAttention.q.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.19.layer.0.SelfAttention.v.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.19.layer.0.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.19.layer.1.DenseReluDense.wi.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.19.layer.1.DenseReluDense.wo.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.19.layer.1.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.2.layer.0.SelfAttention.k.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.2.layer.0.SelfAttention.o.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.2.layer.0.SelfAttention.q.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.2.layer.0.SelfAttention.v.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.2.layer.0.layer_norm.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.2.layer.1.DenseReluDense.wi.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.2.layer.1.DenseReluDense.wo.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.2.layer.1.layer_norm.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.20.layer.0.SelfAttention.k.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.20.layer.0.SelfAttention.o.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.20.layer.0.SelfAttention.q.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.20.layer.0.SelfAttention.v.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.20.layer.0.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.20.layer.1.DenseReluDense.wi.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.20.layer.1.DenseReluDense.wo.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.20.layer.1.layer_norm.weight": "pytorch_model_00005-of-00015.bin",
    "encoder.block.21.layer.0.SelfAttention.k.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.21.layer.0.SelfAttention.o.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.21.layer.0.SelfAttention.q.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.21.layer.0.SelfAttention.v.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.21.layer.0.layer_norm.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.21.layer.1.DenseReluDense.wi.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.21.layer.1.DenseReluDense.wo.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.21.layer.1.layer_norm.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.22.layer.0.SelfAttention.k.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.22.layer.0.SelfAttention.o.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.22.layer.0.SelfAttention.q.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.22.layer.0.SelfAttention.v.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.22.layer.0.layer_norm.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.22.layer.1.DenseReluDense.wi.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.22.layer.1.DenseReluDense.wo.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.22.layer.1.layer_norm.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.23.layer.0.SelfAttention.k.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.23.layer.0.SelfAttention.o.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.23.layer.0.SelfAttention.q.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.23.layer.0.SelfAttention.v.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.23.layer.0.layer_norm.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.23.layer.1.DenseReluDense.wi.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.23.layer.1.DenseReluDense.wo.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.23.layer.1.layer_norm.weight": "pytorch_model_00006-of-00015.bin",
    "encoder.block.3.layer.0.SelfAttention.k.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.3.layer.0.SelfAttention.o.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.3.layer.0.SelfAttention.q.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.3.layer.0.SelfAttention.v.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.3.layer.0.layer_norm.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.3.layer.1.DenseReluDense.wi.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.3.layer.1.DenseReluDense.wo.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.3.layer.1.layer_norm.weight": "pytorch_model_00001-of-00015.bin",
    "encoder.block.4.layer.0.SelfAttention.k.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.4.layer.0.SelfAttention.o.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.4.layer.0.SelfAttention.q.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.4.layer.0.SelfAttention.v.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.4.layer.0.layer_norm.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.4.layer.1.DenseReluDense.wi.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.4.layer.1.DenseReluDense.wo.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.4.layer.1.layer_norm.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.5.layer.0.SelfAttention.k.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.5.layer.0.SelfAttention.o.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.5.layer.0.SelfAttention.q.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.5.layer.0.SelfAttention.v.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.5.layer.0.layer_norm.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.5.layer.1.DenseReluDense.wi.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.5.layer.1.DenseReluDense.wo.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.5.layer.1.layer_norm.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.6.layer.0.SelfAttention.k.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.6.layer.0.SelfAttention.o.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.6.layer.0.SelfAttention.q.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.6.layer.0.SelfAttention.v.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.6.layer.0.layer_norm.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.6.layer.1.DenseReluDense.wi.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.6.layer.1.DenseReluDense.wo.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.6.layer.1.layer_norm.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.7.layer.0.SelfAttention.k.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.7.layer.0.SelfAttention.o.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.7.layer.0.SelfAttention.q.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.7.layer.0.SelfAttention.v.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.7.layer.0.layer_norm.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.7.layer.1.DenseReluDense.wi.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.7.layer.1.DenseReluDense.wo.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.7.layer.1.layer_norm.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.8.layer.0.SelfAttention.k.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.8.layer.0.SelfAttention.o.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.8.layer.0.SelfAttention.q.weight": "pytorch_model_00002-of-00015.bin",
    "encoder.block.8.layer.0.SelfAttention.v.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.8.layer.0.layer_norm.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.8.layer.1.DenseReluDense.wi.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.8.layer.1.DenseReluDense.wo.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.8.layer.1.layer_norm.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.9.layer.0.SelfAttention.k.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.9.layer.0.SelfAttention.o.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.9.layer.0.SelfAttention.q.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.9.layer.0.SelfAttention.v.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.9.layer.0.layer_norm.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.9.layer.1.DenseReluDense.wi.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.9.layer.1.DenseReluDense.wo.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.block.9.layer.1.layer_norm.weight": "pytorch_model_00003-of-00015.bin",
    "encoder.final_layer_norm.weight": "pytorch_model_00006-of-00015.bin",
    "shared.weight": "pytorch_model_00001-of-00015.bin"
  }
}
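
The index above maps every parameter name to the shard file that stores it; versions of `transformers` with sharded-checkpoint support read it to load the model shard by shard. A sketch of inspecting it directly, assuming the file has been downloaded locally:

```python
import json

# Two parts: "metadata" (total byte size across shards) and
# "weight_map" (parameter name -> shard file).
with open("pytorch_model.bin.index.json") as f:
    index = json.load(f)

print(index["metadata"]["total_size"])       # 45229301760 bytes (~45 GB)
print(index["weight_map"]["shared.weight"])  # pytorch_model_00001-of-00015.bin
```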
pytorch_model_00001-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:933b8cd428298f520558c2f1f6572762f45a1215feaee9e0a17d4980c6c4b419
size 1676445631
pytorch_model_00002-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:6c66455cbe43c2bae0010584285eb3d39b9e2c24ed8d4ade02a992273c544d77
size 1677748159
pytorch_model_00003-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3f45e56b051d63a3a321e3fbbf7134ed70e3d84cf445aa203569e1cb4f6e74b4
size 1677748159
pytorch_model_00004-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:a3eeac9612fa4860a5edfee26099a00cf0172297f973698cf95a000cefd1bc3c
size 1744859071
pytorch_model_00005-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ff9cc2595345de164df497989f77220b1ef8a1e7e2c4af260bf936409e03bc22
size 1744859071
pytorch_model_00006-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:9aa56ffe8ef5e5f04f90de24b5c4af2bab966623ab63b498eccb07e07d7f56ab
size 1442875327
pytorch_model_00007-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:8685e462c245c4fdaad67d09be46dfdf486c594684330b1f4bc2e5e274e27c53
size 1442875327
pytorch_model_00008-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:845669bd0bbb1c6453e01a960cd823581aec71734f182637fab450637abdd7f4
size 1275094975
pytorch_model_00009-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:47dea77791ccb36e411f36e76b4b47878fe03f9ac98de67ef81a1e07d295b3d0
size 1476421567
pytorch_model_00010-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:ec25daf44e8fd600a2e1a23e3e11d9d1b7f7ecf671e65e198e796612e8268520
size 1476421567
pytorch_model_00011-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:18fadba0d3ca13ef19dc343088366926b038e93a912752070f03407736c97b4d
size 1308647423
pytorch_model_00012-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:3f61d3ccfe0e37cf5abee04e40fb54defb15bcd1fc8d99390e37d52af909ed7e
size 1476421631
pytorch_model_00013-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:0e87f0d5998d9cc309fcda325084b32975a489e403bea04278488e4746d14424
size 1375758335
pytorch_model_00014-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:5f35e54a4f47e92426c745011a0e75d3913e588a37282af1e113f5a7affc835b
size 1375758335
pytorch_model_00015-of-00015.bin ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:1201c712889c87135cdae9701284baff3b502d52edf600ef65b513b0cde1613a
size 1442869183
spiece.model ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d60acb128cf7b7f2536e8f38a5b18a05535c9e14c7a355904270e15b0945ea86
size 791656
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff