Update README.md
README.md CHANGED
@@ -82,24 +82,6 @@ You can view other LaMini model series as follow. Note that not all models are p
</tbody>
</table>

-## Training Procedure
-We initialize with [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) and fine-tune it on our [LaMini dataset](). Its total number of parameters is 61M.
-
-### Training Hyperparameters
-
-The following hyperparameters were used during training:
-- learning_rate: 0.0005
-- train_batch_size: 128
-- eval_batch_size: 64
-- seed: 42
-- gradient_accumulation_steps: 4
-- total_train_batch_size: 512
-- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
-- lr_scheduler_type: linear
-- num_epochs: 5
-
-## Evaluation
-We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more detail, please refer to our [paper]().

## Use

@@ -122,6 +104,25 @@ generated_text = generator(input_prompt, max_length=512, do_sample=True)[0]['gen
print("Response", generated_text)
```

+## Training Procedure
+We initialize with [google/flan-t5-small](https://huggingface.co/google/flan-t5-small) and fine-tune it on our [LaMini dataset](). Its total number of parameters is 61M.
+
+### Training Hyperparameters
+
+The following hyperparameters were used during training:
+- learning_rate: 0.0005
+- train_batch_size: 128
+- eval_batch_size: 64
+- seed: 42
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 512
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: linear
+- num_epochs: 5
+
+## Evaluation
+We conducted two sets of evaluations: automatic evaluation on downstream NLP tasks and human evaluation on user-oriented instructions. For more detail, please refer to our [paper]().
+
## Limitations

More information needed
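The Training Hyperparameters listed in the relocated section map closely onto Hugging Face `Seq2SeqTrainingArguments`. The sketch below is a minimal illustration of that mapping, not the authors' released training script: the output directory is arbitrary, the single-device assumption behind the 512 total batch size is ours, and the train/eval datasets are placeholders because the LaMini dataset link in the card is empty.

```python
# Hedged sketch: expressing the card's listed hyperparameters with the
# Hugging Face Trainer API. Not the authors' code; dataset objects are
# placeholders you must supply yourself.
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

base_checkpoint = "google/flan-t5-small"   # stated initialization checkpoint
tokenizer = AutoTokenizer.from_pretrained(base_checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(base_checkpoint)

args = Seq2SeqTrainingArguments(
    output_dir="lamini-finetune",          # assumed output path
    learning_rate=5e-4,                    # learning_rate: 0.0005
    per_device_train_batch_size=128,       # train_batch_size: 128
    per_device_eval_batch_size=64,         # eval_batch_size: 64
    gradient_accumulation_steps=4,         # 128 x 4 = 512 total (assuming one device)
    num_train_epochs=5,                    # num_epochs: 5
    lr_scheduler_type="linear",            # lr_scheduler_type: linear
    seed=42,                               # seed: 42
    adam_beta1=0.9,                        # optimizer: Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,                     # epsilon=1e-08
)

# train_ds / eval_ds would be tokenized instruction-response pairs from the
# LaMini dataset, which is not reproduced here.
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    # train_dataset=train_ds,
    # eval_dataset=eval_ds,
)
# trainer.train()
```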
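The usage snippet referenced by the second hunk header (`generated_text = generator(...)`) is only partially visible in this diff. A self-contained reconstruction of that `text2text-generation` pipeline call might look like the following; the checkpoint is a stand-in and the prompt is an invented example, neither is taken from the card.

```python
# Hedged reconstruction of the card's usage snippet.
from transformers import pipeline

# Stand-in checkpoint so the snippet runs as-is; swap in this card's
# LaMini checkpoint id.
checkpoint = "google/flan-t5-small"
generator = pipeline("text2text-generation", model=checkpoint)

input_prompt = "Please describe what an instruction-tuned model is."  # assumed example prompt
generated_text = generator(input_prompt, max_length=512, do_sample=True)[0]["generated_text"]
print("Response", generated_text)
```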