Update README.md
Browse files
README.md
CHANGED
@@ -3,6 +3,16 @@ license: apache-2.0
|
|
3 |
library_name: transformers
|
4 |
---
|
5 |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
6 |
```
|
7 |
<|user|>
|
8 |
Could Moulin Rouge have been hypothetically used as Spain's Spanish American War triage center?
|
|
|
3 |
library_name: transformers
|
4 |
---
|
5 |
|
6 |
+
i wanted to learn more about exposure bias mitigation in language models and came across ReMask [https://huggingface.co/euclaise/ReMask-3B].
|
7 |
+
it's a neat idea, and i wanted to give it a go.
|
8 |
+
|
9 |
+
- during training, the model processes input sequences twice - once with the full sequence & once with masked sequence.
|
10 |
+
- computes model outputs for both.
|
11 |
+
- divergence loss is computed as the average of forward and backward KL divergences.
|
12 |
+
- final loss is a weighted sum of the cross entropy losses and the divergence loss.
|
13 |
+
|
14 |
+
impl on github
|
15 |
+
|
16 |
```
|
17 |
<|user|>
|
18 |
Could Moulin Rouge have been hypothetically used as Spain's Spanish American War triage center?
|