aloobun committed on
Commit
5b9d4bf
·
verified ·
1 Parent(s): a89c27d

Update README.md

  1. README.md +10 -0
README.md CHANGED
@@ -3,6 +3,16 @@ license: apache-2.0
3
  library_name: transformers
4
  ---
5
 
6
+ i wanted to learn more about exposure bias mitigation in language models and came across ReMask [https://huggingface.co/euclaise/ReMask-3B].
7
+ it's a neat idea, and i wanted to give it a go.
8
+
9
+ - during training, the model processes each input sequence twice: once with the full sequence and once with a masked version of it.
10
+ - model outputs are computed for both passes.
11
+ - the divergence loss is the average of the forward and backward KL divergences between the two outputs.
12
+ - the final loss is a weighted sum of the cross-entropy losses and the divergence loss.
13
+
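The loss described in the list above can be sketched in plain Python. This is only an illustration under stated assumptions, not the implementation the README links to: the function name `remask_style_loss`, the `ce_weight` and `div_weight` parameters, and the choice to sum the two cross-entropy terms before weighting are all hypothetical, and the masking step itself is left out (the sketch starts from the logits of the two passes).

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(log_p, log_q):
    """KL(p || q) for two distributions given as log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def remask_style_loss(full_logits, masked_logits, targets,
                      ce_weight=1.0, div_weight=1.0):
    """Weighted sum of cross-entropy losses and a symmetric KL term.

    full_logits / masked_logits: one list of logits per position, taken
    from the full-sequence pass and the masked-sequence pass.
    targets: the correct token index for each position.
    ce_weight / div_weight: hypothetical weights, not values from the README.
    """
    n = len(targets)
    ce_full = ce_masked = divergence = 0.0
    for f, m, t in zip(full_logits, masked_logits, targets):
        log_p = log_softmax(f)  # full-sequence pass
        log_q = log_softmax(m)  # masked-sequence pass
        ce_full += -log_p[t]
        ce_masked += -log_q[t]
        # average of forward KL(p || q) and backward KL(q || p)
        divergence += 0.5 * (kl_divergence(log_p, log_q)
                             + kl_divergence(log_q, log_p))
    # assumption: the two cross-entropy terms are simply added
    return (ce_weight * (ce_full + ce_masked) + div_weight * divergence) / n


# toy usage: two positions, vocabulary of three tokens
full = [[2.0, 0.5, 0.1], [0.3, 1.5, 0.2]]
masked = [[1.0, 0.8, 0.1], [0.2, 0.9, 0.7]]
loss = remask_style_loss(full, masked, targets=[0, 1], div_weight=0.5)
```

When the two passes produce identical logits, the divergence term is zero and the loss reduces to the weighted cross-entropy part; the further the masked pass drifts from the full pass, the larger the penalty.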
14
+ impl on github
15
+
16
  ```
17
  <|user|>
18
  Could Moulin Rouge have been hypothetically used as Spain's Spanish American War triage center?