aloobun/ReMask-135m

aloobun committed on Dec 13, 2024

Commit 5b9d4bf · verified · 1 Parent(s): a89c27d

Update README.md

Files changed (1)

README.md +10 -0

@@ -3,6 +3,16 @@ license: apache-2.0
 library_name: transformers
 ---
+i wanted to learn more about exposure bias mitigation in language models and came across ReMask [https://huggingface.co/euclaise/ReMask-3B].
+it's a neat idea, and i wanted to give it a go.
+- during training, the model processes each input sequence twice: once with the full sequence and once with a masked copy of it.
+- model outputs are computed for both passes.
+- the divergence loss is the average of the forward and backward KL divergences between the two outputs.
+- the final loss is a weighted sum of the cross-entropy losses and the divergence loss.
+impl on github
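
The loss described in the bullets above can be sketched roughly as follows. This is a minimal illustration and not the linked implementation: it uses plain Python lists instead of a tensor library so it runs on its own, and the function names, the `ce_weight` and `div_weight` parameters, and their default values are all assumptions, since the README does not state the actual weights.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one position's logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def kl_divergence(log_p, log_q):
    """KL(p || q) for two distributions given as log-probabilities."""
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def symmetric_kl(log_p, log_q):
    """Average of the forward and backward KL divergences."""
    return 0.5 * (kl_divergence(log_p, log_q) + kl_divergence(log_q, log_p))


def remask_style_loss(full_logits, masked_logits, targets,
                      ce_weight=1.0, div_weight=1.0):
    """Weighted sum of both cross-entropy losses and the divergence loss.

    full_logits / masked_logits: one list of logits per position, taken from
    the pass over the full sequence and the pass over the masked sequence.
    targets: the correct token id at each position.
    ce_weight / div_weight: hypothetical weights, not taken from the README.
    """
    n = len(targets)
    ce_full = ce_masked = divergence = 0.0
    for lf, lm, t in zip(full_logits, masked_logits, targets):
        log_p_full = log_softmax(lf)
        log_p_masked = log_softmax(lm)
        ce_full += -log_p_full[t]        # cross entropy, full-sequence pass
        ce_masked += -log_p_masked[t]    # cross entropy, masked pass
        divergence += symmetric_kl(log_p_full, log_p_masked)
    # Average each term over positions before combining.
    return (ce_weight * (ce_full + ce_masked) + div_weight * divergence) / n


# Toy example: two positions, vocabulary of three tokens.
full = [[2.0, 0.5, -1.0], [0.1, 1.5, 0.3]]
masked = [[1.2, 0.9, -0.5], [0.4, 1.0, 0.6]]
loss = remask_style_loss(full, masked, targets=[0, 1])
print(f"loss = {loss:.4f}")
```

When the two passes produce identical outputs the divergence term is zero and only the cross-entropy terms remain, so `div_weight` controls how strongly the masked pass is pulled toward the full-sequence pass.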
 ```
 <|user|>
 Could Moulin Rouge have been hypothetically used as Spain's Spanish American War triage center?