Update README.md
README.md
CHANGED
@@ -231,7 +231,7 @@ Safiyyah Saleem, Holger Schwenk, and Jeff Wang.
 ## Training:
 
 - The Expert Output Masking is used for training, which consists in droping the full contribution for some tokens. This corresponds to the following scheme:
-![EOM](https://drive.google.com/uc?id=1VNr3Ug5mQT4uFlvMDaTEyfg9rwbwGFsl
+![EOM](https://drive.google.com/uc?export=view&id=1VNr3Ug5mQT4uFlvMDaTEyfg9rwbwGFsl)
 
 ## Generating with NLLB-MoE
 The avalable checkpoints requires around 350GB of storage. Make sure to use `accelerate` if you do not have enough RAM on your machine.
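
The README's last context line recommends `accelerate` when RAM is limited. A minimal sketch of what that could look like with `transformers` follows; it is not taken from the README. The checkpoint name `facebook/nllb-moe-54b`, the `offload` directory, and the helper names `build_load_kwargs` and `load_nllb_moe` are assumptions made for illustration. With `accelerate` installed, `device_map="auto"` lets the library spread weights across the available GPUs, CPU memory, and disk.

```python
def build_load_kwargs(offload_dir="offload"):
    """Keyword arguments for `from_pretrained` that rely on accelerate.

    `offload_dir` is a hypothetical local directory used for weights
    that fit neither on the GPUs nor in CPU RAM.
    """
    return {
        "device_map": "auto",           # let accelerate decide weight placement
        "offload_folder": offload_dir,  # spill remaining weights to disk
        "low_cpu_mem_usage": True,      # avoid materializing a full copy in RAM
    }


def load_nllb_moe(checkpoint="facebook/nllb-moe-54b", **overrides):
    """Load tokenizer and model; the checkpoint name is an assumption."""
    # Imported lazily so build_load_kwargs can be used without transformers.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    kwargs = {**build_load_kwargs(), **overrides}
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint, **kwargs)
    return tokenizer, model
```

Calling `load_nllb_moe()` downloads the full checkpoint, so the roughly 350GB of free storage mentioned in the README still applies.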