Philip May committed · 974e918 · Parent(s): 46a949f

Update README.md

README.md CHANGED
@@ -18,3 +18,5 @@ This model is too big to fit on a normal 16GB GPU in FP32 mode.
 For various reasons, T5 models cannot be trained in FP16 mode.
 However, mixed precision training is not yet supported on many GPUs.
 For example, it does not work on V100 GPUs. On A100, however, it does.
+That is why we suggest using [DeepSpeed](https://github.com/microsoft/DeepSpeed) for training.
+In particular, we recommend the `auto` configuration from the [ZeRO-3 Example](https://huggingface.co/docs/transformers/main_classes/deepspeed#zero3-example).
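As a rough illustration of what such an `auto` configuration looks like, here is a minimal ZeRO-3 `ds_config.json` sketch. The field names are standard DeepSpeed configuration keys; the `"auto"` values are placeholders that the Hugging Face `Trainer` fills in from its own `TrainingArguments` when the config is passed via the `deepspeed` argument. This is a trimmed-down sketch, not the full recommended config — see the linked ZeRO-3 Example for the complete version.

```json
{
  "fp16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```

Setting `"enabled": "auto"` lets the Trainer decide on mixed precision based on your `TrainingArguments` (e.g. keeping FP16 off for T5, as discussed above), instead of hard-coding it in the config.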