Philip May committed · 974e918 · Parent(s): 46a949f

Update README.md

README.md CHANGED
@@ -18,3 +18,5 @@ This model is too big to fit on a normal 16GB GPU in FP32 mode.
 For various reasons, T5 models cannot be trained in FP16 mode.
 However, mixed precision training is not yet supported on many GPUs.
 For example, it does not work on V100 GPUs. On A100, however, it does.
+That is why we suggest using [DeepSpeed](https://github.com/microsoft/DeepSpeed) for training.
+In particular, we recommend the `auto` configuration from the [ZeRO-3 Example](https://huggingface.co/docs/transformers/main_classes/deepspeed#zero3-example).
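As a rough illustration of what such an `auto` configuration looks like, here is a minimal ZeRO-3 `ds_config.json` sketch. The field names are standard DeepSpeed configuration keys; the `"auto"` values are placeholders that the Hugging Face `Trainer` fills in from its own `TrainingArguments` when the config is passed via the `deepspeed` argument. This is a trimmed-down sketch, not the full recommended config — see the linked ZeRO-3 Example for the complete version.

```json
{
  "fp16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto"
}
```

Setting `"enabled": "auto"` lets the Trainer decide on mixed precision based on your `TrainingArguments` (e.g. keeping FP16 off for T5, as discussed above), instead of hard-coding it in the config.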