t03i committed on
Commit
9b36cbf
1 Parent(s): f94c9e6

Fix spelling mistakes

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -19,17 +19,17 @@ ProtT5-XL-UniRef50 is based on the `t5-3b` model and was pretrained on a large c
 This means it was pretrained on the raw protein sequences only, with no humans labelling them in any way (which is why it can use lots of
 publicly available data) with an automatic process to generate inputs and labels from those protein sequences.
 
-One important difference between this T5 model and the original T5 version is the denosing objective.
-The original T5-3B model was pretrained using a span denosing objective, while this model was pre-trained with a Bart-like MLM denosing objective.
+One important difference between this T5 model and the original T5 version is the denoising objective.
+The original T5-3B model was pretrained using a span denoising objective, while this model was pretrained with a Bart-like MLM denoising objective.
 The masking probability is consistent with the original T5 training by randomly masking 15% of the amino acids in the input.
 
 This model only contains the encoder portion of the original ProtT5-XL-UniRef50 model using half precision (float16).
-As such this model can efficiently be used to create protein/ amino acid representations. When used for training downstream networks/ feature extraction, these embeddings produce almost the same performance (established emperically by comparing on several downstream tasks).
+As such, this model can efficiently be used to create protein/ amino acid representations. When used for training downstream networks/ feature extraction, these embeddings produced the same performance (established empirically by comparing on several downstream tasks).
 
 
 ## Intended uses & limitations
 
-This version of the original ProtT5-XL-UniRef50 is mostly meant for conveniently creating amino-acid or protein embeddings with a low GPU-memory footprint and reasonable embedding-quality. This model is fully usable on 8GB of video RAM.
+This version of the original ProtT5-XL-UniRef50 is mostly meant for conveniently creating amino-acid or protein embeddings with a low GPU-memory footprint without any measurable performance-decrease in our experiments. This model is fully usable on 8 GB of video RAM.
 
 ### How to use