AmelieSchreiber committed
Commit 8301fe6 • 1 Parent(s): 4eb269f
Update README.md
README.md CHANGED
@@ -1,3 +1,31 @@
 ---
 license: mit
+datasets:
+- AmelieSchreiber/cafa5_pickle_split
+language:
+- en
+metrics:
+- accuracy
+- f1
+- precision
+- recall
+- roc_auc
+library_name: transformers
+tags:
+- esm
+- esm2
+- protein language model
+- biology
+- cafa5
 ---
+
+# ESM-2 Pre-finetuned for CAFA-5 Protein Function Prediction
+This model was pre-finetuned for CAFA-5 protein function prediction for four epochs.
+It is meant to be finetuned in a second stage of training with a Low Rank Adaptation (LoRA).
+The training script for both the pre-finetuning and the second-stage finetuning with LoRA is
+[available here](https://huggingface.co/AmelieSchreiber/esm2_t6_8M_lora_cafa5/blob/main/cafa_5_finetune_v2.ipynb).
+The notebook lets you pre-finetune the base model and then use a LoRA for the second stage of training.
+Note that the second stage of training is a harder curriculum for the model, as it uses class weights so that the
+model better captures the hierarchical (weighted) structure of the Gene Ontology (GO) terms that serve as
+the labels for the multilabel sequence classification task of predicting a protein's functions (GO terms).
+
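
Since the updated README describes an ESM-2 checkpoint used for multilabel GO-term classification with `transformers`, loading it might look like the sketch below. The repo ID, label count, and decision threshold are assumptions for illustration, not values stated in this commit.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "AmelieSchreiber/esm2_t6_8M_finetuned_cafa5"  # hypothetical repo ID
num_go_terms = 600  # placeholder; match the number of GO-term labels in your split

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(
    checkpoint,
    num_labels=num_go_terms,
    problem_type="multi_label_classification",  # one sigmoid logit per GO term
)

sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"  # toy protein sequence
inputs = tokenizer(sequence, return_tensors="pt", truncation=True, max_length=1022)
with torch.no_grad():
    logits = model(**inputs).logits
predicted = torch.sigmoid(logits) > 0.5  # boolean mask over the GO-term labels
```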
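The second training stage attaches a Low Rank Adaptation rather than updating all of the base model's weights. A minimal sketch with the `peft` library follows; the rank, alpha, dropout, and target modules are illustrative guesses, and the linked `cafa_5_finetune_v2.ipynb` notebook defines the settings actually used.

```python
from peft import LoraConfig, TaskType, get_peft_model

lora_config = LoraConfig(
    task_type=TaskType.SEQ_CLS,  # keeps the classification head trainable
    r=8,                         # illustrative low rank
    lora_alpha=16,
    lora_dropout=0.1,
    target_modules=["query", "key", "value"],  # ESM-2 attention projections
)

model = get_peft_model(model, lora_config)  # `model` from the loading sketch above
model.print_trainable_parameters()  # only the adapters (and head) remain trainable
```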
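The class-weighted loss mentioned in the README is not part of the stock `Trainer`; one common way to apply it is to override `compute_loss` with a weighted `BCEWithLogitsLoss`, as sketched below. How the weights are derived from the GO hierarchy is defined in the notebook; the weight tensor here is only a stand-in.

```python
import torch
from torch import nn
from transformers import Trainer

class WeightedLossTrainer(Trainer):
    def __init__(self, *args, class_weights=None, **kwargs):
        super().__init__(*args, **kwargs)
        # shape (num_go_terms,); stand-in for the notebook's GO-derived weights
        self.class_weights = class_weights

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        # pos_weight up-weights the rarer / more informative GO terms
        loss_fct = nn.BCEWithLogitsLoss(
            pos_weight=self.class_weights.to(outputs.logits.device)
        )
        loss = loss_fct(outputs.logits, labels.float())
        return (loss, outputs) if return_outputs else loss
```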