AmelieSchreiber committed on
Commit
52a3820
1 Parent(s): 0a52d93

Update README.md

  1. README.md +19 -0
README.md CHANGED
@@ -12,7 +12,26 @@ tags:
  - cafa 5
  - protein function prediction
  ---
+ # ESM-2 for Protein Function Prediction
 
+ This is an experimental model fine-tuned from the [esm2_t6_8M_UR50D](https://huggingface.co/facebook/esm2_t6_8M_UR50D) model
+ for multi-label classification. In particular, the model is fine-tuned on the CAFA-5 protein sequence dataset available
+ [here](). More precisely, the `train_sequences.fasta` file contains the protein sequences the model was trained on, and the
+ `train_terms.tsv` file contains the gene ontology protein function labels for each protein sequence. For more details on using
+ ESM-2 models for multi-label sequence classification, [see here](https://huggingface.co/docs/transformers/model_doc/esm).
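
Multi-label classification needs one fixed-length 0/1 target vector per protein, with one position per gene ontology term. The following is a minimal sketch of how such vectors could be built from (protein, term) rows like those in `train_terms.tsv`. It is not the author's preprocessing code: the column names `EntryID`, `term`, and `aspect` are an assumption about the file layout, the protein IDs are made up for illustration, and `build_multi_hot` is a hypothetical helper.

```python
import csv
import io

# Illustrative rows only: the protein IDs are made up, and the column names
# (EntryID, term, aspect) are an assumption about the layout of train_terms.tsv.
SAMPLE_TSV = (
    "EntryID\tterm\taspect\n"
    "PROT_A\tGO:0008150\tBPO\n"
    "PROT_A\tGO:0003674\tMFO\n"
    "PROT_B\tGO:0005575\tCCO\n"
)


def build_multi_hot(tsv_text):
    """Return (sorted term list, {protein ID: multi-hot label vector})."""
    terms_by_protein = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        terms_by_protein.setdefault(row["EntryID"], set()).add(row["term"])

    # Fix a column order so every protein's vector uses the same indexing.
    all_terms = sorted(set().union(*terms_by_protein.values()))
    index = {term: i for i, term in enumerate(all_terms)}

    labels = {}
    for protein, terms in terms_by_protein.items():
        vector = [0.0] * len(all_terms)
        for term in terms:
            vector[index[term]] = 1.0
        labels[protein] = vector
    return all_terms, labels


all_terms, labels = build_multi_hot(SAMPLE_TSV)
```

For a real file, the same function could be fed the text read from disk. Float targets are used here because multi-label losses based on binary cross-entropy typically expect floating-point labels.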
+
+ ## Fine-Tuning
+
+ The model was fine-tuned for 7 epochs at a learning rate of `5e-5`, and achieves the following metrics:
+ ```
+ Validation Loss: 0.0027,
+ Validation Micro F1: 0.3672,
+ Validation Macro F1: 0.9967,
+ Validation Micro Precision: 0.6052,
+ Validation Macro Precision: 0.9996,
+ Validation Micro Recall: 0.2626,
+ Validation Macro Recall: 0.9966
+ ```
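
The metrics above report both micro and macro averages. As a reminder of how the two differ, here is a minimal pure-Python sketch, not the evaluation code used for this model. Micro averaging pools true positives, false positives, and false negatives over all labels before computing F1, while macro averaging computes F1 per label and then takes the unweighted mean. The helper names are hypothetical, and scoring an undefined per-label F1 as `0.0` is an assumption; other conventions change the macro value.

```python
def f1(tp, fp, fn):
    """F1 from counts; returns 0.0 when undefined (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_macro_f1(y_true, y_pred):
    """y_true and y_pred are lists of equal-length 0/1 rows, one row per sequence."""
    n_labels = len(y_true[0])
    counts = [[0, 0, 0] for _ in range(n_labels)]  # [tp, fp, fn] per label
    for true_row, pred_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(true_row, pred_row)):
            if t and p:
                counts[j][0] += 1
            elif p:
                counts[j][1] += 1
            elif t:
                counts[j][2] += 1

    # Micro: pool the counts over all labels, then compute a single F1.
    micro = f1(*(sum(c[k] for c in counts) for k in range(3)))
    # Macro: compute F1 per label, then take the unweighted mean.
    macro = sum(f1(*c) for c in counts) / n_labels
    return micro, macro


# Toy example: three sequences, three labels (values are illustrative only).
y_true = [[1, 0, 0], [1, 1, 0], [0, 1, 0]]
y_pred = [[1, 0, 0], [1, 0, 0], [0, 0, 0]]
micro, macro = micro_macro_f1(y_true, y_pred)
```

In this toy case the pooled counts are 2 true positives and 2 false negatives, so the micro F1 is 2/3, while the per-label scores are 1, 0, and 0, giving a macro F1 of 1/3.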
 
  ## Using the model
  First, download the file `go-basic.obo` [from here](https://huggingface.co/datasets/AmelieSchreiber/cafa_5)