evankomp committed on
Commit
ecf60fd
1 Parent(s): 2a53964

Update README.md

Files changed (1)
  1. README.md +28 -1
README.md CHANGED
@@ -3,4 +3,31 @@ license: mit
  tags:
  - protein
  - thermostability
- ---
+ ---
+
+ __Purpose__: Classifies a protein sequence as thermophilic (> 60 °C) or mesophilic (< 40 °C) based on the growth temperature of its host organism.
+
+ __Training__:
+ ProtBert (Rostlab/prot_bert) was fine-tuned on a class-balanced version of learn2therm (see [here]()), about 250k protein amino acid sequences.
+
+ Training parameters below:
+ TODO
+
+ See the [training repository](https://github.com/BeckResearchLab/learn2thermML) for code.
+
+ __Usage__:
+ Prepare sequences exactly as for the original pretrained model:
+
+ ```
+ from transformers import BertForSequenceClassification, BertTokenizer
+ import torch
+ import re
+ # Load the fine-tuned tokenizer and classifier from the Hub
+ tokenizer = BertTokenizer.from_pretrained("evankomp/learn2therm", do_lower_case=False)
+ model = BertForSequenceClassification.from_pretrained("evankomp/learn2therm")
+ # Residues must be space-separated; map rare amino acids (U, Z, O, B) to X
+ sequence_Example = "A E T C Z A O"
+ sequence_Example = re.sub(r"[UZOB]", "X", sequence_Example)
+ encoded_input = tokenizer(sequence_Example, return_tensors='pt')
+ # The model returns an output object; take the argmax over its class logits
+ output = torch.argmax(model(**encoded_input).logits, dim=1)
+ ```
+
+ A predicted label of 1 indicates thermophilic; 0 indicates mesophilic.
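
The preparation step in the usage snippet can be sketched as a small helper. This is a minimal sketch, not code from the repository: it assumes raw sequences arrive as unspaced one-letter strings, and the names `prepare_sequence` and `LABELS` are hypothetical. The label mapping only restates the README's note on what 1 and 0 mean.

```python
import re

# Hypothetical mapping that restates the README: 1 is thermophilic, 0 is mesophilic.
LABELS = {0: "mesophilic", 1: "thermophilic"}


def prepare_sequence(raw: str) -> str:
    """Format a raw amino acid string the way the usage snippet expects."""
    # Drop any whitespace and upper-case the residues, since the tokenizer
    # is loaded with do_lower_case=False.
    residues = re.sub(r"\s+", "", raw).upper()
    # Replace the rare or ambiguous residues U, Z, O and B with X, as in the snippet.
    residues = re.sub(r"[UZOB]", "X", residues)
    # Separate residues with single spaces, matching the snippet's input format.
    return " ".join(residues)


print(prepare_sequence("AETCZAO"))  # A E T C X A X
print(LABELS[1])  # thermophilic
```

The string returned by `prepare_sequence` can be passed to the tokenizer in place of `sequence_Example`, and the predicted class index can be looked up in `LABELS`.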