---
license: mit
tags:
- protein
- thermostability
---

__Purpose__: classifies a protein sequence as thermophilic (> 60 °C) or mesophilic (< 40 °C), where the classes are defined by the growth temperature of the host organism.
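
The class definition above can be written as a small labeling rule. This is a minimal sketch, not the code used to build the training set: the helper name `label_from_growth_temp` is hypothetical, and leaving organisms between 40 °C and 60 °C unlabeled is an assumption, since this card does not say how that range was handled.

```python
from typing import Optional

# Thresholds taken from the class definition in this card
THERMOPHILIC_MIN_C = 60.0
MESOPHILIC_MAX_C = 40.0


def label_from_growth_temp(temp_c: float) -> Optional[int]:
    """Hypothetical helper: map a host growth temperature (in °C) to a class label.

    Returns 1 for thermophilic, 0 for mesophilic, and None for temperatures
    in the 40-60 °C gap (assumed here to be left unlabeled).
    """
    if temp_c > THERMOPHILIC_MIN_C:
        return 1
    if temp_c < MESOPHILIC_MAX_C:
        return 0
    return None
```

The 1/0 encoding matches the model output described in the usage section.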

__Training__:

ProtBERT (Rostlab/prot_bert) was fine-tuned on a class-balanced version of learn2therm (see [here]()), about 250k protein amino acid sequences.

Training parameters below:

TODO

See the [training repository](https://github.com/BeckResearchLab/learn2thermML) for code.

__Usage__:

Prepare sequences the same way as for the original pretrained model (residues separated by spaces, with the rare amino acids U, Z, O and B replaced by X):

```
from transformers import BertForSequenceClassification, BertTokenizer
import torch
import re

# Load the fine-tuned tokenizer and classifier
tokenizer = BertTokenizer.from_pretrained("evankomp/learn2therm", do_lower_case=False)
model = BertForSequenceClassification.from_pretrained("evankomp/learn2therm")

# Residues are separated by spaces
sequence_Example = "A E T C Z A O"
# Map the rare amino acids U, Z, O, B to X
sequence_Example = re.sub(r"[UZOB]", "X", sequence_Example)
encoded_input = tokenizer(sequence_Example, return_tensors='pt')

# The model returns an output object; the class scores are in .logits
with torch.no_grad():
    output = torch.argmax(model(**encoded_input).logits, dim=1)
```

The resulting `output` is 1 for thermophilic and 0 for mesophilic.
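
Raw sequences are usually stored without spaces, so a small formatting step is needed before tokenization. The sketch below shows one way to do it. The names `prepare_sequence` and `LABELS` are hypothetical and not part of this repository, and uppercasing the input is an assumption based on the tokenizer being loaded with `do_lower_case=False`.

```python
import re

# Label encoding as stated in this card: 1 = thermophilic, 0 = mesophilic
LABELS = {0: "mesophilic", 1: "thermophilic"}


def prepare_sequence(raw: str) -> str:
    """Hypothetical helper: format a raw amino acid string for the tokenizer."""
    # Remove any whitespace and uppercase the residues (assumption)
    residues = re.sub(r"\s+", "", raw).upper()
    # Map the rare amino acids U, Z, O, B to X, as in the usage snippet
    residues = re.sub(r"[UZOB]", "X", residues)
    # Separate residues with single spaces
    return " ".join(residues)
```

For example, `prepare_sequence("AETCZAO")` gives `"A E T C X A X"`, which can be passed to the tokenizer, and `LABELS[int(output)]` turns a single-sequence prediction into a class name.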