---
language: protein
tags:
- protein language model
datasets:
- BFD
- Custom Rosetta
---
# ProtBert-BFD finetuned on Rosetta 20AA dataset
This model is finetuned to predict Rosetta fold energy from amino-acid sequence, using a dataset of 100k 20AA (20-residue) sequences.

Current model in this repo: `prot_bert_bfd-finetuned-032722_1752`
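A minimal usage sketch, assuming the checkpoint loads as a single-output `BertForSequenceClassification` regression head (the model path below refers to the checkpoint named above; adjust to a local path or Hub repo id as needed):

```python
import re
import torch
from transformers import BertTokenizer, BertForSequenceClassification

# Assumption: the finetuned checkpoint exposes a regression head (num_labels=1).
model_name = "prot_bert_bfd-finetuned-032722_1752"
tokenizer = BertTokenizer.from_pretrained(model_name, do_lower_case=False)
model = BertForSequenceClassification.from_pretrained(model_name)
model.eval()

# ProtBert expects uppercase amino acids separated by spaces,
# with rare residues (U, Z, O, B) mapped to X.
sequence = "MKTAYIAKQRQISFVKSHFS"  # a 20AA example
sequence = " ".join(re.sub(r"[UZOB]", "X", sequence))

inputs = tokenizer(sequence, return_tensors="pt")
with torch.no_grad():
    energy = model(**inputs).logits.item()  # predicted Rosetta fold energy
print(f"Predicted fold energy: {energy:.4f}")
```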
## Performance
| Eval set | MAE | R² | MSE | RMSE |
|----------|-----|-----|-----|------|
| 20AA sequences (1k eval set) | 0.090115 | 0.991208 | 0.013034 | 0.114165 |
| 40AA sequences (10k eval set) | 0.537456 | 0.659122 | 0.448607 | 0.669781 |
| 60AA sequences (10k eval set) | 0.629267 | 0.506747 | 0.622476 | 0.788972 |
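These are standard regression metrics; a sketch of how they could be reproduced with scikit-learn (the arrays below are synthetic stand-ins, not the actual eval data):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Placeholder data standing in for eval-set energies and model predictions.
rng = np.random.default_rng(0)
y_true = rng.normal(size=1000)
y_pred = y_true + rng.normal(scale=0.1, size=1000)

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)
r2 = r2_score(y_true, y_pred)
print(f"mae={mae:.6f}, r2={r2:.6f}, mse={mse:.6f}, rmse={rmse:.6f}")
```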
## prot_bert_bfd from ProtTrans
The starting pretrained model is from ProtTrans, trained on 2.1 billion protein sequences from BFD using a masked language modeling (MLM) objective. It was introduced in [this paper](https://arxiv.org/abs/2007.06225) and first released in [this repository](https://github.com/agemagician/ProtTrans).
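The pretrained MLM can also be queried directly for masked-residue predictions; a brief sketch using the public `Rostlab/prot_bert_bfd` checkpoint on the Hugging Face Hub (shown for context; it is not part of this repo):

```python
from transformers import pipeline

# "Rostlab/prot_bert_bfd" is the public ProtTrans release of this model.
unmasker = pipeline("fill-mask", model="Rostlab/prot_bert_bfd")

# ProtBert expects space-separated amino acids; [MASK] marks the residue to predict.
print(unmasker("D L I P T S S K L V V [MASK] D T S L Q V K K A F F A L V T"))
```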
Created by Ladislav Rampasek