---
base_model: westlake-repl/SaProt_35M_AF2
---
# Base model: [westlake-repl/SaProt_35M_AF2](https://huggingface.co/westlake-repl/SaProt_35M_AF2)
# Model Card for Model ID
<!-- Provide a quick summary of what the model is/does. -->
This model is trained on a single-site deep mutational scanning dataset and
can be used to predict the fitness score of mutant amino acid sequences of the protein [GAL4_YEAST](https://www.uniprot.org/uniprotkb/P04386/entry) (regulatory protein).
## Protein Function
This protein is a positive regulator of the expression of galactose-induced genes such as GAL1, GAL2, GAL7, GAL10, and MEL1, which encode the enzymes used to convert galactose to glucose.
It recognizes a 17-base-pair sequence (5'-CGGRNNRCYNYNCNCCG-3') in the upstream activating sequence (UAS-G) of these genes.
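
The recognition sequence above is written with IUPAC degenerate nucleotide codes (R = A or G, Y = C or T, N = any base). As a minimal sketch of what that notation means, the following expands the motif into a regular expression and scans a DNA string for matches. The function names and the toy DNA string are illustrative only; the scan covers the forward strand and says nothing about binding strength.

```python
import re

# IUPAC nucleotide codes that appear in the motif, as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]",    # purine
    "Y": "[CT]",    # pyrimidine
    "N": "[ACGT]",  # any base
}

MOTIF = "CGGRNNRCYNYNCNCCG"  # 17 bp, taken from the text above


def motif_to_pattern(motif):
    """Translate an IUPAC motif into a regex pattern string."""
    return "".join(IUPAC[base] for base in motif)


def find_sites(dna, motif=MOTIF):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # A lookahead is used so that overlapping matches are also reported.
    lookahead = re.compile("(?=" + motif_to_pattern(motif) + ")")
    return [m.start() for m in lookahead.finditer(dna.upper())]


# Toy input: two flanking bases, one sequence that fits the motif, two more bases.
toy_dna = "TT" + "CGGAGGACTGTCCTCCG" + "AA"
sites = find_sites(toy_dna)
```

Here `sites` holds the start position of the single embedded site. A real scan would also check the reverse complement.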
### Task type
Protein-level regression
### Dataset description
The dataset is from [Deep generative models of genetic variation capture the effects of mutations](https://www.nature.com/articles/s41592-018-0138-4).
It is also available as the [SaprotHub dataset](https://huggingface.co/datasets/SaProtHub/DMS_GAL4_YEAST).
Each label is the fitness score of one mutant amino acid sequence. Scores are real-valued and unbounded.
### Model input type
Amino acid sequence
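
Because the input is a full amino acid sequence, a single-site mutant has to be written out as a complete sequence before scoring. The sketch below assumes mutations are given in the common `K2A` style (wild-type residue, 1-based position, mutant residue). That notation, the helper name, and the toy sequence are assumptions for illustration and are not taken from the dataset.

```python
import re


def apply_single_site_mutation(wild_type, mutation):
    """Return the mutant sequence for a mutation such as 'K2A' (1-based position)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"Unrecognized mutation string: {mutation!r}")
    ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= pos <= len(wild_type):
        raise ValueError(f"Position {pos} is outside the sequence")
    if wild_type[pos - 1] != ref:
        raise ValueError(
            f"Expected {ref} at position {pos}, found {wild_type[pos - 1]}"
        )
    # Replace exactly one residue and keep the rest of the sequence unchanged.
    return wild_type[: pos - 1] + alt + wild_type[pos:]


toy_wild_type = "MKLLSS"  # toy sequence, not the real GAL4 sequence
toy_mutant = apply_single_site_mutation(toy_wild_type, "K2A")
```

Checking the wild-type residue before substituting catches off-by-one errors in position numbering early.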
### Performance
0.72 Spearman's ρ
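
Spearman's ρ measures how well the predicted scores preserve the ranking of the measured fitness scores, regardless of scale. As a rough illustration of the metric (not the evaluation code used for this model), it can be computed as the Pearson correlation of average ranks. The toy numbers below are made up.

```python
def average_ranks(values):
    """Assign 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(predicted, measured):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(predicted), average_ranks(measured))


# Toy example: two adjacent pairs are swapped, the rest are in order.
rho = spearman_rho([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

If SciPy is available, `scipy.stats.spearmanr` computes the same statistic.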
### LoRA config
lora_dropout: 0.0
lora_alpha: 16
target_modules: ["query", "key", "value", "intermediate.dense", "output.dense"]
modules_to_save: ["classifier"]
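
The settings above map directly onto keyword arguments of `LoraConfig` in the Hugging Face `peft` library. The sketch below assumes `peft` was the library used, which this card does not state. The LoRA rank `r` is not listed here either, so it is left at the library default rather than guessed.

```python
# Values copied from the LoRA config listed above.
lora_kwargs = {
    "lora_dropout": 0.0,
    "lora_alpha": 16,
    "target_modules": ["query", "key", "value", "intermediate.dense", "output.dense"],
    "modules_to_save": ["classifier"],
}

try:
    # Assumption: peft is installed; the card does not name the library.
    from peft import LoraConfig

    lora_config = LoraConfig(**lora_kwargs)
except ImportError:
    lora_config = None  # peft not available; the kwargs dict is still usable
```

`target_modules` selects the attention and feed-forward projections to adapt, while `modules_to_save` keeps the regression head (`classifier`) fully trainable.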
### Training config
class: AdamW
betas: (0.9, 0.98)
weight_decay: 0.01
learning rate: 1e-4
epoch: 50
batch size: 2
precision: 16-mixed
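
As a rough sketch of how these settings could be wired up in PyTorch, the optimizer values correspond to arguments of `torch.optim.AdamW`. The placeholder model is hypothetical. Passing `max_epochs=50` and `precision="16-mixed"` to a PyTorch Lightning `Trainer` is an assumption based on the format of the precision string, not something this card confirms.

```python
# Values copied from the training config listed above.
optimizer_kwargs = {"lr": 1e-4, "betas": (0.9, 0.98), "weight_decay": 0.01}
trainer_kwargs = {"max_epochs": 50, "precision": "16-mixed"}  # assumed Lightning Trainer arguments
batch_size = 2

try:
    import torch

    # Placeholder module standing in for the LoRA-adapted SaProt model.
    placeholder_model = torch.nn.Linear(4, 1)
    optimizer = torch.optim.AdamW(placeholder_model.parameters(), **optimizer_kwargs)
except ImportError:
    optimizer = None  # torch not available; the kwargs dicts are still usable
```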