---
base_model: westlake-repl/SaProt_35M_AF2
---
# Base model: [westlake-repl/SaProt_35M_AF2](https://huggingface.co/westlake-repl/SaProt_35M_AF2)

# Model Card for Model ID

<!-- Provide a quick summary of what the model is/does. -->
This model is trained on a single-site deep mutational scanning (DMS) dataset and
can be used to predict the fitness score of mutant amino acid sequences of the protein [GAL4_YEAST](https://www.uniprot.org/uniprotkb/P04386/entry) (regulatory protein).

## Protein Function
This protein is a positive regulator of the expression of galactose-induced genes such as GAL1, GAL2, GAL7, GAL10, and MEL1, which code for the enzymes used to convert galactose to glucose.
It recognizes a 17-base-pair sequence (5'-CGGRNNRCYNYNCNCCG-3') in the upstream activating sequence (UAS-G) of these genes.
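
The recognition sequence is written with IUPAC nucleotide codes (R = A or G, Y = C or T, N = any base). As a minimal sketch of what that notation means, the motif can be expanded into a regular expression and used to scan a DNA string. The helper names and the example sequence are hypothetical and chosen for illustration; only the forward strand is scanned.

```python
import re

# IUPAC nucleotide codes that appear in the motif.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}

MOTIF = "CGGRNNRCYNYNCNCCG"  # 17 bp recognition sequence quoted above


def motif_to_regex(motif: str) -> str:
    """Expand an IUPAC motif into a regular-expression string."""
    return "".join(IUPAC[base] for base in motif)


def find_sites(dna: str, motif: str = MOTIF) -> list[int]:
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # The lookahead lets overlapping matches be reported as well.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [m.start() for m in pattern.finditer(dna.upper())]


# Hypothetical example: a 17-mer that fits the motif, flanked by arbitrary bases.
example = "TTA" + "CGGAGGACTGTCCTCCG" + "GAT"
print(find_sites(example))  # [3]
```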

### Task type
Protein-level regression
### Dataset description
The dataset is from [Deep generative models of genetic variation capture the effects of mutations](https://www.nature.com/articles/s41592-018-0138-4).
It is also available as a [SaprotHub dataset](https://huggingface.co/datasets/SaProtHub/DMS_GAL4_YEAST).

Each label is the fitness score of a mutant amino acid sequence. It is a real value with no fixed lower or upper bound.
### Model input type
Amino acid sequence
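
Because the training data covers single-site mutants, an input is typically the wild-type sequence with one residue substituted. The sketch below builds such a sequence from a mutation string. The `K2A` notation (wild-type residue, 1-based position, mutant residue), the helper name, and the toy sequence are assumptions for illustration, not the format used by the dataset.

```python
def apply_mutation(sequence: str, mutation: str) -> str:
    """Apply one substitution such as 'K2A' (wild type, 1-based position, mutant)."""
    wild_type, position, mutant = mutation[0], int(mutation[1:-1]), mutation[-1]
    index = position - 1
    if not 0 <= index < len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[index] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[index]}"
        )
    return sequence[:index] + mutant + sequence[index + 1:]


# Hypothetical toy sequence, NOT the real GAL4_YEAST sequence.
toy_sequence = "MKLLSS"
print(apply_mutation(toy_sequence, "K2A"))  # MALLSS
```
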
### Performance
0.72 Spearman's ρ
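
Spearman's ρ measures how well the ranking of predicted scores agrees with the ranking of measured scores, so it is insensitive to the scale of the labels. A minimal pure-Python sketch of the metric follows; the function names and the numbers are made up for illustration and are unrelated to the reported 0.72.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(predicted, measured):
    """Spearman's rho is the Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(measured))


# Made-up fitness scores for five hypothetical mutants.
measured = [-1.2, 0.3, 0.8, -0.5, 1.5]
predicted = [-0.9, 0.1, 1.1, 0.2, 0.9]
print(round(spearman_rho(predicted, measured), 3))  # 0.8
```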

### LoRA config
lora_dropout: 0.0

lora_alpha: 16

target_modules: ["query", "key", "value", "intermediate.dense", "output.dense"]

modules_to_save: ["classifier"]
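
To illustrate which layers the `target_modules` list selects, the sketch below matches each entry against module names by suffix. This assumes suffix matching on dotted module names, which is how the PEFT library is understood to resolve a list of targets. The module names are hypothetical ESM-style names, not read from this model.

```python
TARGET_MODULES = ["query", "key", "value", "intermediate.dense", "output.dense"]


def is_lora_target(module_name: str, targets=TARGET_MODULES) -> bool:
    """True if the name equals a target or ends with '.<target>' (assumed rule)."""
    return any(
        module_name == target or module_name.endswith("." + target)
        for target in targets
    )


# Hypothetical module names for one transformer layer plus the regression head.
module_names = [
    "encoder.layer.0.attention.self.query",
    "encoder.layer.0.attention.self.key",
    "encoder.layer.0.attention.self.value",
    "encoder.layer.0.attention.output.dense",
    "encoder.layer.0.intermediate.dense",
    "encoder.layer.0.output.dense",
    "encoder.layer.0.attention.LayerNorm",
    "classifier.out_proj",
]

for name in module_names:
    print(f"{name}: {'LoRA' if is_lora_target(name) else 'unchanged'}")
```

Under this rule, `output.dense` also selects the attention output projection, because its name ends with the same suffix. The `classifier` head is not a LoRA target; it is listed under `modules_to_save` instead.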

### Training config
class: AdamW

betas: (0.9, 0.98)

weight_decay: 0.01

learning rate: 1e-4

epoch: 50

batch size: 2

precision: 16-mixed
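
To make the optimizer settings concrete, here is a minimal pure-Python sketch of a single AdamW update on one scalar parameter, using the learning rate, betas, and weight decay listed above. The function name and the `eps` value are assumptions (`eps` is not given in this card), and this is not the training code used for the model.

```python
import math


def adamw_step(param, grad, m, v, step,
               lr=1e-4, betas=(0.9, 0.98), weight_decay=0.01, eps=1e-8):
    """One AdamW update for a scalar parameter; `step` counts from 1.

    `eps` is an assumed value, it is not listed in the training config.
    """
    beta1, beta2 = betas
    # Decoupled weight decay: shrink the parameter directly, not via the gradient.
    param = param * (1 - lr * weight_decay)
    # Exponential moving averages of the gradient and the squared gradient.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction compensates for the zero initialisation of m and v.
    m_hat = m / (1 - beta1 ** step)
    v_hat = v / (1 - beta2 ** step)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


# Made-up starting point: parameter 1.0, gradient 0.5, empty optimizer state.
param, m, v = adamw_step(param=1.0, grad=0.5, m=0.0, v=0.0, step=1)
print(param, m, v)
```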