---
license: mit
---

**Base model:** [westlake-repl/SaProt_650M_AF2](https://huggingface.co/westlake-repl/SaProt_650M_AF2)

**Task type:** protein-level regression

**Dataset:** The dataset contains over 100K mutants derived from wild-type EYFP. It is split into 100,317 training, 5,969 validation and 5,968 test samples: 10% of the double-site mutants were held out for validation and 10% of the triple-site mutants for testing, with the remaining mutants used for training. This model was trained by Jia Zheng's lab at Westlake University, which plans to release the dataset later.

**Model input type:** Amino acid sequence

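SaProt's vocabulary pairs each residue with a Foldseek 3Di structure token, so a plain amino acid sequence has to be expanded before tokenization. The sketch below assumes the SaProt convention of writing `#` (a masked, unknown structure token) after every residue; the function name `to_sa_sequence` and the example fragment are illustrative and not part of this model's release.

```python
from __future__ import annotations

# The 20 standard amino acids; anything else is rejected in this sketch.
STANDARD_AA = frozenset("ACDEFGHIKLMNPQRSTVWY")


def to_sa_sequence(aa_seq: str, struct_token: str = "#") -> str:
    """Pair every residue with a structure token.

    Assumes SaProt's convention that '#' marks a masked / unknown
    Foldseek 3Di token, which applies when only the amino acid
    sequence is available.
    """
    seq = aa_seq.strip().upper()
    unknown = sorted(set(seq) - STANDARD_AA)
    if unknown:
        raise ValueError(f"non-standard residues: {unknown}")
    return "".join(residue + struct_token for residue in seq)


if __name__ == "__main__":
    # Hypothetical N-terminal fragment, for illustration only.
    print(to_sa_sequence("MVSKGEE"))  # M#V#S#K#G#E#E#
```
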
**Performance (test set):** Spearman's ρ = 0.94

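Spearman's ρ is the Pearson correlation between the ranks of the predictions and the ranks of the measured labels, so it rewards ordering the mutants correctly rather than matching their exact values. Below is a minimal pure-Python sketch of the metric in which tied values receive their average rank (the same tie handling as `scipy.stats.spearmanr`); the function names are hypothetical.

```python
from __future__ import annotations

import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend `end` across the run of values equal to the one at `start`.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions start..end hold ranks start+1..end+1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(predictions: Sequence[float], labels: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(predictions) != len(labels) or len(predictions) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(predictions), average_ranks(labels))


if __name__ == "__main__":
    # Toy numbers, not model outputs: one adjacent pair is out of order.
    print(spearman_rho([0.1, 0.4, 0.35, 0.8], [1.0, 2.0, 3.0, 4.0]))  # 0.8
```

In the toy example, swapping a single adjacent pair out of four drops ρ from 1.0 to 0.8.
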
**LoRA config:**

- **r:** 8
- **lora_dropout:** 0.0
- **lora_alpha:** 16
- **target_modules:** ["query", "key", "value", "intermediate.dense", "output.dense"]
- **modules_to_save:** ["classifier"]

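The settings above can be expressed as keyword arguments for the `peft` library's `LoraConfig`. This is a sketch, not the authors' training script: `is_targeted` only approximates how PEFT resolves `target_modules` (exact name or dotted suffix), and the module names are hypothetical examples in the style of the Hugging Face ESM implementation that SaProt builds on. With `lora_alpha = 16` and `r = 8`, the low-rank update is scaled by 16 / 8 = 2.

```python
from __future__ import annotations

# Hyperparameters copied from the LoRA config listed above.
LORA_KWARGS = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "target_modules": ["query", "key", "value", "intermediate.dense", "output.dense"],
    "modules_to_save": ["classifier"],
}

# LoRA adds (lora_alpha / r) * B @ A to each adapted weight, so the scale here is 2.0.
LORA_SCALING = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]


def is_targeted(module_name: str, targets: list[str]) -> bool:
    """Approximate PEFT-style matching: exact name or suffix after a '.' (assumed)."""
    return any(module_name == t or module_name.endswith("." + t) for t in targets)


def make_peft_config():
    """Build a peft.LoraConfig from LORA_KWARGS (requires the `peft` package)."""
    from peft import LoraConfig  # imported lazily so the rest runs without peft

    return LoraConfig(**LORA_KWARGS)


if __name__ == "__main__":
    # Hypothetical module names in the style of Hugging Face's ESM implementation.
    names = [
        "esm.encoder.layer.0.attention.self.query",
        "esm.encoder.layer.0.attention.output.dense",
        "esm.encoder.layer.0.intermediate.dense",
        "esm.encoder.layer.0.output.dense",
        "esm.encoder.layer.0.attention.LayerNorm",
        "classifier.dense",
    ]
    for name in names:
        print(f"{name}: {is_targeted(name, LORA_KWARGS['target_modules'])}")
```

Under this matching rule, `output.dense` covers both the attention output projection and the feed-forward output projection in each layer, while `modules_to_save` keeps the `classifier` head fully trainable instead of adapting it with LoRA.
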
**Training config:**

- **optimizer:**
  - **class:** AdamW
  - **betas:** (0.9, 0.98)
  - **weight_decay:** 0.01
- **learning rate:** 1e-4
- **epochs:** 20
- **batch size:** 64
- **precision:** 16-mixed

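The optimizer settings map directly onto `torch.optim.AdamW`, and the batch size fixes the number of optimizer steps. The sketch below assumes a single device, no gradient accumulation, and that the final partial batch is kept; `TrainingConfig`, `steps_per_epoch` and `make_optimizer` are hypothetical names. The `16-mixed` value matches the precision flag of PyTorch Lightning's `Trainer`, but the card does not state which training framework was used.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Values copied from the training config above."""

    lr: float = 1e-4
    betas: tuple[float, float] = (0.9, 0.98)
    weight_decay: float = 0.01
    epochs: int = 20
    batch_size: int = 64
    precision: str = "16-mixed"  # fp16 mixed precision, Lightning-style flag


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    """Optimizer steps per epoch when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


def make_optimizer(model, cfg: TrainingConfig):
    """AdamW over trainable parameters only (LoRA weights and classifier after PEFT wrapping)."""
    import torch  # imported lazily so the step arithmetic runs without PyTorch

    trainable = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.AdamW(
        trainable, lr=cfg.lr, betas=cfg.betas, weight_decay=cfg.weight_decay
    )


if __name__ == "__main__":
    cfg = TrainingConfig()
    per_epoch = steps_per_epoch(100_317, cfg.batch_size)
    print(per_epoch, per_epoch * cfg.epochs)  # 1568 31360
```

Under those assumptions, the 100,317 training samples give 1,568 steps per epoch and 31,360 steps over 20 epochs.
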