metadata
license: mit
Base model: westlake-repl/SaProt_650M_AF2
Task type: protein-level regression
Dataset: This dataset contains single-site and double-site mutants derived from the wild-type EYFP protein. The training, validation, and test sets contain 26168, 3087, and 3088 samples, respectively. The training set consists of all single-site mutants plus 80% of the double-site mutants; the validation and test sets each hold 10% of the double-site mutants. This model was trained by Jia Zheng's lab at Westlake University. The dataset will be released later by this team.
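As a rough sanity check on the split described above, the stated sample counts can be combined to estimate how many single-site and double-site mutants the dataset holds. This is only a sketch: it assumes the 80/10/10 split of double-site mutants is exact, so the derived figures are estimates and not numbers reported by the authors.

```python
# Split sizes as stated in the dataset description.
train, valid, test = 26168, 3087, 3088

total = train + valid + test  # all mutants across the three splits

# Assumption: validation and test together hold exactly 20% of the
# double-site mutants, so the double-site total is 5x their sum.
double_total_est = (valid + test) * 5
double_train_est = double_total_est * 8 // 10  # 80% of double-site mutants
single_est = train - double_train_est          # rest of the training set

print(f"total samples: {total}")
print(f"estimated double-site mutants: {double_total_est}")
print(f"estimated single-site mutants: {single_est}")
```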
Model input type: Amino acid sequence
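Because the base model is a SaProt checkpoint, a plain amino acid sequence usually has to be converted into SaProt's structure-aware vocabulary before tokenization. The sketch below assumes the SaProt convention of pairing every residue with `#` as a masked structure token; the helper name `to_saprot_input` and the example fragment are hypothetical and used only for illustration.

```python
def to_saprot_input(aa_sequence: str) -> str:
    """Pair each residue with '#', the assumed masked-structure token."""
    cleaned = aa_sequence.strip().upper()
    return "".join(f"{residue}#" for residue in cleaned)


# Short illustrative fragment, not a full EYFP sequence.
example = to_saprot_input("MVSKGE")
print(example)
```

The resulting string would then be passed to the base model's tokenizer in place of the raw sequence.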
Performance (on test set): 0.95 Spearman's ρ
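For readers unfamiliar with the metric, Spearman's ρ is the Pearson correlation computed on the ranks of the predicted and measured values. The dependency-free sketch below illustrates that definition; it is not the authors' evaluation code, and the function names are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(predicted, measured):
    """Pearson correlation of the rank vectors (undefined for constant input)."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(predicted), average_ranks(measured)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Toy values for illustration only, not model outputs.
rho = spearman_rho([0.1, 0.4, 0.35, 0.8], [1.0, 2.0, 3.0, 4.0])
```

In practice a library routine such as `scipy.stats.spearmanr` would normally be used instead.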
LoRA config:
- r: 8
- lora_dropout: 0.0
- lora_alpha: 16
- target_modules: ["query", "key", "value", "intermediate.dense", "output.dense"]
- modules_to_save: ["classifier"]
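The LoRA settings above map directly onto the arguments of `LoraConfig` in the Hugging Face `peft` library. The following is a minimal sketch under that assumption; the model card does not state which library version or wrapper code was used, and `build_lora_config` is a hypothetical helper name.

```python
# Values copied from the LoRA config listed above.
LORA_KWARGS = {
    "r": 8,
    "lora_alpha": 16,
    "lora_dropout": 0.0,
    "target_modules": ["query", "key", "value", "intermediate.dense", "output.dense"],
    "modules_to_save": ["classifier"],
}

# Standard LoRA scales the low-rank update by lora_alpha / r.
LORA_SCALING = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]


def build_lora_config():
    """Build a peft LoraConfig (assumes `peft` is installed)."""
    # Imported lazily so the settings dict stays usable without peft.
    from peft import LoraConfig

    return LoraConfig(**LORA_KWARGS)
```

With these values the adapter update is scaled by 2.0, and the `classifier` head is trained and saved in full alongside the adapter weights.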
Training config:
- optimizer:
  - class: AdamW
  - betas: (0.9, 0.98)
  - weight_decay: 0.01
- learning rate: 1e-4
- epochs: 50
- batch size: 64
- precision: 16-mixed
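The training settings above could be expressed in PyTorch roughly as follows. This is a sketch and not the authors' training script: it assumes `torch.optim.AdamW`, treats the batch size of 64 as the effective batch per optimizer step, and keeps the final partial batch. The string `16-mixed` matches the precision flag used by PyTorch Lightning's `Trainer`, which suggests (but does not confirm) that Lightning was used. `build_optimizer` is a hypothetical helper name.

```python
import math

# Values copied from the training config listed above.
OPTIMIZER_KWARGS = {"lr": 1e-4, "betas": (0.9, 0.98), "weight_decay": 0.01}
EPOCHS = 50
BATCH_SIZE = 64
TRAIN_SAMPLES = 26168  # training-set size stated in the dataset description

# Assumption: one optimizer step per batch, last partial batch kept.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS


def build_optimizer(parameters):
    """Create the AdamW optimizer (assumes `torch` is installed)."""
    # Imported lazily so the step arithmetic runs without torch.
    import torch

    return torch.optim.AdamW(parameters, **OPTIMIZER_KWARGS)


print(f"{steps_per_epoch} steps per epoch, {total_steps} steps in total")
```

Under these assumptions a full run covers 409 steps per epoch and 20450 optimizer steps overall; gradient accumulation or multi-device training would change those figures.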