---
license: apache-2.0
language:
- en
tags:
- Protein_Language_Model
- MSA Generation
---

# MSAGPT
📖 Paper: MSAGPT: Neural Prompting Protein Structure Prediction via MSA Generative Pre-Training

MSAGPT is a powerful protein language model (PLM). It has 3 billion parameters and is released in three versions, MSAGPT, MSAGPT-SFT, and MSAGPT-DPO, all of which support zero-shot and few-shot MSA generation. MSAGPT achieves state-of-the-art structure prediction performance in natural MSA-scarce scenarios.
## Visualized Cases

Visualization of improved structure prediction compared with natural MSA. Yellow: ground truth; Purple: predictions based on MSA generated by MSAGPT; Cyan: predictions based on natural MSA.
## Get Started

### Option 1: Deploy MSAGPT by yourself

We support a GUI for model inference. First, install the dependencies:

```bash
# CUDA >= 11.8
pip install -r requirements.txt
```

#### Model List

You can download the necessary weights manually. Then unzip the archive and put it into the **checkpoints** folder.

| Model      | Type | Seq Length | Download |
|------------|------|------------|----------|
| MSAGPT     | Base | 16K        | [🤗 Huggingface](https://huggingface.co/THUDM/MSAGPT) [🔨 SwissArmyTransformer](https://cloud.tsinghua.edu.cn/f/ebfc954a4cd24cef9243/?dl=1) |
| MSAGPT-SFT | Sft  | 16K        | [🤗 Huggingface](https://huggingface.co/THUDM/MSAGPT) [🔨 SwissArmyTransformer](https://cloud.tsinghua.edu.cn/f/32da3eadf6e042aab2fa/?dl=1) |
| MSAGPT-DPO | Rlhf | 16K        | [🤗 Huggingface](https://huggingface.co/THUDM/MSAGPT) [🔨 SwissArmyTransformer](https://cloud.tsinghua.edu.cn/f/ebfc954a4cd24cef9243/?dl=1) |

#### Situation 1.1 CLI (SAT version)

Run the CLI demo via:

```bash
# Online Chat
bash scripts/cli_sat.sh --from_pretrained ./checkpoints/MSAGPT-DPO --input-source chat --stream_chat --max-gen-length 1024
```

The program will interact with you in the command line. You can generate replies by entering the protein sequence for which you need virtual MSAs (or add a few MSAs as a prompt, connected by "\
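Two preconditions from the setup steps, CUDA >= 11.8 and an unzipped checkpoint under the **checkpoints** folder, can be checked before launching the CLI. The following is a minimal bash sketch and is not part of the MSAGPT repository: the helper names `version_ge`, `cuda_version`, `check_cuda`, and `check_checkpoint` are hypothetical, and it assumes `nvcc` reports its version as `release X.Y` and that GNU `sort -V` is available.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers (not shipped with MSAGPT).

# Succeeds (status 0) if version $1 >= version $2, using GNU version sort.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Prints the CUDA release reported by nvcc (e.g. "12.1").
# Assumes nvcc output contains "release X.Y"; fails if nvcc is not on PATH.
cuda_version() {
  command -v nvcc >/dev/null 2>&1 || return 1
  nvcc --version | sed -n 's/.*release \([0-9]*\.[0-9]*\).*/\1/p'
}

# Checks the "CUDA >= 11.8" requirement from the install step.
check_cuda() {
  local v
  v="$(cuda_version)" || { echo "nvcc not found; cannot verify CUDA >= 11.8" >&2; return 1; }
  if [ -n "$v" ] && version_ge "$v" "11.8"; then
    echo "CUDA $v OK"
  else
    echo "CUDA version '$v' is older than 11.8 or could not be parsed" >&2
    return 1
  fi
}

# Checks that an unzipped checkpoint directory exists and is not empty.
check_checkpoint() {
  local dir="$1"
  if [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]; then
    echo "Found checkpoint directory: $dir"
  else
    echo "Missing or empty checkpoint directory: $dir" >&2
    return 1
  fi
}
```

After sourcing these helpers, a guarded launch could look like `check_cuda && check_checkpoint ./checkpoints/MSAGPT-DPO && bash scripts/cli_sat.sh --from_pretrained ./checkpoints/MSAGPT-DPO --input-source chat --stream_chat --max-gen-length 1024`. Keeping the checks as functions with no side effects lets you reuse them in your own scripts without triggering any download or inference.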
You can also enable offline generation by setting the **--input-source \