Update README.md
README.md
CHANGED
@@ -1,8 +1,123 @@
---
tags:
- biology
- small-molecules
- single-cell-genes
- drug-discovery
- drug-target-interaction
- ibm
- mammal
- pytorch

library_name: biomed-multi-alignment
license: apache-2.0
base_model:
- ibm/biomed.omics.bl.sm.ma-ted-458m
---

Accurate prediction of drug-target binding affinity is essential in the early stages of drug discovery.
Traditionally, binding affinities are measured through high-throughput
screening experiments, which, while accurate, are resource-intensive and scale poorly
to large sets of drug candidates. This model predicts binding affinity as pKd,
the negative base-10 logarithm of the dissociation constant (Kd),
which reflects the strength of the interaction between a small molecule (drug)
and a protein (target). A higher pKd corresponds to tighter binding.

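To make the pKd scale concrete, here is a minimal sketch of the conversion between Kd and pKd. It assumes Kd is expressed in mol/L, which is the usual convention; the helper names `kd_to_pkd` and `pkd_to_kd` are illustrative and not part of the MAMMAL code base.

```python
import math


def kd_to_pkd(kd_molar: float) -> float:
    """Return pKd = -log10(Kd), with Kd given in mol/L (assumed unit)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be a positive concentration")
    return -math.log10(kd_molar)


def pkd_to_kd(pkd: float) -> float:
    """Invert the transform: Kd = 10 ** (-pKd), in mol/L."""
    return 10.0 ** (-pkd)


# A 1 nM binder (1e-9 mol/L) scores higher than a 10 micromolar binder (1e-5 mol/L).
strong = kd_to_pkd(1e-9)
weak = kd_to_pkd(1e-5)
print(f"strong binder pKd={strong:.2f}, weak binder pKd={weak:.2f}")
```

Because the scale is logarithmic, a difference of one pKd unit corresponds to a tenfold difference in Kd.
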
The model is a fine-tuned version of IBM's biomedical foundation model, ibm/biomed.omics.bl.sm.ma-ted-458m [1],
which was trained on over 2 billion biological samples across multiple modalities, including proteins, small molecules,
and single-cell gene expression data.

The fine-tuning was performed using the PEER (Protein sEquence undERstanding) benchmark [2], which leverages data from the BindingDB dataset [3], accessible via [4].
This benchmark uses a test split that holds out four protein classes: estrogen receptors (ER), G-protein-coupled receptors (GPCR), ion channels,
and receptor tyrosine kinases. Evaluating on these held-out classes measures how well the model generalizes to protein classes it did not see during training.

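The idea of a class-based holdout can be illustrated with a small sketch. This is not the benchmark's own splitting code (the actual split ships with the dataset [4]); the record layout, the `protein_class` field, the class labels, and the pKd values below are all hypothetical placeholders.

```python
# Hypothetical class labels, mirroring the four held-out classes named above.
HELD_OUT_CLASSES = frozenset(
    {"ER", "GPCR", "ion_channel", "receptor_tyrosine_kinase"}
)


def split_by_protein_class(records, held_out=HELD_OUT_CLASSES):
    """Send every record whose protein class is held out to the test set."""
    train, test = [], []
    for record in records:
        bucket = test if record["protein_class"] in held_out else train
        bucket.append(record)
    return train, test


# Toy records with placeholder values (not real BindingDB entries).
records = [
    {"protein_class": "protease", "pkd": 6.1},
    {"protein_class": "GPCR", "pkd": 7.4},
    {"protein_class": "ion_channel", "pkd": 5.2},
    {"protein_class": "other_kinase", "pkd": 8.0},
]

train, test = split_by_protein_class(records)
print(f"train={len(train)} test={len(test)}")
```

Unlike a random split, no protein class appears on both sides, so test performance reflects generalization to unseen classes rather than interpolation within familiar ones.
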
The model's expected inputs are the amino acid sequence of the target protein and the SMILES (Simplified Molecular Input Line Entry System) representation of the drug.

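Before passing inputs to the model, a lightweight sanity check can catch obviously malformed strings. The sketch below is an assumption-laden illustration, not part of the MAMMAL API: it only checks that the protein uses the 20 standard one-letter amino acid codes and that the SMILES string has balanced parentheses and square brackets. It does not validate chemistry; a cheminformatics toolkit would be needed for that.

```python
STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def looks_like_protein(seq: str) -> bool:
    """True if seq is non-empty and uses only the 20 standard residue codes."""
    return bool(seq) and set(seq) <= STANDARD_AMINO_ACIDS


def has_balanced_brackets(smiles: str) -> bool:
    """True if every '(' and '[' in the string is closed in the right order."""
    closers = {")": "(", "]": "["}
    stack = []
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in closers:
            if not stack or stack.pop() != closers[ch]:
                return False
    return not stack


# Example inputs taken from the usage example in this README.
target_seq = "NLMKRCTRGFRKLGKCTTLEEEKCKTLYPRGQCTCSDSKMNTHSCDCKSC"
drug_seq = "CC(=O)NCCC1=CNc2c1cc(OC)cc2"

print(looks_like_protein(target_seq), has_balanced_brackets(drug_seq))
```
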
- [1] https://huggingface.co/ibm/biomed.omics.bl.sm.ma-ted-458m
- [2] Minghao Xu et al. “PEER: a comprehensive and multi-task benchmark for protein sequence understanding”. In: Advances in Neural Information Processing Systems 35 (2022), pp. 35156–35173.
- [3] Michael K. Gilson et al. “BindingDB in 2015: A public database for medicinal chemistry, computational chemistry and systems pharmacology”. In: Nucleic Acids Res 44.D1 (Oct. 2015), pp. D1045–53.
- [4] https://torchdrug.ai/docs/api/datasets.html#bindingdb


## Model Summary

- **Developers:** IBM Research
- **GitHub Repository:** https://github.com/BiomedSciAI/biomed-multi-alignment
- **Paper:** https://arxiv.org/abs/2410.22367
- **Release Date:** Oct 28th, 2024
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0)

## Usage

Using `ibm/biomed.omics.bl.sm.ma-ted-458m` requires installing https://github.com/BiomedSciAI/biomed-multi-alignment

```
pip install git+https://github.com/BiomedSciAI/biomed-multi-alignment.git#egg=mammal[examples]
```

A simple example of using `ibm/biomed.omics.bl.sm.ma-ted-458m.dti_bindingdb_pkd_peer`:

```python
from fuse.data.tokenizers.modular_tokenizer.op import ModularTokenizerOp

from mammal.examples.dti_bindingdb_kd.task import DtiBindingdbKdTask
from mammal.model import Mammal

# input
target_seq = "NLMKRCTRGFRKLGKCTTLEEEKCKTLYPRGQCTCSDSKMNTHSCDCKSC"
drug_seq = "CC(=O)NCCC1=CNc2c1cc(OC)cc2"

# Load Model
model = Mammal.from_pretrained("ibm/biomed.omics.bl.sm.ma-ted-458m.dti_bindingdb_pkd_peer")
model.eval()

# Load Tokenizer
tokenizer_op = ModularTokenizerOp.from_pretrained("ibm/biomed.omics.bl.sm.ma-ted-458m.dti_bindingdb_pkd_peer")

# convert to MAMMAL style
sample_dict = {"target_seq": target_seq, "drug_seq": drug_seq}
sample_dict = DtiBindingdbKdTask.data_preprocessing(
    sample_dict=sample_dict,
    tokenizer_op=tokenizer_op,
    target_sequence_key="target_seq",
    drug_sequence_key="drug_seq",
    norm_y_mean=None,
    norm_y_std=None,
    device=model.device,
)

# forward pass - encoder_only mode which supports scalar predictions
batch_dict = model.forward_encoder_only([sample_dict])

# Post-process the model's output
batch_dict = DtiBindingdbKdTask.process_model_output(
    batch_dict,
    scalars_preds_processed_key="model.out.dti_bindingdb_kd",
    norm_y_mean=6.286291085593906,
    norm_y_std=1.5422950906208512,
)
ans = {
    "model.out.dti_bindingdb_kd": float(batch_dict["model.out.dti_bindingdb_kd"][0])
}

# Print prediction
print(f"{ans=}")
```

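The `norm_y_mean` and `norm_y_std` arguments in the post-processing step suggest that the raw model output is a standardized (z-scored) pKd that is mapped back to the pKd scale. The sketch below shows that inverse transform under this assumption; the `normalize` and `denormalize` helpers are illustrative and may differ from what `process_model_output` does internally.

```python
# Constants copied from the usage example above.
NORM_Y_MEAN = 6.286291085593906
NORM_Y_STD = 1.5422950906208512


def normalize(pkd: float, mean: float = NORM_Y_MEAN, std: float = NORM_Y_STD) -> float:
    """Standardize a pKd value to zero mean and unit variance."""
    return (pkd - mean) / std


def denormalize(z: float, mean: float = NORM_Y_MEAN, std: float = NORM_Y_STD) -> float:
    """Map a standardized prediction back to the pKd scale."""
    return z * std + mean


# A standardized output of 0.0 maps to the mean pKd used for normalization;
# +1.0 maps to one standard deviation above it.
print(denormalize(0.0), denormalize(1.0))
```
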
For more advanced usage, see our detailed example at: https://github.com/BiomedSciAI/biomed-multi-alignment


## Citation

If you found our work useful, please consider giving the repo a star and citing our paper:
```
@misc{shoshan2024mammalmolecularaligned,
      title={MAMMAL -- Molecular Aligned Multi-Modal Architecture and Language},
      author={Yoel Shoshan and Moshiko Raboh and Michal Ozery-Flato and Vadim Ratner and Alex Golts and Jeffrey K. Weber and Ella Barkan and Simona Rabinovici-Cohen and Sagi Polaczek and Ido Amos and Ben Shapira and Liam Hazan and Matan Ninio and Sivan Ravid and Michael M. Danziger and Joseph A. Morrone and Parthasarathy Suryanarayanan and Michal Rosen-Zvi and Efrat Hexter},
      year={2024},
      eprint={2410.22367},
      archivePrefix={arXiv},
      primaryClass={q-bio.QM},
      url={https://arxiv.org/abs/2410.22367},
}
```