---
tags:
- drug-discovery
- ibm
- mammal
- pytorch
- small molecules drugs
- smiles
- MoleculeNet
- FDA approval
- safetensors
- biomed-multi-alignment
license: apache-2.0
library_name: biomed-multi-alignment
base_model:
- ibm/biomed.omics.bl.sm.ma-ted-458m
---
Drugs must satisfy stringent criteria for both efficacy and safety. This model predicts the likelihood of FDA approval for small-molecule drugs represented as SMILES (Simplified Molecular Input Line Entry System) strings. It is a fine-tuned version of the IBM biomedical foundation model ibm/biomed.omics.bl.sm.ma-ted-458m [1], which was trained on over 2 billion biological samples spanning multiple modalities, including proteins, small molecules, and single-cell gene expression data.

Fine-tuning was performed on the MoleculeNet ClinTox dataset [2]. For benchmarking, we used the predefined training, validation, and test splits provided by MolFormer [3], which are available from the source referenced in [4].

- [1] https://huggingface.co/ibm/biomed.omics.bl.sm.ma-ted-458m
- [2] Zhenqin Wu et al. “MoleculeNet: a benchmark for molecular machine learning”. In: Chemical Science 9.2 (2018), pp. 513–530.
- [3] Jerret Ross et al. “Large-scale chemical language representations capture molecular structure and properties”. In: Nature Machine Intelligence 4.12 (2022), pp. 1256–1264.
- [4] https://github.com/IBM/molformer/tree/main/data that points to https://ibm.ent.box.com/v/MoLFormer-data (file: finetune datasets.zip).
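Results on the ClinTox test split are conventionally reported as ROC-AUC, the metric MoleculeNet recommends for this dataset. The sketch below computes it without external dependencies via the Mann-Whitney rank statistic, giving tied scores their average rank. The function name `roc_auc` is hypothetical, and it assumes binary 0/1 labels paired with one model score per molecule; it is an illustration of the metric, not the evaluation code used for this model.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC for binary 0/1 labels, via the Mann-Whitney U statistic.

    Hypothetical helper for illustration; tied scores get their average rank.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    if any(y not in (0, 1) for y in labels):
        raise ValueError("labels must be 0 or 1")
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")

    # Sort indices by score, then assign 1-based ranks, averaging over ties.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1  # extend the run of tied scores
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U counts (positive, negative) pairs ranked correctly; ties count as 0.5.
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

In practice, `scores` would be the model's predicted probability of approval for each test-split molecule. How that probability is read out of the inference result is not documented in this card, so treat that step as an assumption.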

## Model Summary

- **Developers:** IBM Research
- **GitHub Repository:** https://github.com/BiomedSciAI/biomed-multi-alignment
- **Paper:** https://arxiv.org/abs/2410.22367
- **Release Date:** Dec 4th, 2024
- **License:** [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0).

## Usage

Using `ibm/biomed.omics.bl.sm.ma-ted-458m.moleculenet_clintox_fda` requires installing the biomed-multi-alignment package from https://github.com/BiomedSciAI/biomed-multi-alignment:

```
pip install git+https://github.com/BiomedSciAI/biomed-multi-alignment.git
```
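To confirm that the installation succeeded before running the example, a quick check can help. This sketch assumes the package's import name is `mammal` (as in the example's import statement) and uses only the standard library's `importlib.util.find_spec`, which locates a module without importing it; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# Assumes the package installed above is importable as `mammal`.
if not is_installed("mammal"):
    print("mammal not found; install biomed-multi-alignment with pip first")
```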

A simple example of using `ibm/biomed.omics.bl.sm.ma-ted-458m.moleculenet_clintox_fda`:

```
from mammal.examples.molnet.molnet_infer import load_model, task_infer

# SMILES string of the molecule to score (here, dichloromethane)
smiles_seq = "C(Cl)Cl"

# Load the model for the FDA-approval task on the CPU, then run inference
task_dict = load_model(task_name="FDA_APPR", device="cpu")
result = task_infer(task_dict=task_dict, smiles_seq=smiles_seq)
print(f"The prediction for {smiles_seq=} is {result}")
```
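When scoring many molecules, the model only needs to be loaded once. The sketch below is a hypothetical helper, `predict_batch`, that is not part of the library: it takes any single-molecule inference callable, skips blank entries and duplicates, and returns results keyed by SMILES. It is written against a plain callable so it runs without the package installed; the commented lines show how it might be wired to `load_model` and `task_infer` using the same arguments as the example above.

```python
from typing import Callable, Dict, Iterable, TypeVar

R = TypeVar("R")


def predict_batch(smiles_list: Iterable[str], infer_fn: Callable[[str], R]) -> Dict[str, R]:
    """Apply `infer_fn` to each distinct, non-blank SMILES string (hypothetical helper)."""
    results: Dict[str, R] = {}
    for raw in smiles_list:
        smiles = raw.strip()
        if not smiles or smiles in results:
            continue  # skip blank lines and molecules already scored
        results[smiles] = infer_fn(smiles)
    return results


# Possible usage with the library installed (not executed here):
# from mammal.examples.molnet.molnet_infer import load_model, task_infer
# task_dict = load_model(task_name="FDA_APPR", device="cpu")
# preds = predict_batch(
#     ["C(Cl)Cl", "CCO"],
#     lambda s: task_infer(task_dict=task_dict, smiles_seq=s),
# )
```

Passing the inference step as a callable keeps the loop independent of how the model is loaded, so the same helper works for CPU or other device settings.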

See our detailed example at https://github.com/BiomedSciAI/biomed-multi-alignment.

## Citation

If you find our work useful, please consider giving the repository a star and citing our paper:
```
@misc{shoshan2024mammalmolecularaligned,
      title={MAMMAL -- Molecular Aligned Multi-Modal Architecture and Language},
      author={Yoel Shoshan and Moshiko Raboh and Michal Ozery-Flato and Vadim Ratner and Alex Golts and Jeffrey K. Weber and Ella Barkan and Simona Rabinovici-Cohen and Sagi Polaczek and Ido Amos and Ben Shapira and Liam Hazan and Matan Ninio and Sivan Ravid and Michael M. Danziger and Joseph A. Morrone and Parthasarathy Suryanarayanan and Michal Rosen-Zvi and Efrat Hexter},
      year={2024},
      eprint={2410.22367},
      archivePrefix={arXiv},
      primaryClass={q-bio.QM},
      url={https://arxiv.org/abs/2410.22367},
}
```