---
license: cc-by-nc-4.0
---
## Precious3-GPT
A multi-omics multi-species language model.
- **Developer**: [Insilico Medicine](https://insilico.com/precious)
- **License**: cc-by-nc-4.0
- **Model size**: 88.3 million parameters
- **Domain**: Biomedical
- **Base architecture**: [MPT](https://huggingface.co/mosaicml/mpt-7b)
## Quickstart
Precious3-GPT can be loaded and run as a standard causal language model through the transformers interface:
```python
# Load model and tokenizer
from transformers import AutoTokenizer, AutoModel
tokenizer = AutoTokenizer.from_pretrained("insilicomedicine/precious3-gpt", trust_remote_code=True)
model = AutoModel.from_pretrained("insilicomedicine/precious3-gpt", trust_remote_code=True)
```
However, to make the full functionality of the Precious3-GPT model easier to use, we provide a handler.
### Run model using Precious3-GPT handler step by step
**Step 1 - download Precious3-GPT [handler.py](https://huggingface.co/insilicomedicine/precious3-gpt/blob/main/handler.py)**
```python
from handler import EndpointHandler
precious3gpt_handler = EndpointHandler()
```
**Step 2 - create input for the handler**
```python
import json

# Load the example generation config shipped with the repository
with open('./generation-configs/meta2diff.json', 'r') as f:
    config_data = json.load(f)

# prepare request configuration
request_config = {"inputs": config_data, "mode": "meta2diff", "parameters": {
    "temperature": 0.8,
    "top_p": 0.2,
    "top_k": 3550,
    "n_next_tokens": 50,
    "random_seed": 137
}}
```
**How Precious3-GPT will see the given request**
```text
[BOS]<age_group2diff2age_group><disease2diff2disease><compound2diff2compound><tissue>lung </tissue><age_individ></age_individ><cell></cell><efo>EFO_0000768 </efo><datatype>expression </datatype><drug>curcumin </drug><dose></dose><time></time><case>70.0-80.0 80.0-90.0 </case><control></control><dataset_type></dataset_type><gender>m </gender><species>human </species>
```
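To inspect a prompt like the one above, it can help to split it back into its instruction tags and meta-data fields. The following is a minimal sketch, not part of the handler: `parse_prompt` is a hypothetical helper, and it assumes that paired tags (`<tissue>...</tissue>`) carry meta-data while unpaired tags are instructions. Values are kept as raw strings because an entry such as `whole blood` contains a space itself.

```python
import re

# Prompt copied from the example above.
PROMPT = (
    "[BOS]<age_group2diff2age_group><disease2diff2disease><compound2diff2compound>"
    "<tissue>lung </tissue><age_individ></age_individ><cell></cell>"
    "<efo>EFO_0000768 </efo><datatype>expression </datatype><drug>curcumin </drug>"
    "<dose></dose><time></time><case>70.0-80.0 80.0-90.0 </case><control></control>"
    "<dataset_type></dataset_type><gender>m </gender><species>human </species>"
)

PAIRED_TAG = re.compile(r"<(\w+)>(.*?)</\1>")
OPEN_TAG = re.compile(r"<(\w+)>")


def parse_prompt(prompt):
    """Split a prompt into instruction tags and meta-data fields (hypothetical helper)."""
    # Paired tags such as <tissue>lung </tissue> carry meta-data values.
    fields = {name: value.strip() for name, value in PAIRED_TAG.findall(prompt)}
    # Assumption: any unpaired <tag> left after removing the pairs is an instruction.
    remainder = PAIRED_TAG.sub("", prompt)
    instructions = OPEN_TAG.findall(remainder)
    return instructions, fields


instructions, fields = parse_prompt(PROMPT)
print(instructions)
print(fields["tissue"], "|", fields["drug"], "|", fields["case"])
```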
**Step 3 - run Precious3-GPT**
```python
output = precious3gpt_handler(request_config)
```
**Handler output structure**
```json
{
  "output": {
    "up": List,
    "down": List
  },
  "mode": String, // Generation mode that was selected
  "message": "Done!", // or an error message
  "input": String // Input prompt that was passed to the model
}
```
Note: if the selected ```mode``` generates compounds, the output also contains ```compounds: List```.
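A small wrapper can make the response easier to consume. The sketch below is an illustration only: `unpack_output` is a hypothetical helper, it assumes that any `message` other than `"Done!"` signals an error, and it assumes that `compounds`, when present, sits inside `output` next to `up` and `down`. The sample response reuses a few gene names from Example 2 of this README rather than calling the model.

```python
def unpack_output(response):
    """Return (up, down, compounds) from a handler response, or raise on failure."""
    # Assumption: any message other than "Done!" signals an error.
    if response.get("message") != "Done!":
        raise RuntimeError(f"Generation failed: {response.get('message')}")
    output = response.get("output", {})
    up = output.get("up", [])
    down = output.get("down", [])
    # Assumption: "compounds" lives inside "output"; None when the mode does not produce it.
    compounds = output.get("compounds")
    return up, down, compounds


# Sample response shaped like the structure above (gene lists shortened, prompt omitted).
sample = {
    "output": {"up": [["IER3", "APOC2"]], "down": [["TBL1Y", "TDP1"]]},
    "mode": "meta2diff",
    "message": "Done!",
    "input": "",
}

up, down, compounds = unpack_output(sample)
print(up[0], down[0], compounds)
```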
---
## Precious3-GPT request configuration
### Generation Modes (`mode` in config)
Choose the appropriate mode based on your requirements:
1. **meta2diff**: Generate a signature (lists of up- and down-regulated genes) given meta-data such as tissue, compound, gender, etc.
2. **diff2compound**: Predict compounds based on a signature.
3. **meta2diff2compound**: Generate signatures given meta-data and then predict compounds based on the generated signatures.
---
### Instruction (`inputs.instruction` in config)
1. disease2diff2disease - generate a signature for a disease / predict a disease based on a given signature
2. compound2diff2compound - generate a signature for a compound / predict a compound based on a given signature
3. age_group2diff2age_group - generate a signature for an age group / predict an age group based on a given signature
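Before sending a request, the `mode` and `instruction` values can be checked against the names listed above. This is a minimal sketch under the assumption that these lists are exhaustive; the handler itself may accept other values, and `check_request` is a hypothetical helper, not part of `handler.py`.

```python
# Values copied from the lists in this README; the handler may accept others.
VALID_MODES = {"meta2diff", "diff2compound", "meta2diff2compound"}
VALID_INSTRUCTIONS = {
    "disease2diff2disease",
    "compound2diff2compound",
    "age_group2diff2age_group",
}


def check_request(request_config):
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    mode = request_config.get("mode")
    if mode not in VALID_MODES:
        problems.append(f"unknown mode: {mode!r}")
    for instruction in request_config.get("inputs", {}).get("instruction", []):
        if instruction not in VALID_INSTRUCTIONS:
            problems.append(f"unknown instruction: {instruction!r}")
    return problems


good = {"mode": "meta2diff", "inputs": {"instruction": ["disease2diff2disease"]}}
bad = {"mode": "meta2dif", "inputs": {"instruction": ["disease2diff"]}}
print(check_request(good))
print(check_request(bad))
```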
### Other meta-data (`inputs.` in config)
The full list of available values for each meta-data item can be found in ```p3_entities_with_type.csv```.
## Examples
The following examples specify all possible configuration fields. You can leave some meta-data fields in the ```inputs``` section as an empty string (```""```) or an empty list (```[]```).
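Since unused fields are left empty, one convenient pattern is to start from an all-empty template and fill in only what is needed. The sketch below assumes the field names used in the examples of this README; `make_inputs` is a hypothetical helper, and the choice of `""` versus `[]` as the empty default for each field is an assumption.

```python
import copy

# Field names taken from the example configurations in this README.
EMPTY_INPUTS = {
    "instruction": [], "tissue": [], "age": "", "cell": "", "efo": "",
    "datatype": "", "drug": "", "dose": "", "time": "", "case": "",
    "control": "", "dataset_type": "", "gender": "", "species": "",
    "up": [], "down": [],
}


def make_inputs(**fields):
    """Build an `inputs` dict, leaving every unspecified field empty."""
    unknown = sorted(set(fields) - set(EMPTY_INPUTS))
    if unknown:
        raise KeyError(f"unknown meta-data fields: {unknown}")
    inputs = copy.deepcopy(EMPTY_INPUTS)  # deep copy so the template lists are never shared
    inputs.update(fields)
    return inputs


inputs = make_inputs(
    instruction=["disease2diff2disease"],
    tissue=["lung"],
    efo="EFO_0000768",
    gender="m",
    species="human",
)
print(inputs)
```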
_**Example 1**_
To generate a signature given specific meta-data, you can use the following configuration. Note that the ```up``` and ```down``` fields are empty lists because the model is asked to generate them.
Here we ask the model to generate a signature for a male human in the 70-90 age group, in lung tissue, with disease EFO_0000768.
```json
{
  "inputs": {
    "instruction": ["age_group2diff2age_group", "disease2diff2disease", "compound2diff2compound"],
    "tissue": ["lung"],
    "age": "",
    "cell": "",
    "efo": "EFO_0000768",
    "datatype": "",
    "drug": "",
    "dose": "",
    "time": "",
    "case": ["70.0-80.0", "80.0-90.0"],
    "control": "",
    "dataset_type": "expression",
    "gender": "m",
    "species": "human",
    "up": [],
    "down": []
  },
  "mode": "meta2diff",
  "parameters": {
    "temperature": 0.8,
    "top_p": 0.2,
    "top_k": 3550,
    "n_next_tokens": 50,
    "random_seed": 137
  }
}
```
Here is the output:
```json
{
"output": {
"up": [["PTGDR2", "CABYR", "MGAM", "TMED9", "SHOX2", "MAT1A", "MUC5AC", "GASK1B", "CYP1A2", "RP11-266K4.9", ...]], // generated list of up-regulated genes
"down": [["MB", "OR10V1", "OR51H1", "GOLGA6L10", "OR6M1", "CDX4", "OR4C45", "SPRR2A", "SPDYE9", "GBX2", "ATP4B", ...]] // generated list of down-regulated genes
},
"mode": "meta2diff", // generation mode we specified
"message": "Done!",
"input": "[BOS]<age_group2diff2age_group><disease2diff2disease><compound2diff2compound><tissue>lung </tissue><cell></cell><efo>EFO_0000768 </efo><datatype></datatype><drug></drug><dose></dose><time></time><case>70.0-80.0 80.0-90.0 </case><control></control><dataset_type>expression </dataset_type><gender>m </gender><species>human </species>", // actual input prompt for the model
"random_seed": 137
}
```
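The ```input``` field shows how the request was serialized into a prompt. The sketch below reproduces that string from the Example 1 ```inputs```. It is a reconstruction inferred from this one example, not the handler's actual code: `build_prompt` and `FIELD_ORDER` are hypothetical names, and the sketch assumes that instructions become unpaired tags, that each value is followed by a single space, and that ```age```, ```up``` and ```down``` do not appear in the prompt when they are empty.

```python
# Tag order as it appears in the "input" string of Example 1 (an inferred assumption).
FIELD_ORDER = [
    "tissue", "cell", "efo", "datatype", "drug", "dose", "time",
    "case", "control", "dataset_type", "gender", "species",
]


def build_prompt(inputs):
    """Serialize an `inputs` dict into a prompt string (hypothetical reconstruction)."""
    parts = ["[BOS]"]
    # Each instruction becomes an unpaired tag right after [BOS].
    parts += [f"<{name}>" for name in inputs.get("instruction", [])]
    for field in FIELD_ORDER:
        value = inputs.get(field, "")
        # A field may hold a single string or a list of strings.
        values = ([value] if value else []) if isinstance(value, str) else list(value)
        body = "".join(f"{v} " for v in values)  # every value is followed by one space
        parts.append(f"<{field}>{body}</{field}>")
    return "".join(parts)


example_1_inputs = {
    "instruction": ["age_group2diff2age_group", "disease2diff2disease", "compound2diff2compound"],
    "tissue": ["lung"], "age": "", "cell": "", "efo": "EFO_0000768",
    "datatype": "", "drug": "", "dose": "", "time": "",
    "case": ["70.0-80.0", "80.0-90.0"], "control": "",
    "dataset_type": "expression", "gender": "m", "species": "human",
    "up": [], "down": [],
}

prompt = build_prompt(example_1_inputs)
print(prompt)
```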
_**Example 2**_
Now let's generate a signature for a healthy male human in the 40-50 age group, in whole blood tissue.
Note that we use the ```disease2diff2disease``` instruction but expect a signature for a healthy human, which is why ```efo``` is set to an empty string (```""```).
This example also adds a second instruction, ```age_group2diff2age_group```, so ```instruction``` becomes ```["disease2diff2disease", "age_group2diff2age_group"]```.
```json
{
  "inputs": {
    "instruction": ["disease2diff2disease", "age_group2diff2age_group"],
    "tissue": ["whole blood"],
    "age": "",
    "cell": "",
    "efo": "",
    "datatype": "",
    "drug": "",
    "dose": "",
    "time": "",
    "case": "40.0-50.0",
    "control": "",
    "dataset_type": "expression",
    "gender": "m",
    "species": "human",
    "up": [],
    "down": []
  },
  "mode": "meta2diff",
  "parameters": {
    "temperature": 0.8,
    "top_p": 0.2,
    "top_k": 3550,
    "n_next_tokens": 50,
    "random_seed": 137
  }
}
```
Here is the output:
```json
{
"output": {
"up": [["IER3", "APOC2", "EDNRB", "JAKMIP2", "BACE2", ... ]],
"down": [["TBL1Y", "TDP1", "PLPP4", "CPEB1", "ITPR3", ... ]]
},
"mode": "meta2diff",
"message": "Done!",
"input": "[BOS]<disease2diff2disease><age_group2diff2age_group><tissue>whole blood </tissue><cell></cell><efo></efo><datatype></datatype><drug></drug><dose></dose><time></time><case>40.0-50.0 </case><control></control><dataset_type>expression </dataset_type><gender>m </gender><species>human </species>",
"random_seed": 137
}
```