---
license: cc-by-nc-4.0
---

## Precious3-GPT

A multi-omics multi-species language model.

- **Developer**: [Insilico Medicine](https://insilico.com/precious)
- **License**: cc-by-nc-4.0
- **Model size**: 88.3 million parameters
- **Domain**: Biomedical
- **Base architecture**: [MPT](https://huggingface.co/mosaicml/mpt-7b)

## Quickstart

Precious3-GPT can be loaded and run as a standard causal language model through the `transformers` interface:

```python
# Load model and tokenizer
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("insilicomedicine/precious3-gpt", trust_remote_code=True)
model = AutoModel.from_pretrained("insilicomedicine/precious3-gpt", trust_remote_code=True)
```

However, to make all of the Precious3-GPT functionality convenient to use, we provide a handler.

### Run the model step by step using the Precious3-GPT handler

**Step 1 - download the Precious3-GPT [handler.py](https://huggingface.co/insilicomedicine/precious3-gpt/blob/main/handler.py)**

```python
from handler import EndpointHandler

precious3gpt_handler = EndpointHandler()
```

**Step 2 - create input for the handler**

```python
import json

# Load the generation config shipped with the repository
with open('./generation-configs/meta2diff.json', 'r') as f:
    config_data = json.load(f)

# prepare request configuration
request_config = {"inputs": config_data, "mode": "meta2diff", "parameters": {
    "temperature": 0.8,
    "top_p": 0.2,
    "top_k": 3550,
    "n_next_tokens": 50,
    "random_seed": 137
}}
```

**How Precious3-GPT will see the given request**

```text
[BOS]<age_group2diff2age_group><disease2diff2disease><compound2diff2compound><tissue>lung </tissue><age_individ></age_individ><cell></cell><efo>EFO_0000768 </efo><datatype>expression </datatype><drug>curcumin </drug><dose></dose><time></time><case>70.0-80.0 80.0-90.0 </case><control></control><dataset_type></dataset_type><gender>m </gender><species>human </species>
```
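
To make the prompt layout concrete, here is a minimal sketch of how an `inputs` dictionary could be rendered into such a tagged string. It is a reconstruction for illustration only, not the code in `handler.py`: the names `FIELD_ORDER`, `as_list` and `build_prompt` are hypothetical, and the field order is taken from the `input` strings shown in the Examples section. The prompt above additionally carries an `<age_individ>` tag, which this sketch leaves out.

```python
# Hypothetical reconstruction of the prompt layout; not the handler's actual code.
# Field order follows the "input" strings shown in the Examples section.
FIELD_ORDER = [
    "tissue", "cell", "efo", "datatype", "drug", "dose", "time",
    "case", "control", "dataset_type", "gender", "species",
]


def as_list(value):
    """Normalize a field value ("" / "x" / ["x", "y"] / None) to a list of strings."""
    if isinstance(value, str):
        return [value] if value else []
    return list(value or [])


def build_prompt(inputs):
    """Render an `inputs` dict as a tagged prompt string."""
    parts = ["[BOS]"]
    # Each instruction becomes its own tag right after [BOS].
    parts += [f"<{name}>" for name in as_list(inputs.get("instruction"))]
    for field in FIELD_ORDER:
        # Every value is followed by a single space; empty fields keep empty tags.
        values = "".join(f"{v} " for v in as_list(inputs.get(field)))
        parts.append(f"<{field}>{values}</{field}>")
    return "".join(parts)


example_inputs = {
    "instruction": ["age_group2diff2age_group", "disease2diff2disease", "compound2diff2compound"],
    "tissue": ["lung"],
    "efo": "EFO_0000768",
    "case": ["70.0-80.0", "80.0-90.0"],
    "dataset_type": "expression",
    "gender": "m",
    "species": "human",
}
print(build_prompt(example_inputs))
```

Fields that are missing or empty still produce an empty tag pair, which matches the way empty meta-data appears in the prompts in this document.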

**Step 3 - run Precious3-GPT**

```python
output = precious3gpt_handler(request_config)
```

**Handler output structure**

```json
{
  "output": {
    "up": List,
    "down": List
  },
  "mode": String, // Generation mode that was selected
  "message": "Done!", // or an error message
  "input": String // Input prompt that was passed to the model
}
```

Note: If the selected ```mode``` generates compounds, the output contains ```compounds: List```.
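
As a rough illustration of consuming this structure, the sketch below checks the `message` field and pulls out the gene lists. The helper names `first_list` and `summarize_output` are hypothetical, the result dictionary is a hand-written mock rather than real model output, and the sketch assumes that `compounds`, when present, sits inside `output` next to `up` and `down`.

```python
def first_list(value):
    """Return the first list, accepting either ["A", "B"] or [["A", "B"]]."""
    if value and isinstance(value[0], list):
        return value[0]
    return list(value or [])


def summarize_output(result):
    """Flatten a handler-style result dict, raising if generation did not finish."""
    if result.get("message") != "Done!":
        raise RuntimeError(f"generation failed: {result.get('message')}")
    output = result.get("output") or {}
    return {
        "mode": result.get("mode"),
        "up": first_list(output.get("up")),
        "down": first_list(output.get("down")),
        # Assumption: compounds are returned inside "output" for compound modes.
        "compounds": first_list(output.get("compounds")),
    }


# Hand-written mock shaped like the structure above; not real model output.
mock_result = {
    "output": {"up": [["PTGDR2", "CABYR"]], "down": [["MB", "OR10V1"]]},
    "mode": "meta2diff",
    "message": "Done!",
    "input": "[BOS]<disease2diff2disease>",
}
summary = summarize_output(mock_result)
print(summary["up"], summary["down"])
```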

---

## Precious3-GPT request configuration

### Generation Modes (`mode` in config)

Choose the appropriate mode based on your requirements:

1. **meta2diff**: Generate a signature (lists of up- and down-regulated genes) given meta-data such as tissue, compound, gender, etc.
2. **diff2compound**: Predict compounds based on a signature.
3. **meta2diff2compound**: Generate signatures given meta-data, then predict compounds based on the generated signatures.
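
A small helper can catch a mistyped mode before the request reaches the handler. The sketch below is illustrative only: `VALID_MODES`, `DEFAULT_PARAMETERS` and `make_request` are hypothetical names, and the default sampling parameters are simply the values used in the Quickstart request.

```python
VALID_MODES = ("meta2diff", "diff2compound", "meta2diff2compound")

# Sampling defaults copied from the Quickstart request; adjust as needed.
DEFAULT_PARAMETERS = {
    "temperature": 0.8,
    "top_p": 0.2,
    "top_k": 3550,
    "n_next_tokens": 50,
    "random_seed": 137,
}


def make_request(inputs, mode, **overrides):
    """Assemble a request dict, rejecting unknown modes and parameter names."""
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {VALID_MODES}")
    unknown = set(overrides) - set(DEFAULT_PARAMETERS)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    return {
        "inputs": inputs,
        "mode": mode,
        "parameters": {**DEFAULT_PARAMETERS, **overrides},
    }


request = make_request({"tissue": ["lung"], "species": "human"}, "meta2diff", temperature=0.5)
print(request["mode"], request["parameters"]["temperature"])
```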

---

### Instruction (`inputs.instruction` in config)

1. `disease2diff2disease` - generate a signature for a disease, or predict a disease from a given signature
2. `compound2diff2compound` - generate a signature for a compound, or predict a compound from a given signature
3. `age_group2diff2age_group` - generate a signature for an age group, or predict an age group from a given signature

### Other meta-data (`inputs.` in config)

The full list of available values for each meta-data item can be found in ```p3_entities_with_type.csv```.
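
One way to browse such a file is to group its entries by type. The sketch below is a guess at the workflow, not a description of the real file: the column names `entity` and `type` are assumptions (check the actual header and pass the right names), the type labels in the sample are hypothetical, and the sample data is a three-row in-memory stand-in.

```python
import csv
import io
from collections import defaultdict


def load_entities(lines, name_column="entity", type_column="type"):
    """Group entity names by type. Column names are assumptions; adjust to the real header."""
    by_type = defaultdict(set)
    for row in csv.DictReader(lines):
        by_type[row[type_column]].add(row[name_column])
    return by_type


# Tiny in-memory stand-in for the real file. With the real file, pass
# open("p3_entities_with_type.csv", newline="") instead.
sample = io.StringIO("entity,type\nlung,tissue\nwhole blood,tissue\nhuman,species\n")
entities = load_entities(sample)
print(sorted(entities["tissue"]))
```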

## Examples

The following examples specify all possible configuration fields. You can leave meta-data fields in the ```inputs``` section empty by setting them to an empty string (```""```) or an empty list (```[]```).
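
Because every field is spelled out even when empty, it can be handy to start from a partial dictionary and fill in the rest. The sketch below does that with hypothetical names (`STRING_FIELDS`, `LIST_FIELDS`, `complete_inputs`); the field names are taken from the example configurations in this section, and the choice of `""` versus `[]` as the empty value for each field is an assumption based on those examples.

```python
# Field names taken from the example configurations in this section.
# Which fields default to "" and which to [] is an assumption based on Example 1.
STRING_FIELDS = ["age", "cell", "efo", "datatype", "drug", "dose", "time",
                 "control", "dataset_type", "gender", "species"]
LIST_FIELDS = ["instruction", "tissue", "case", "up", "down"]


def complete_inputs(partial):
    """Return a full `inputs` dict, filling unspecified fields with empty values."""
    unknown = set(partial) - set(STRING_FIELDS) - set(LIST_FIELDS)
    if unknown:
        raise KeyError(f"unexpected fields: {sorted(unknown)}")
    full = {name: "" for name in STRING_FIELDS}
    full.update({name: [] for name in LIST_FIELDS})
    full.update(partial)
    return full


inputs = complete_inputs({
    "instruction": ["disease2diff2disease"],
    "tissue": ["lung"],
    "efo": "EFO_0000768",
    "species": "human",
})
print(len(inputs), inputs["up"], repr(inputs["drug"]))
```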

_**Example 1**_

To generate a signature from specific meta-data, use the following configuration. Note that the ```up``` and ```down``` fields are empty lists because the model generates them.

Here we ask the model to generate a signature for a male human in the 70-90 age group, in lung tissue, with disease EFO_0000768.

```json
{
  "inputs": {
    "instruction": ["age_group2diff2age_group", "disease2diff2disease", "compound2diff2compound"],
    "tissue": ["lung"],
    "age": "",
    "cell": "",
    "efo": "EFO_0000768",
    "datatype": "", "drug": "", "dose": "", "time": "", "case": ["70.0-80.0", "80.0-90.0"], "control": "", "dataset_type": "expression", "gender": "m", "species": "human", "up": [], "down": []
  },
  "mode": "meta2diff",
  "parameters": {
    "temperature": 0.8, "top_p": 0.2, "top_k": 3550, "n_next_tokens": 50, "random_seed": 137
  }
}
```

Here is the output:

```json
{
  "output": {
    "up": [["PTGDR2", "CABYR", "MGAM", "TMED9", "SHOX2", "MAT1A", "MUC5AC", "GASK1B", "CYP1A2", "RP11-266K4.9", ...]], // generated list of up-regulated genes
    "down": [["MB", "OR10V1", "OR51H1", "GOLGA6L10", "OR6M1", "CDX4", "OR4C45", "SPRR2A", "SPDYE9", "GBX2", "ATP4B", ...]] // generated list of down-regulated genes
  },
  "mode": "meta2diff", // generation mode we specified
  "message": "Done!",
  "input": "[BOS]<age_group2diff2age_group><disease2diff2disease><compound2diff2compound><tissue>lung </tissue><cell></cell><efo>EFO_0000768 </efo><datatype></datatype><drug></drug><dose></dose><time></time><case>70.0-80.0 80.0-90.0 </case><control></control><dataset_type>expression </dataset_type><gender>m </gender><species>human </species>", // actual input prompt for the model
  "random_seed": 137
}
```

_**Example 2**_

Now let's generate a signature for a healthy male human in the 40-50 age group, in whole blood tissue.

Note that we use the ```disease2diff2disease``` instruction but want a signature for a healthy human, so we set ```efo``` to an empty string (```""```).

Because the request also specifies an age group, this example adds a second instruction: ```"instruction": ["disease2diff2disease", "age_group2diff2age_group"]```.

```json
{
  "inputs": {
    "instruction": ["disease2diff2disease", "age_group2diff2age_group"],
    "tissue": ["whole blood"],
    "age": "",
    "cell": "",
    "efo": "",
    "datatype": "", "drug": "", "dose": "", "time": "", "case": "40.0-50.0", "control": "", "dataset_type": "expression", "gender": "m", "species": "human", "up": [],
    "down": []
  },
  "mode": "meta2diff",
  "parameters": {
    "temperature": 0.8,
    "top_p": 0.2,
    "top_k": 3550,
    "n_next_tokens": 50,
    "random_seed": 137
  }
}
```

Here is the output:

```json
{
  "output": {
    "up": [["IER3", "APOC2", "EDNRB", "JAKMIP2", "BACE2", ... ]],
    "down": [["TBL1Y", "TDP1", "PLPP4", "CPEB1", "ITPR3", ... ]]
  },
  "mode": "meta2diff",
  "message": "Done!",
  "input": "[BOS]<disease2diff2disease><age_group2diff2age_group><tissue>whole blood </tissue><cell></cell><efo></efo><datatype></datatype><drug></drug><dose></dose><time></time><case>40.0-50.0 </case><control></control><dataset_type>expression </dataset_type><gender>m </gender><species>human </species>",
  "random_seed": 137
}
```