stefan-insilico committed on
Commit
ab4fd16
·
verified ·
1 Parent(s): 4ca296d

Update README.md

Files changed (1)
  1. README.md +58 -3
README.md CHANGED
@@ -1,3 +1,58 @@
- ---
- license: cc-by-nc-4.0
- ---
+ ---
+ license: cc-by-nc-4.0
+ ---
+
+ Step 1. Load the model and tokenizer.
+
+ ```python
+ # Load model and tokenizer
+ from transformers import AutoTokenizer, AutoModel
+
+ # trust_remote_code=True runs the custom model code shipped in the repository
+ tokenizer = AutoTokenizer.from_pretrained("insilicomedicine/precious3-gpt", trust_remote_code=True)
+ model = AutoModel.from_pretrained("insilicomedicine/precious3-gpt", trust_remote_code=True)
+ ```
+
+
+ Step 2. Select the device.
+
+ ```python
+ # Select device: the first CUDA GPU if one is available, otherwise the CPU
+ import torch
+
+ if torch.cuda.is_available():
+     device = "cuda:0"
+ else:
+     device = "cpu"
+ print(device)
+ ```
+
+
+ Step 3. Load the compound vocabulary.
+
+ ```python
+ # Load unique compounds from precious3-gpt
+ import pandas as pd
+
+ # The CSV is expected to contain `entity` and `type` columns
+ all_entities_with_type = pd.read_csv('p3_entities_with_type.csv')
+ is_compound = all_entities_with_type['type'] == 'compound'
+ p3_compounds = [i.strip() for i in all_entities_with_type[is_compound]['entity'].values]
+ ```
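
Step 3 only implies the file layout through the column names it uses. As a minimal sketch of the same filter on an in-memory sample, the block below swaps pandas for the standard-library `csv` module. The sample rows and the `load_compounds` helper are made up for illustration and are not part of the repository.

```python
import csv
import io

# Hypothetical stand-in for p3_entities_with_type.csv; these rows are invented.
SAMPLE_CSV = """entity,type
 aspirin ,compound
TP53,gene
metformin,compound
liver,tissue
"""


def load_compounds(handle):
    """Return whitespace-stripped entity names whose type is 'compound'."""
    reader = csv.DictReader(handle)
    return [row["entity"].strip() for row in reader if row["type"] == "compound"]


compounds = load_compounds(io.StringIO(SAMPLE_CSV))
print(compounds)
```

Passing an open file handle instead of `io.StringIO` would apply the same filter to a real file with the same two columns.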
+
+
+ Step 4. Define the input prompt.
+
+ ```python
+ # Example input prompt
+ diff2compound = """[BOS]<compound2diff2compound><tissue>liver </tissue><age></age><cell></cell><efo></efo><datatype></datatype><drug></drug><dose></dose><time></time><case></case><control></control><dataset_type>expression </dataset_type><gender>m </gender><species>human </species><up>MMP3 SLA PMS1 S100A8 AKR1C2 ADD3 INVS INSIG2 KCTD12 TAF1A MBNL2 HERC2 COG6 IFNK SLC35A1 TBL2 SGMS1 CLHC1 EDEM3 GMCL1 ST6GALNAC4 MTMR1 RPUSD3 ATG4C HOXC6 GOLPH3L RAD50 GLCE WRAP73 NBR1 GJA9 RIMS3 DNAAF2 MRPL34 TRMT61B CETN3 HMG20A GPRIN3 DHRS3 METTL21A HEATR3 MMD FOCAD RHOT1 EMG1 CDC26 FRMD4B INTS8 KLHL8 ANKRD39 NKIRAS1 LIAS FARSA PREPL ZBTB48 VAV3 OXNAD1 METTL23 GPR84 QSER1 SLC16A6 NDFIP2 TUBGCP4 HEATR5A XPO1 ORC5 SLC38A9 COG5 SLC4A7 CRLS1 MCEE LMBRD2 ZMYND19 LARS2 NR2F6 CHCHD4 ACTR6 PTPN14 CDK19 SLC25A12 GMPR2 NUDCD2 ASB3 GDE1 MRPS26 DHRS7B FUT8 PAFAH2 ECE2 POLR3K NUP88 FAM98A BAG4 SATB1 GTF2H2 FASTKD1 PIK3R4 SPICE1 MTFR1 EML4 </up><down>HCAR3 CCNA1 GCH1 MARCKS TYMS C11orf96 APOBEC3B HS3ST2 XIRP1 DGKI ATP2B1 GSG1 SERPINE2 LIMS3 TUBB2A HMGCS1 C12orf75 FCGR1A FCGR1B HEG1 ITGA4 CDC42EP3 RAB27B FKBP5 FAM72A ARNT2 ASS1 PHACTR1 KLF4 ZC3H12D IL22RA2 CCNE2 FEM1C UHRF1 THAP2 GSTO2 CCNA2 PMAIP1 CYP51A1 FOSB BCAT1 CD109 NREP SLC7A5 C4orf46 B3GNT5 CPEB2 NCR3LG1 SCD MSX1 DTL RBM3 PIK3R3 TESC SLFN5 CREB5 TMEM64 USP53 TLE3 SFPQ PHLDA2 CIRBP CACNA2D4 SLC30A1 IL32 SHCBP1 OGFRL1 MAFF ATP1B1 FAS IRF1 LY86 IL6R DHCR7 TMEM217 SPIN4 SRGAP2 GLS C3orf80 ZNF367 OTUD1 SPSB1 RNASE3 GMEB1 KBTBD8 VWF RBM12 CTSO FOXF1 ARHGAP21 HES4 ACAT2 CDCA7 PSPC1 S100P NSMAF CTNNAL1 DPYSL2 MT1G MT1E </down>"""
+ ```
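
The example prompt follows a regular pattern: `[BOS]`, an instruction tag, a fixed sequence of metadata tags, then `<up>` and `<down>` gene lists, with a trailing space after every non-empty value. A rough sketch of assembling such a string from structured inputs follows. The `build_prompt` helper is hypothetical, the field order is copied from the example above, and it is an assumption that the model expects exactly this whitespace.

```python
# Metadata tags in the order they appear in the example prompt.
META_FIELDS = [
    "tissue", "age", "cell", "efo", "datatype", "drug", "dose", "time",
    "case", "control", "dataset_type", "gender", "species",
]


def wrap(tag, value):
    """Wrap a value in <tag>...</tag>, adding a trailing space if non-empty."""
    body = f"{value} " if value else ""
    return f"<{tag}>{body}</{tag}>"


def build_prompt(instruction, meta, up_genes, down_genes):
    """Assemble a prompt string (hypothetical helper, not from the repository)."""
    parts = ["[BOS]", f"<{instruction}>"]
    parts += [wrap(field, meta.get(field, "")) for field in META_FIELDS]
    parts.append(wrap("up", " ".join(up_genes)))
    parts.append(wrap("down", " ".join(down_genes)))
    return "".join(parts)


prompt = build_prompt(
    "compound2diff2compound",
    {"tissue": "liver", "dataset_type": "expression", "gender": "m", "species": "human"},
    up_genes=["MMP3", "SLA"],
    down_genes=["HCAR3", "CCNA1"],
)
print(prompt)
```

Building the string from a dictionary keeps the tag order fixed and makes it harder to drop a closing tag when fields are edited by hand.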
+
+
+ Step 5. Generate compounds.
+
+ ```python
+ # generate_compounds is not defined in the steps above; define or import it first
+ generated_compounds = generate_compounds(prompt_config=diff2compound, tokenizer=tokenizer,
+                                          model=model, p3_compounds=p3_compounds, device=device, top_k=50)
+ ```
+
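
The README does not show `generate_compounds`, so its behavior is unknown. Because `p3_compounds` is passed to it, one plausible post-processing step is to keep only generated names that appear in that vocabulary. The sketch below illustrates that idea only. The `keep_known_compounds` helper and its `max_results` parameter are hypothetical and do not describe the actual implementation.

```python
def keep_known_compounds(candidates, vocabulary, max_results=10):
    """Keep candidates found in the vocabulary, in order, without duplicates.

    Hypothetical helper: this is a guess at one possible filtering step.
    """
    known = set(vocabulary)
    seen = set()
    kept = []
    for name in candidates:
        if len(kept) >= max_results:
            break
        name = name.strip()
        if name in known and name not in seen:
            seen.add(name)
            kept.append(name)
    return kept


# Invented inputs for illustration only.
decoded = ["aspirin", "TP53", " metformin ", "aspirin", "ibuprofen"]
vocab = ["aspirin", "metformin", "ibuprofen"]
filtered = keep_known_compounds(decoded, vocab, max_results=2)
print(filtered)
```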