yangheng committed on
Commit
e598751
1 Parent(s): 244390e

Update README.md

Files changed (1)
  1. README.md +62 -3
README.md CHANGED
@@ -1,3 +1,62 @@
1
- ---
2
- license: mit
3
- ---
1
+ ---
2
+ license: mit
3
+ ---
4
+
5
+ # PlantRNA-FM: An Interpretable RNA Foundation Model for Exploration Functional RNA Motifs in Plants
6
+
7
+ ## Introduction
8
+ RNA molecules, with their complex sequences and structures, play critical roles in plant growth, development, and adaptation to environmental change. Recent advances in artificial intelligence, specifically foundation models (FMs), offer new ways to study this complexity. We introduce PlantRNA-FM, an RNA foundation model tailored for plants. The model integrates RNA sequence and structural information from the transcriptomes of 1,124 plant species and is designed to support RNA function prediction and the study of translation dynamics. By combining pre-training on diverse RNA data with an interpretation framework, PlantRNA-FM helps identify functionally significant RNA motifs within the plant transcriptome.
9
+
10
+ ## Model Overview
11
+
12
+ ![model.png](model.png)
13
+ Figure 1. Schematic overview of the Pre-training Phase of PlantRNA-FM. The pre-training dataset comprises transcriptomic sequences from 1,124 plant species, consisting of approximately 25.0M RNA sequences and 54.2B RNA bases. The green dots on the global mean temperature map represent the geographical distribution of these plant species across the world.
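The two dataset totals in the caption imply a rough average sequence length. The short calculation below is only a back-of-the-envelope check on those approximate figures; it says nothing about the actual length distribution, which is not given here.

```python
# Approximate totals quoted in the Figure 1 caption.
n_sequences = 25.0e6   # about 25.0M RNA sequences
n_bases = 54.2e9       # about 54.2B RNA bases

# Implied mean length in bases per sequence (an estimate, not a reported statistic).
mean_length = n_bases / n_sequences
print(f"implied mean sequence length: about {mean_length:.0f} bases")
```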
14
+
15
+ ## Requirements
16
+ - Python 3.9+
17
+ - PyTorch 2.0+
18
+ - Transformers 4.38+
19
+ - pytorch-cuda 11.0+ (conda)
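As a convenience, the minimum versions listed above can be checked from Python before loading the model. This is a minimal sketch using only the standard library; the helper names (`parse_version`, `meets_minimum`, `check_requirements`) are illustrative and not part of PlantRNA-FM, and the simple parser ignores pre-release ordering.

```python
from importlib import metadata

# Minimum versions copied from the Requirements list; "torch" is PyTorch's package name.
MINIMUMS = {"torch": "2.0", "transformers": "4.38"}


def parse_version(text):
    """Turn '2.1.0+cu118' into (2, 1, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_requirements(minimums=MINIMUMS):
    """Return {package: (installed_version_or_None, ok)} for each requirement."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = (None, False)
        else:
            report[name] = (installed, meets_minimum(installed, minimum))
    return report


if __name__ == "__main__":
    for name, (installed, ok) in check_requirements().items():
        status = "OK" if ok else "missing or too old"
        print(f"{name}: {installed or 'not installed'} -> {status}")
```

The Python version itself can be checked separately with `sys.version_info >= (3, 9)`.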
20
+
21
+ ## Usage
22
+ Please install the requirements and follow the instructions below to run the PlantRNA-FM model.
23
+ ### Model loading
24
+ ```python
25
+ from transformers import AutoModel, AutoTokenizer
26
+
27
+ model_name_or_path = "yangheng/PlantRNA-FM"
28
+
29
+ model = AutoModel.from_pretrained(model_name_or_path)
30
+ tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
31
+ ```
32
+
33
+ ### Inference
34
+ ```python
35
+ rna_sequence = 'GCCGACUUAGCUCAGU<mask>GGGAGAGCGUUAGACUGAAGAUCUAAAGGUCCCUGGUUCGAUCCCGGGAGUCGGCACCA'
36
+
37
+ inputs = tokenizer(rna_sequence, return_tensors="pt")  # tokenize into PyTorch tensors
38
+ outputs = model(**inputs)
39
+ print(outputs.last_hidden_state)
40
+
41
+ ```
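The inference example uses an upper-case RNA sequence (A, C, G, U) containing a `<mask>` token. Transcript data is often stored in the DNA alphabet (T instead of U) or in mixed case, so a small preprocessing step may be useful before tokenization. The sketch below is an assumption about sensible input cleaning, not a documented requirement of the tokenizer. `to_rna` and `mask_position` are hypothetical helper names, rejecting ambiguity codes such as `N` is a choice made here, and replacing a nucleotide with `<mask>` (rather than inserting the token) is likewise an assumption.

```python
RNA_ALPHABET = set("ACGU")
MASK_TOKEN = "<mask>"  # written as in the inference example


def to_rna(sequence):
    """Upper-case a sequence, drop whitespace, and convert DNA-style T to U."""
    cleaned = "".join(sequence.split()).upper().replace("T", "U")
    unexpected = set(cleaned) - RNA_ALPHABET
    if unexpected:
        # Assumption of this sketch: anything outside A/C/G/U is rejected.
        raise ValueError(f"unexpected characters: {sorted(unexpected)}")
    return cleaned


def mask_position(sequence, index):
    """Replace the nucleotide at `index` (0-based) with the mask token."""
    if not 0 <= index < len(sequence):
        raise IndexError("index outside the sequence")
    return sequence[:index] + MASK_TOKEN + sequence[index + 1:]


if __name__ == "__main__":
    rna = to_rna("gccgact tagctcagt")
    print(mask_position(rna, 4))
```

The resulting string can then be passed to `tokenizer(...)` in the same way as `rna_sequence` in the example above.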
42
+
43
+
44
+ ## Pretraining Data
45
+ The model was pre-trained on a large-scale dataset of RNA sequences from 1,124 plant species, known as 1KP or OneKP.
46
+ The database is open access at NCBI. More details are available at [OneKP](https://db.cngb.org/onekp/).
47
+
48
+ ![1kp.png](1kp.png)
49
+ Taxonomic distribution of the 1KP dataset. The bar chart shows the number of species in each taxonomic group.
50
+
51
+ ## Copyright
52
+ PlantRNA-FM is licensed under the MIT License. Many thanks to all the authors of the paper for their contributions and support.
53
+ The model is co-developed by ColaLAB@UniversityofExeter and JIC@NorwichResearchPark (alphabetically ordered).
54
+
55
+ ## Citation
56
+ bioRxiv link: [PlantRNA-FM: An Interpretable RNA Foundation Model for Exploration Functional RNA Motifs in Plants](TBC)
57
+
58
+ ## Funding
59
+ This work was supported by the National Key Research and Development Program of China [2023YFA0913500] (HZ); the National Key Research and Development Program of China [2021YFF1000900] (HZ); the National Natural Science Foundation of China [32170229] (HZ); the Fundamental Research Funds for the Central Universities [2412023YQ005] (HZ); the China Scholarship Council [No.202206620047] (WS); the United Kingdom Biotechnology and Biological Sciences Research Council (BBSRC) [BB/X01102X/1] (HY, YD); the European Research Council (ERC) [selected by the ERC, funded by BBSRC Horizon Europe Guarantee [EP/Y009886/1]] (YD); a Human Frontier Science Program Fellowship [LT001077/2021-L] (HY); a UKRI Future Leaders Fellowship [MR/S017062/1, MR/X011135/1] (KL); a Kan Tong Po International Fellowship [KTP\R1\231017] (KL); an Amazon Research Award (KL); and the National Natural Science Foundation of China [62376056, 62076056] (KL).
60
+
61
+
62
+