hongyiwang committed · Commit 536bde8 · verified · Parent(s): 3fd6251

Update README.md

Files changed (1)
  1. README.md +6 -2
README.md CHANGED
@@ -2,6 +2,7 @@
  DNA FM 7B is a DNA foundation model trained on 10.6 billion nucleotides from 796 species, enabling genome mining, in silico mutagenesis studies, gene expression prediction, and directed sequence generation.
 
  By scaling model depth while maintaining a short context length of 4000 nucleotides, DNA FM shows substantial improvements across a breadth of tasks in functional genomics using transfer learning, sequence generation, and unsupervised annotation of functional elements. Notably, DNA FM outperforms prior encoder-only architectures without new data, suggesting that new scaling laws are needed to achieve compute-optimal DNA language models.
+ <center><img src="DNA_RNA FM model architecture.png" alt="An Overview of DNA FM 7B" style="width:60%; height:auto;" /></center>
 
  ## Model Architectural Details
  DNA FM 7B is based on the bidirectional transformer encoder (BERT) architecture with single-nucleotide tokenization, and is optimized using a masked language modeling (MLM) training objective.
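As an illustrative aside, not part of the diff: a minimal sketch of what single-nucleotide tokenization with BERT-style MLM masking looks like. The vocabulary, special tokens, and 15% mask rate below are common defaults assumed for illustration, not DNA FM 7B's actual configuration.

```python
# Sketch of single-nucleotide tokenization with BERT-style MLM masking.
# The vocabulary, special tokens, and mask rate are illustrative assumptions.
import random

VOCAB = {"[PAD]": 0, "[MASK]": 1, "A": 2, "C": 3, "G": 4, "T": 5}
MAX_LEN = 4000  # DNA FM 7B's context length, in nucleotides


def tokenize(seq: str) -> list[int]:
    """Map each nucleotide to its own token: single-nucleotide tokenization."""
    return [VOCAB[base] for base in seq.upper()[:MAX_LEN]]


def mask_for_mlm(ids: list[int], mask_rate: float = 0.15, seed: int = 0):
    """Mask a random subset of positions; the MLM objective trains the model
    to recover the original nucleotide at each masked position."""
    rng = random.Random(seed)
    inputs, labels = list(ids), [-100] * len(ids)  # -100: ignored by the loss
    for i in range(len(ids)):
        if rng.random() < mask_rate:
            labels[i] = ids[i]
            inputs[i] = VOCAB["[MASK]"]
    return inputs, labels


inputs, labels = mask_for_mlm(tokenize("ACGTACGTTGCA"))
print(inputs)  # some positions replaced by [MASK] = 1
print(labels)  # original ids at masked positions, -100 elsewhere
```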
@@ -38,7 +39,7 @@ To minimize bias and learn high-resolution single-nucleotide dependencies, we op
  We evaluate the benefits of pretraining DNA FM 7B by conducting a comprehensive series of experiments related to functional genomics, genome mining, metabolic engineering, synthetic biology, and therapeutics design, covering supervised, unsupervised, and generative objectives. Unless otherwise stated, hyperparameters were determined by optimizing model performance on a 10% validation split of the training data, and models were tested using the checkpoint with the lowest validation loss. For more detailed information, please refer to [our paper](https://openreview.net/forum?id=Kis8tVUeNi).
 
  ## Results
- TODO (@Caleb), we will need to see what results we want to put here.
+ <center><img src="circle_benchmarks.png" alt="Downstream results of DNA FM 7B" style="width:60%; height:auto;" /></center>
 
  ## How to Use
  ### Build any downstream models from this backbone
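The body of this section is unchanged by the commit and therefore not shown in the diff. For context, here is a minimal sketch of attaching a downstream head to a frozen BERT-style DNA encoder; `NucleotideClassifier` and the dummy backbone are hypothetical stand-ins, since the repository's actual loading API does not appear in this diff.

```python
# Sketch of building a downstream model on a pretrained DNA encoder.
# The backbone here is a dummy stand-in; replace it with the real
# DNA FM 7B encoder loaded via the repository's API.
import torch
import torch.nn as nn


class NucleotideClassifier(nn.Module):
    """Per-nucleotide classification head on top of a frozen encoder,
    e.g. for annotating functional elements along a sequence."""

    def __init__(self, backbone: nn.Module, hidden_size: int, num_classes: int):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():
            p.requires_grad = False  # transfer learning: freeze pretrained weights
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids)  # (batch, seq_len, hidden_size)
        return self.classifier(hidden)     # (batch, seq_len, num_classes)


# Dummy backbone mapping token ids to 32-dim embeddings, for demonstration only.
backbone = nn.Sequential(nn.Embedding(6, 32))
model = NucleotideClassifier(backbone, hidden_size=32, num_classes=3)
logits = model(torch.randint(0, 6, (1, 16)))
print(logits.shape)  # torch.Size([1, 16, 3])
```

Freezing the backbone and training only the linear head is the lightest form of transfer learning; unfreezing the encoder for full fine-tuning is the natural alternative when compute allows.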
@@ -96,4 +97,7 @@ author={Caleb Ellington, Ning Sun, Nicholas Ho, Tianhua Tao, Sazan Mahbub, Yongh
  booktitle={NeurIPS 2024 Workshop on AI for New Drug Modalities},
  year={2024}
  }
- ```
+ ```
+ 
+ ## License
+ @Hongyi TODO