ctheodoris committed on
Commit bedb3b7 · 1 Parent(s): 2732369

add mtl_classifier to docs

docs/source/about.rst CHANGED
@@ -4,11 +4,13 @@ About
 Model Description
 -----------------
 
-**Geneformer** is a context-aware, attention-based deep learning model pretrained on a large-scale corpus of ~30 million single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. During pretraining, Geneformer gained a fundamental understanding of network dynamics, encoding network hierarchy in the attention weights of the model in a completely self-supervised manner. With both zero-shot learning and fine-tuning with limited task-specific data, Geneformer consistently boosted predictive accuracy in a diverse panel of downstream tasks relevant to chromatin and network dynamics. In silico perturbation with zero-shot learning identified a novel transcription factor in cardiomyocytes that we experimentally validated to be critical to their ability to generate contractile force. In silico treatment with limited patient data revealed candidate therapeutic targets for cardiomyopathy that we experimentally validated to significantly improve the ability of cardiomyocytes to generate contractile force in an iPSC model of the disease. Overall, Geneformer represents a foundational deep learning model pretrained on ~30 million human single cell transcriptomes to gain a fundamental understanding of gene network dynamics that can now be democratized to a vast array of downstream tasks to accelerate discovery of key network regulators and candidate therapeutic targets.
+**Geneformer** is a context-aware, attention-based deep learning model pretrained on a large-scale corpus of single-cell transcriptomes to enable context-specific predictions in settings with limited data in network biology. During pretraining, Geneformer gained a fundamental understanding of network dynamics, encoding network hierarchy in the attention weights of the model in a completely self-supervised manner. With both zero-shot learning and fine-tuning with limited task-specific data, Geneformer consistently boosted predictive accuracy in a diverse panel of downstream tasks relevant to chromatin and network dynamics. In silico perturbation with zero-shot learning identified a novel transcription factor in cardiomyocytes that we experimentally validated to be critical to their ability to generate contractile force. In silico treatment with limited patient data revealed candidate therapeutic targets for cardiomyopathy that we experimentally validated to significantly improve the ability of cardiomyocytes to generate contractile force in an iPSC model of the disease. Overall, Geneformer represents a foundational deep learning model pretrained on a large-scale corpus of human single cell transcriptomes to gain a fundamental understanding of gene network dynamics that can now be democratized to a vast array of downstream tasks to accelerate discovery of key network regulators and candidate therapeutic targets.
 
-In `our manuscript <https://rdcu.be/ddrx0>`_, we report results for the 6 layer Geneformer model pretrained on Genecorpus-30M. We additionally provide within the repository a 12 layer Geneformer model, scaled up with retained width:depth aspect ratio, also pretrained on Genecorpus-30M.
+In `our manuscript <https://rdcu.be/ddrx0>`_, we report results for the original 6 layer Geneformer model pretrained on Genecorpus-30M. We additionally provide within the repository a 12 layer Geneformer model, scaled up with retained width:depth aspect ratio, also pretrained on Genecorpus-30M.
 
-Both the `6 <https://huggingface.co/ctheodoris/Geneformer/blob/main/pytorch_model.bin>`_ and `12 <https://huggingface.co/ctheodoris/Geneformer/blob/main/geneformer-12L-30M/pytorch_model.bin>`_ layer Geneformer models were pretrained in June 2021.
+Both the `6 <https://huggingface.co/ctheodoris/Geneformer/blob/main/gf-6L-30M-i2048/model.safetensors>`_ and `12 <https://huggingface.co/ctheodoris/Geneformer/blob/main/gf-12L-30M-i2048/pytorch_model.bin>`_ layer Geneformer models were pretrained in June 2021.
+
+Also see `our 2024 manuscript <https://www.biorxiv.org/content/10.1101/2024.08.16.608180v1.full.pdf>`_, for details of the `expanded model <https://huggingface.co/ctheodoris/Geneformer/blob/main/model.safetensors>`_ trained on ~95 million transcriptomes in April 2024 and our continual learning, multitask learning, and quantization strategies.
 
 Application
 -----------
@@ -39,7 +41,9 @@ Example applications demonstrated in `our manuscript <https://rdcu.be/ddrx0>`_ i
 | - in silico perturbation to determine transcription factor targets
 | - in silico perturbation to determine transcription factor cooperativity
 
--Citation
---------
+Citations
+---------
 
 | C V Theodoris #, L Xiao, A Chopra, M D Chaffin, Z R Al Sayed, M C Hill, H Mantineo, E Brydon, Z Zeng, X S Liu, P T Ellinor #. `Transfer learning enables predictions in network biology. <https://rdcu.be/ddrx0>`_ *Nature*, 31 May 2023. (# co-corresponding authors)
+
+| H Chen \*, M S Venkatesh \*, J Gomez Ortega, S V Mahesh, T Nandi, R Madduri, K Pelka †, C V Theodoris † #. `Quantized multi-task learning for context-specific representations of gene network dynamics. <https://www.biorxiv.org/content/10.1101/2024.08.16.608180v1.full.pdf>`_ *bioRxiv*, 19 Aug 2024. (\* co-first authors, † co-senior authors, # corresponding author)
docs/source/api.rst CHANGED
@@ -17,6 +17,14 @@ Classifier
 
    geneformer.classifier
 
+Multitask Classifier
+----------
+
+.. toctree::
+   :maxdepth: 1
+
+   geneformer.mtl_classifier
+
 Embedding Extractor
 -------------------
 
docs/source/geneformer.mtl_classifier.rst ADDED
@@ -0,0 +1,11 @@
+geneformer.mtl_classifier
+=====================
+
+.. automodule:: geneformer.mtl_classifier
+   :members:
+   :undoc-members:
+   :show-inheritance:
+   :exclude-members:
+      valid_option_dict,
+      validate_options,
+      validate_additional_options
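In reStructuredText, a section title's underline must be at least as long as the title text. When it is shorter, docutils reports "Title underline too short." This is a warning, not a build failure. Two headings added in this commit fall short:

* "Multitask Classifier" (20 characters) sits over a 10-character dash underline.
* "geneformer.mtl_classifier" (25 characters) sits over a 21-character "=" underline.

The Python sketch below is a hypothetical lint helper named `short_underlines`. It is not part of Geneformer, Sphinx, or docutils, and it makes several simplifying assumptions:

* It only considers underline-only titles.
* It does not distinguish overlines or transitions.
* It counts code points rather than display columns.

```python
import re

# Hypothetical helper (not part of Geneformer, Sphinx, or docutils).
# An adornment line repeats one non-alphanumeric printable ASCII character.
_ADORNMENT = re.compile(r"^([!-/:-@\[-`{-~])\1*$")


def short_underlines(rst_text):
    """Find underline-only section titles whose underline is shorter than the title.

    Simplifications: overlined titles and transitions are not distinguished,
    and lengths are counted in code points rather than display columns.
    Returns a list of (title, title_length, underline_length) tuples.
    """
    lines = rst_text.splitlines()
    problems = []
    for title_line, next_line in zip(lines, lines[1:]):
        title = title_line.strip()
        underline = next_line.strip()
        # Treat a non-blank, non-adornment line followed by an adornment line as a title.
        if title and not _ADORNMENT.match(title) and _ADORNMENT.match(underline):
            if len(underline) < len(title):
                problems.append((title, len(title), len(underline)))
    return problems


if __name__ == "__main__":
    # Headings as committed in api.rst and geneformer.mtl_classifier.rst.
    api_rst = "\n".join([
        "Multitask Classifier",
        "-" * 10,
        "",
        "Embedding Extractor",
        "-" * 19,
    ])
    mtl_rst = "\n".join(["geneformer.mtl_classifier", "=" * 21])
    for title, n_title, n_under in short_underlines(api_rst + "\n\n" + mtl_rst):
        print(f"{title!r}: title {n_title} chars, underline {n_under} chars")
```

Run on these headings, the sketch flags the two new titles and leaves "Embedding Extractor" alone, because its 19-character underline matches its length.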
docs/source/index.rst CHANGED
@@ -1,7 +1,7 @@
 Geneformer
 ==========
 
-Geneformer is a foundation transformer model pretrained on a large-scale corpus of ~30 million single cell transcriptomes to enable context-aware predictions in network biology.
+Geneformer is a foundation transformer model pretrained on a large-scale corpus of single cell transcriptomes to enable context-aware predictions in network biology.
 
 See `our manuscript <https://rdcu.be/ddrx0>`_ for details.
 
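Each hunk header in this commit follows the unified diff convention `@@ -a,b +c,d @@`. The hunk spans b lines starting at line a of the old file, and d lines starting at line c of the new file. Lines are counted as follows:

* Context lines, prefixed with a space, count toward both sides.
* `-` lines count only toward the old side.
* `+` lines count only toward the new side.

For example, `@@ -4,11 +4,13 @@` in about.rst means that hunk is two lines longer after the change than before.

The Python sketch below recomputes both counts from a hunk body so they can be compared against the header. It uses a hypothetical `hunk_counts` helper, which is not part of git or any diff library. As an assumption for copied diffs whose trailing whitespace was stripped, it treats empty lines as blank context.

```python
import re

# Hypothetical helper, not part of git or any diff library.
HUNK_HEADER = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def hunk_counts(header, body_lines):
    """Return ((declared_old, counted_old), (declared_new, counted_new)) for one hunk."""
    match = HUNK_HEADER.match(header)
    if match is None:
        raise ValueError(f"not a unified diff hunk header: {header!r}")
    # In unified diff format an omitted count defaults to 1.
    declared_old = int(match.group(2) or 1)
    declared_new = int(match.group(4) or 1)

    counted_old = counted_new = 0
    for line in body_lines:
        # Assumption: an empty line is blank context whose leading space was stripped.
        marker = line[:1] or " "
        if marker == " ":
            counted_old += 1
            counted_new += 1
        elif marker == "-":
            counted_old += 1
        elif marker == "+":
            counted_new += 1
        # Lines such as "\ No newline at end of file" count toward neither side.
    return (declared_old, counted_old), (declared_new, counted_new)


if __name__ == "__main__":
    # The index.rst hunk, with the changed sentence abridged:
    # only the leading marker of each line is counted.
    index_hunk = [
        " Geneformer",
        " ==========",
        "",
        "-Geneformer is a foundation transformer model (old wording)",
        "+Geneformer is a foundation transformer model (new wording)",
        "",
        " See `our manuscript <https://rdcu.be/ddrx0>`_ for details.",
        "",
    ]
    print(hunk_counts("@@ -1,7 +1,7 @@", index_hunk))  # ((7, 7), (7, 7))
```

If a declared count and a counted value disagree, the hunk was truncated or mangled, for example when a side-by-side view is flattened into text.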