Commit 6758dd7 by SyedA5688 (1 parent: 1d548b3)

Updated ReadME with model card information

Files changed (1)
  1. README.md +24 -3
README.md CHANGED
@@ -1,3 +1,24 @@
- ---
- license: cc-by-nc-nd-4.0
- ---
+ # Overview
+ This is the C2S-Pythia-410m-cell-type-prediction model. It is based on the Pythia-410m architecture developed by EleutherAI
+ and fine-tuned using Cell2Sentence (C2S) on a diverse set of single-cell RNA sequencing (scRNA-seq) datasets from CellxGene
+ and the Human Cell Atlas. Cell2Sentence adapts large language models (LLMs) to
+ single-cell biology by transforming scRNA-seq data into "cell sentences": sequences of gene names ordered by
+ expression level. This transformation lets LLMs apply their natural language processing capabilities to
+ single-cell tasks; this model focuses on cell type prediction.
+
+ # Training Data
+ This model was trained on over 57 million human and mouse cells gathered from over 800 scRNA-seq
+ datasets from CellxGene and the Human Cell Atlas. The combined dataset covers a broad range of cell types and conditions
+ from multiple tissues in both human and mouse.
+
+ # Tasks
+ This model is designed for:
+ - Cell type prediction: Predicting the cell type based on the "cell sentence" generated from scRNA-seq data.
+
+ # Cell2Sentence Links
+ - GitHub: https://github.com/vandijklab/cell2sentence
+ - Paper: https://www.biorxiv.org/content/10.1101/2023.09.11.557287v3
+
+ # Pythia Links
+ - Paper: https://arxiv.org/pdf/2304.01373
+ - Hugging Face: https://huggingface.co/EleutherAI/pythia-410m
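
The Overview describes a "cell sentence" as a sequence of gene names ordered by expression level. The following is a minimal sketch of that idea in plain Python, not the Cell2Sentence library's implementation. The function name `cell_to_sentence`, the `top_k` parameter, the choice to drop genes with zero counts, the descending sort, and the space-separated output are all assumptions made for illustration; see the Cell2Sentence GitHub repository for the actual preprocessing.

```python
from typing import Optional, Sequence


def cell_to_sentence(
    gene_names: Sequence[str],
    counts: Sequence[float],
    top_k: Optional[int] = None,
) -> str:
    """Turn one cell's expression vector into a space-separated "cell sentence".

    Hypothetical helper for illustration only: genes are ranked from highest
    to lowest expression, and genes with zero counts are dropped (assumption).
    """
    if len(gene_names) != len(counts):
        raise ValueError("gene_names and counts must have the same length")

    # Keep only genes that are actually expressed in this cell.
    expressed = [(gene, count) for gene, count in zip(gene_names, counts) if count > 0]

    # Sort by descending expression. Python's sort is stable, so genes with
    # equal counts keep their original relative order.
    ranked = sorted(expressed, key=lambda pair: -pair[1])

    # Optionally truncate to the top_k most highly expressed genes.
    if top_k is not None:
        ranked = ranked[:top_k]

    return " ".join(gene for gene, _ in ranked)


# Toy example with made-up counts for four marker genes.
genes = ["CD3E", "MS4A1", "LYZ", "NKG7"]
counts = [5, 0, 12, 3]
print(cell_to_sentence(genes, counts))  # LYZ CD3E NKG7
```

In this toy example `MS4A1` is omitted because its count is zero, and passing `top_k=2` would keep only `LYZ CD3E`. How ties, normalization, and sentence length are handled in the real pipeline is not stated in this model card.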