Updated ReadME with model card information
README.md
CHANGED
@@ -1,3 +1,24 @@
# Overview
This is the C2S-Pythia-410m-cell-type-prediction model. It is based on the Pythia-410m architecture developed by
EleutherAI and was fine-tuned using Cell2Sentence (C2S) on a diverse set of single-cell RNA sequencing (scRNA-seq)
datasets from CellxGene and the Human Cell Atlas. Cell2Sentence adapts large language models (LLMs) to single-cell
biology by transforming scRNA-seq data into "cell sentences": sequences of gene names ordered by expression level.
This transformation lets LLMs apply their natural language processing capabilities to single-cell tasks; this model
focuses on cell type prediction.
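
The "cell sentence" idea can be illustrated with a minimal sketch. It assumes genes are ranked in descending order of expression, that unexpressed genes are dropped, and that ties are broken alphabetically; the function name `to_cell_sentence`, the `top_k` parameter, and the toy expression values are hypothetical and are not taken from the Cell2Sentence codebase.

```python
from typing import Mapping, Optional


def to_cell_sentence(expression: Mapping[str, float], top_k: Optional[int] = None) -> str:
    """Turn one cell's gene -> expression mapping into a space-separated cell sentence.

    Illustrative only: the zero filter, tie-breaking rule, and top_k cutoff are
    assumptions, not the official Cell2Sentence preprocessing.
    """
    # Keep only genes with non-zero expression (assumption).
    expressed = [(gene, value) for gene, value in expression.items() if value > 0]
    # Rank by descending expression; break ties alphabetically for determinism.
    expressed.sort(key=lambda pair: (-pair[1], pair[0]))
    genes = [gene for gene, _ in expressed]
    if top_k is not None:
        genes = genes[:top_k]
    return " ".join(genes)


# Toy expression profile for a single cell (made-up values).
cell = {"MALAT1": 120.0, "CD3E": 14.0, "ACTB": 35.0, "GAPDH": 0.0, "IL7R": 14.0}
sentence = to_cell_sentence(cell, top_k=4)
print(sentence)  # MALAT1 ACTB CD3E IL7R
```

The resulting string is ordinary text, which is what allows a language model such as Pythia to consume it without architectural changes.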

# Training Data
This model was trained on over 57 million human and mouse cells gathered from over 800 scRNA-seq datasets from
CellxGene and the Human Cell Atlas. The combined data cover a broad range of cell types and conditions across
multiple tissues in both species.

# Tasks
This model is designed for:
- Cell type prediction: predicting a cell's type from the "cell sentence" generated from its scRNA-seq data.
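
A cell type prediction call might look like the sketch below, which uses the standard Hugging Face `transformers` causal-LM API. Several things here are assumptions rather than facts from this model card: the repository id in `MODEL_ID`, the wording of the prompt built by the hypothetical helper `build_prompt`, and the generation settings. Check the Cell2Sentence GitHub repository for the prompt format the model was actually fine-tuned with.

```python
# Assumed Hugging Face repository id; verify it before use.
MODEL_ID = "vandijklab/C2S-Pythia-410m-cell-type-prediction"


def build_prompt(cell_sentence: str, organism: str = "Homo sapiens") -> str:
    """Wrap a cell sentence in a natural-language prompt (hypothetical wording)."""
    n_genes = len(cell_sentence.split())
    return (
        f"The following is a list of {n_genes} gene names ordered by descending "
        f"expression level in a {organism} cell. "
        f"Cell sentence: {cell_sentence}. "
        "The cell type corresponding to these genes is:"
    )


def predict_cell_type(cell_sentence: str, model_id: str = MODEL_ID, max_new_tokens: int = 20) -> str:
    """Generate a cell type label for one cell sentence (downloads model weights)."""
    # Imported lazily so that build_prompt can be used without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(build_prompt(cell_sentence), return_tensors="pt")
    # Greedy decoding keeps the prediction deterministic.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)
    # Drop the prompt tokens and decode only the newly generated continuation.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True).strip()


prompt = build_prompt("MALAT1 ACTB CD3E IL7R")
print(prompt)
```

Calling `predict_cell_type("MALAT1 ACTB CD3E IL7R")` would download the weights and return the generated text; a real cell sentence would normally contain far more genes than this toy input.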

# Cell2Sentence Links
- GitHub: https://github.com/vandijklab/cell2sentence
- Paper: https://www.biorxiv.org/content/10.1101/2023.09.11.557287v3

# Pythia Links
- Paper: https://arxiv.org/pdf/2304.01373
- Hugging Face: https://huggingface.co/EleutherAI/pythia-410m