dflevine13 committed
Commit 5fb0b4c
1 Parent(s): a09ef8b

added details about inference example
README.md
CHANGED
@@ -19,7 +19,8 @@ Hugging Face: <https://huggingface.co/EleutherAI/pythia-160m>
 Cell2Sentence is a novel method for adapting large language models to single-cell transcriptomics.
 We transform single-cell RNA sequencing data into sequences of gene names ordered by expression level, termed "cell sentences".
 For more details, we refer to the paper linked above.
-This model is trained on the immune tissue dataset from [Domínguez et al.](https://www.science.org/doi/10.1126/science.abl5197) on the following tasks:
+This model was trained on the immune tissue dataset from [Domínguez et al.](https://www.science.org/doi/10.1126/science.abl5197) using 8 A100 40GB GPUs
+on the following tasks:
 1. conditional cell generation
 2. unconditional cell generation
 3. cell type prediction
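The description above introduces the core data format: a "cell sentence" is simply a cell's gene names ordered by descending expression. The toy sketch below illustrates that idea only; the actual Cell2Sentence preprocessing (normalization, gene vocabulary, formatting) is described in the paper and GitHub repository, and the gene names and values here are invented.

```python
# Toy illustration of a "cell sentence": gene names sorted by descending
# expression. This is NOT the official C2S preprocessing pipeline; see the
# paper / GitHub repository for the real transformation. Values are made up.
expression = {"CD3D": 7.2, "MALAT1": 12.5, "IL7R": 3.1, "B2M": 9.8, "LTB": 0.0}

# Keep nonzero genes and order them from highest to lowest expression.
nonzero = {gene: value for gene, value in expression.items() if value > 0}
cell_sentence = " ".join(sorted(nonzero, key=nonzero.get, reverse=True))

print(cell_sentence)  # MALAT1 B2M CD3D IL7R
```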
@@ -27,6 +28,8 @@ This model is trained on the immune tissue dataset from [Domínguez et al.](http
 # Sample Code

 We provide an example of how to use the model to conditionally generate a cell equipped with a post-processing function to remove duplicate and invalid genes.
+In order to generate full cells, the `max_length` generation parameter should be changed to 9200.
+However, we recommend using an A100 GPU for inference speed and memory capacity if full cell generation is required.
 Unconditional cell generation and cell type prediction prompts are included as well, but we do not include an example cell sentence to format the prompt.
 We refer to the paper and GitHub repository for instructions on how to transform expression vectors into cell sentences.

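The two added lines above are the substantive change in this hunk: generating full cells requires raising `max_length` to 9200, which in turn makes an A100 advisable. For orientation, here is a minimal, hedged sketch of what conditional cell generation could look like with the `transformers` API; the checkpoint path is a placeholder (substitute the actual Cell2Sentence checkpoint), the tail of the `ccg` prompt is assumed because the diff view truncates it, and the README's own sample code remains the reference.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint path; substitute the actual Cell2Sentence checkpoint.
model_path = "path/to/cell2sentence-checkpoint"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path).to(torch.device("cuda"))

# Conditional cell generation (ccg) prompt from the README; the wording after
# "from" is truncated in the diff view, so the tail here is an assumption.
cell_type = "T cell"
ccg = f"Enumerate the genes in a {cell_type} cell with nonzero expression, from highest to lowest."

tokens = tokenizer(ccg, return_tensors='pt')
input_ids = tokens['input_ids'].to(torch.device("cuda"))

# max_length=9200 is the value the README recommends for full cells;
# smaller values yield only the beginning of the cell sentence.
with torch.no_grad():
    output_ids = model.generate(input_ids, max_length=9200)

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```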
@@ -101,7 +104,9 @@ ccg = f"Enumerate the genes in a {cell_type} cell with nonzero expression, from
 # Prompts for other forms a generation.
 # ucg = "Display a cell's genes by expression level, in descending order."
 # cellsentence = "CELL_SENTENCE"
-# ctp =
+# ctp = "Identify the cell type most likely associated with these highly expressed genes listed in descending order. "
+# + cellsentence +
+# "Name the cell type connected to these genes, ranked from highest to lowest expression."

 tokens = tokenizer(ccg, return_tensors='pt')
 input_ids = tokens['input_ids'].to(torch.device("cuda"))
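The restored `ctp` comment above shows how the cell type prediction prompt wraps a cell sentence between two instruction strings. The sketch below assembles that prompt from the pieces shown in the diff and adds a naive filter in the spirit of the duplicate-and-invalid-gene post-processing mentioned earlier; `cell_sentence` and the gene vocabulary are placeholders, and the repository's own post-processing function is the authoritative version.

```python
# Assemble the cell type prediction (ctp) prompt from the strings shown in the
# diff. `cell_sentence` is a placeholder; a real one comes from transforming an
# expression vector as described in the paper / GitHub repository.
cell_sentence = "MALAT1 B2M CD3D IL7R"  # placeholder

ctp = (
    "Identify the cell type most likely associated with these highly expressed genes listed in descending order. "
    + cell_sentence
    + "Name the cell type connected to these genes, ranked from highest to lowest expression."
)  # string literals copied from the README comments; adjust spacing if needed

# Naive stand-in for the README's post-processing, which removes duplicate and
# invalid genes from a generated cell sentence. The official implementation in
# the GitHub repository is the reference; the vocabulary below is invented.
def clean_genes(generated_genes, valid_genes):
    seen, cleaned = set(), []
    for gene in generated_genes:
        if gene in valid_genes and gene not in seen:
            seen.add(gene)
            cleaned.append(gene)
    return cleaned

print(clean_genes(["CD3D", "CD3D", "NOTAGENE", "B2M"], {"CD3D", "B2M", "IL7R"}))
# ['CD3D', 'B2M']
```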