dflevine13 committed on
Commit 5fb0b4c
1 Parent(s): a09ef8b

added details about inference example

Files changed (1)
  1. README.md +7 -2
README.md CHANGED
@@ -19,7 +19,8 @@ Hugging Face: <https://huggingface.co/EleutherAI/pythia-160m>
  Cell2Sentence is a novel method for adapting large language models to single-cell transcriptomics.
  We transform single-cell RNA sequencing data into sequences of gene names ordered by expression level, termed "cell sentences".
  For more details, we refer to the paper linked above.
- This model is trained on the immune tissue dataset from [Domínguez et al.](https://www.science.org/doi/10.1126/science.abl5197) on the following tasks:
  1. conditional cell generation
  2. unconditional cell generation
  3. cell type prediction
@@ -27,6 +28,8 @@ This model is trained on the immune tissue dataset from [Domínguez et al.](http
  # Sample Code

  We provide an example of how to use the model to conditionally generate a cell equipped with a post-processing function to remove duplicate and invalid genes.
  Unconditional cell generation and cell type prediction prompts are included as well, but we do not include an example cell sentence to format the prompt.
  We refer to the paper and GitHub repository for instructions on how to transform expression vectors into cell sentences.

@@ -101,7 +104,9 @@ ccg = f"Enumerate the genes in a {cell_type} cell with nonzero expression, from
  # Prompts for other forms of generation.
  # ucg = "Display a cell's genes by expression level, in descending order."
  # cellsentence = "CELL_SENTENCE"
- # ctp = f"Identify the cell type most likely associated with these highly expressed genes listed in descending order. {cellsentence} Name the cell type connected to these genes, ranked from highest to lowest expression."

  tokens = tokenizer(ccg, return_tensors='pt')
  input_ids = tokens['input_ids'].to(torch.device("cuda"))
 
  Cell2Sentence is a novel method for adapting large language models to single-cell transcriptomics.
  We transform single-cell RNA sequencing data into sequences of gene names ordered by expression level, termed "cell sentences".
  For more details, we refer to the paper linked above.
+ This model was trained on the immune tissue dataset from [Domínguez et al.](https://www.science.org/doi/10.1126/science.abl5197) using 8 A100 40GB GPUs
+ on the following tasks:
  1. conditional cell generation
  2. unconditional cell generation
  3. cell type prediction
 
  # Sample Code

  We provide an example of how to use the model to conditionally generate a cell equipped with a post-processing function to remove duplicate and invalid genes.
+ In order to generate full cells, the `max_length` generation parameter should be changed to 9200.
+ However, we recommend using an A100 GPU for inference speed and memory capacity if full cell generation is required.
  Unconditional cell generation and cell type prediction prompts are included as well, but we do not include an example cell sentence to format the prompt.
  We refer to the paper and GitHub repository for instructions on how to transform expression vectors into cell sentences.
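
Since the README does not include an example cell sentence, the following is a minimal sketch of how one might be built and placed into the cell type prediction prompt. It ranks genes by expression, drops genes with zero counts, and joins the names with spaces. The helper names `expression_to_cell_sentence` and `build_ctp_prompt`, the toy gene list, and the counts are all hypothetical. The sketch skips whatever normalization and preprocessing the paper and GitHub repository prescribe, and the spaces around the cell sentence are an assumption based on the earlier f-string version of the prompt.

```python
from typing import Sequence


def expression_to_cell_sentence(
    gene_names: Sequence[str], expression: Sequence[float]
) -> str:
    """Return gene names ordered by descending expression, skipping zeros.

    Hypothetical helper for illustration only; it is not part of the
    Cell2Sentence codebase and omits any normalization the authors apply.
    """
    if len(gene_names) != len(expression):
        raise ValueError("gene_names and expression must have the same length")
    expressed = [
        (name, value) for name, value in zip(gene_names, expression) if value > 0
    ]
    # sorted() is stable, so genes with tied expression keep their input order.
    ranked = sorted(expressed, key=lambda pair: pair[1], reverse=True)
    return " ".join(name for name, _ in ranked)


def build_ctp_prompt(cell_sentence: str) -> str:
    """Format the cell type prediction prompt around a cell sentence.

    The prompt wording is copied from the sample code; the single space on
    each side of the cell sentence is an assumption.
    """
    return (
        "Identify the cell type most likely associated with these highly "
        "expressed genes listed in descending order. "
        + cell_sentence
        + " Name the cell type connected to these genes, ranked from highest "
        "to lowest expression."
    )


# Toy example with made-up counts (not real data).
genes = ["CD3E", "MS4A1", "LYZ", "NKG7"]
counts = [5.0, 0.0, 12.0, 5.0]
cell_sentence = expression_to_cell_sentence(genes, counts)
ctp = build_ctp_prompt(cell_sentence)
print(ctp)
```

The resulting `ctp` string can be passed to the tokenizer in place of `ccg`.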
 
  # Prompts for other forms of generation.
  # ucg = "Display a cell's genes by expression level, in descending order."
  # cellsentence = "CELL_SENTENCE"
+ # ctp = "Identify the cell type most likely associated with these highly expressed genes listed in descending order. "
+ # + cellsentence +
+ # "Name the cell type connected to these genes, ranked from highest to lowest expression."

  tokens = tokenizer(ccg, return_tensors='pt')
  input_ids = tokens['input_ids'].to(torch.device("cuda"))
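
The README's own post-processing function falls outside the lines shown in this diff. As a rough idea of what "removing duplicate and invalid genes" from decoded model output could involve, here is a minimal sketch. The function name `post_process_generated_cell`, the punctuation stripping, the exact-match vocabulary check, and the toy inputs are all assumptions for illustration; the actual function in the README may behave differently.

```python
from typing import Iterable, List, Set


def post_process_generated_cell(
    generated_text: str, valid_genes: Iterable[str]
) -> List[str]:
    """Keep the first occurrence of each valid gene, preserving order.

    Hypothetical helper: tokens are split on whitespace, trailing commas
    and periods are stripped, and a token counts as valid only if it
    exactly matches an entry in valid_genes.
    """
    vocabulary: Set[str] = set(valid_genes)
    seen: Set[str] = set()
    cleaned: List[str] = []
    for token in generated_text.split():
        gene = token.strip(",.")
        if gene in vocabulary and gene not in seen:
            seen.add(gene)
            cleaned.append(gene)
    return cleaned


# Toy decoded output containing a repeat and a made-up gene name.
raw_output = "LYZ CD3E LYZ FOOBAR1 NKG7 CD3E."
known_genes = ["CD3E", "LYZ", "NKG7", "MS4A1"]
cleaned_genes = post_process_generated_cell(raw_output, known_genes)
print(" ".join(cleaned_genes))
```

Keeping the first occurrence preserves the rank order implied by the generated sentence, which matters because position encodes relative expression.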