Pre-trained corpus processing consulting

#306
by babykai - opened

Thank you so much for Geneformer, a great tool for analyzing single-cell data!
I could not find the preprocessing details of the pretraining corpus in the Geneformer paper. You collected a large amount of single-cell transcriptome data, and I would like to ask how you assembled that data, whether in FASTQ or matrix format, into a pretraining corpus for Geneformer. Are traditional processing steps required (such as mitochondrial gene percentage filtering, batch effect removal, etc.)? Thanks again!

Thank you for your question! All preprocessing details are in the manuscript's Methods section, "Assembly and uniform processing of single-cell transcriptomes." We collected raw count matrices from the original studies and processed them as indicated in the Methods. We did not perform batch effect removal.
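For readers unfamiliar with the "mitochondrial gene percentage control" mentioned in the question, here is a minimal sketch of how such a metric is commonly computed from a raw count matrix. This is an illustration only and not the Geneformer pipeline (see the manuscript Methods for that). The function name `mito_percent`, the `MT-` gene prefix convention, the toy data, and the 20% cutoff are all assumptions made for the example.

```python
import numpy as np

def mito_percent(counts, gene_names, prefix="MT-"):
    """Percent of each cell's raw counts that come from mitochondrial genes.

    counts: cells x genes array of raw counts.
    gene_names: one symbol per column; mitochondrial genes are assumed
    to start with `prefix` (a common human convention, hypothetical here).
    """
    counts = np.asarray(counts, dtype=float)
    is_mito = np.array([g.upper().startswith(prefix) for g in gene_names])
    total = counts.sum(axis=1)
    mito = counts[:, is_mito].sum(axis=1)
    # Cells with zero total counts get 0% instead of a division error.
    return np.divide(100.0 * mito, total, out=np.zeros_like(total), where=total > 0)

# Toy matrix: 3 cells x 3 genes (made-up values).
counts = np.array([[90, 5, 5],
                   [10, 40, 50],
                   [0, 0, 0]])
genes = ["ACTB", "MT-CO1", "MT-ND1"]

pct = mito_percent(counts, genes)
keep = pct < 20.0  # hypothetical cutoff, not taken from the paper
```

In practice the cutoff is dataset-dependent, so it is left as an explicit, adjustable value rather than baked into the function.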

ctheodoris changed discussion status to closed
