How to use fine-tuned models to predict new datasets

#107
by simplern - opened

Hello Geneformer developers and users,
I am new to Geneformer and to transformers.
I obtained a cell classification model by fine-tuning Geneformer. How can I deploy the model to predict cell labels on new datasets, for example on a previously unused test set? Could you provide a code example?

Thank you for your interest in Geneformer! One way to use the model to make predictions on new datasets is with the trainer.predict method, which is demonstrated in the cell classification example notebook (please refer to the last 5 lines of the notebook). Another way is to load the model and obtain the predictions from the model outputs. For this second approach, please see the "classifier_predict" function defined in the gene classification example notebook.
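With either approach, what comes back is one row of logits per cell, and these still have to be mapped to class labels. The sketch below shows one possible way to do that step in plain Python. It is not the notebook's code: the function name logits_to_predictions and the label names in id_to_label are hypothetical, and the id-to-label mapping must be the same one that was used when the model was fine-tuned.

```python
import math


def logits_to_predictions(logits, id_to_label):
    """Turn per-cell logits into (predicted_label, probability) pairs.

    logits: iterable of rows, one row of class scores per cell
            (for example the .predictions array returned by trainer.predict).
    id_to_label: dict mapping class index -> label name. Hypothetical here;
                 it must match the mapping used during fine-tuning.
    """
    results = []
    for row in logits:
        scores = [float(x) for x in row]
        # Numerically stable softmax: subtract the row maximum first
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        probs = [e / total for e in exps]
        # Index of the most probable class
        best = max(range(len(probs)), key=probs.__getitem__)
        results.append((id_to_label[best], probs[best]))
    return results


# Made-up logits for two cells and three classes (illustration only)
example_logits = [[2.0, 0.5, 0.1], [0.0, 0.0, 3.0]]
example_labels = {0: "type_A", 1: "type_B", 2: "type_C"}  # hypothetical mapping
predicted = logits_to_predictions(example_logits, example_labels)
print([label for label, _ in predicted])
```

If the true labels of the new dataset are known, the predicted labels can then be compared against them to estimate accuracy on unseen cells.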

ctheodoris changed discussion status to closed

@ctheodoris
I used Trainer to run the fine-tuned model on a new dataset.
When the input lengths are inconsistent, prediction fails with an error. Do you have any suggestions? The code is as follows:

from datasets import load_from_disk
from transformers import BertForSequenceClassification, Trainer

# Load the example dataset and take a shuffled subset of 100 cells
train_dataset = load_from_disk("/scRNA/Geneformer/example_input_file/cell_classification/disease_classification/human_dcm_hcm_nf.dataset/")
train_dataset = train_dataset.shuffle(seed=42).select(range(100))
set(train_dataset["disease"])  # {'dcm', 'hcm', 'nf'}
# Drop the metadata and label columns before prediction
test_dataset = train_dataset.remove_columns(["individual", "age", "sex", "lvef", "cell_type", "disease"])

# Load the fine-tuned classifier on CPU and predict
model = BertForSequenceClassification.from_pretrained("/scRNA/Geneformer/checkpoint/cell_disease/230627_geneformer_CellClassifier_L2048_B4_LR5e-05_LSlinear_WU500_E10_Oadamw_F0/").to("cpu")
trainer = Trainer(model=model)
predictions = trainer.predict(test_dataset)
pre = predictions.predictions

Error is:
ValueError: expected sequence of length 794 at dim 1 (got 432)

The input dataset, converted to a DataFrame, looks like this:

[image.png: screenshot of the input dataset as a DataFrame]

Thank you for your question. Batches of tensors need to be padded to be the same length.

Thank you for responding. Do you mean padding as done in the preprocess_classifier_batch function defined in the gene classification example notebook?

Yes, that's right!
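For readers who hit the same ValueError, here is a minimal sketch of the padding step discussed above. It is not the notebook's preprocess_classifier_batch; the function name pad_batch and its parameters are hypothetical, and the pad token id of 0 is an assumption that should be checked against your token dictionary. The idea is only that every cell in a batch is extended to the length of the longest one, with an attention mask marking which positions are real tokens.

```python
def pad_batch(examples, pad_token_id=0, max_len=None):
    """Pad tokenized cells to a common length and build attention masks.

    examples: non-empty list of dicts, each with an "input_ids" list of ints.
    pad_token_id: assumed to be 0 here; verify it in your token dictionary.
    max_len: optional cap; cells longer than this are truncated.
    """
    longest = max(len(ex["input_ids"]) for ex in examples)
    target = longest if max_len is None else min(longest, max_len)

    input_ids, attention_mask = [], []
    for ex in examples:
        ids = list(ex["input_ids"])[:target]  # truncate if needed
        n_pad = target - len(ids)
        input_ids.append(ids + [pad_token_id] * n_pad)
        # 1 marks a real token, 0 marks padding
        attention_mask.append([1] * len(ids) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two made-up cells of different lengths (illustration only)
batch = pad_batch([{"input_ids": [5, 9, 3]}, {"input_ids": [7, 2]}])
print(batch["input_ids"])       # [[5, 9, 3], [7, 2, 0]]
print(batch["attention_mask"])  # [[1, 1, 1], [1, 1, 0]]
```

The padded lists can be converted to tensors before being passed to the model. When predicting through Trainer, the usual place for this logic is the data_collator argument, so that each batch is padded as it is assembled; the cell classification example notebook is the place to check how Geneformer's own collator is passed in.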
