Size of tensor a does not match size of tensor b in InSilicoPerturbation

#184
by junguyen - opened

Thank you for making this model available!

I have previously run InSilicoPerturber successfully on a subset of the Genecorpus-30M dataset (~2700 cells) using the same parameters (I am using Genecorpus-30M only to test how the in silico perturbation functions work). However, since the changes pushed on Aug 2, 2023 to fix the attention mask issue, I receive the following error:

isp = InSilicoPerturber(perturb_type="delete",
                        perturb_rank_shift=None,
                        genes_to_perturb=["ENSG00000135100"],
                        combos=0,
                        anchor_gene=None,
                        model_type="Pretrained",
                        num_classes=0,
                        emb_mode="cell",
                        cell_emb_style="mean_pool",
                        filter_data=None,
                        cell_states_to_model=None,
                        max_ncells=None,
                        emb_layer=-1,
                        forward_batch_size=50,
                        nproc=16,
                        token_dictionary_file = "/home/ubuntu/Geneformer/geneformer/token_dictionary.pkl")

isp.perturb_data("/home/ubuntu/Geneformer",
                 "/data/subset_genecorpus/",
                 "/data/subset_genecorpus/delete_cell/",
                 "delete_cell_HNF1A")
Filter (num_proc=16): 100%|███████| 2741/2741 [00:12<00:00, 214.74 examples/s]
Map (num_proc=16): 100%|███████████████| 37/37 [00:12<00:00,  2.92 examples/s]
Map (num_proc=16): 100%|██████████████| 37/37 [00:00<00:00, 159.23 examples/s]
Map (num_proc=16): 100%|██████████████| 37/37 [00:00<00:00, 166.24 examples/s]
Map (num_proc=16): 100%|██████████████| 37/37 [00:00<00:00, 165.63 examples/s]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/Geneformer/geneformer/in_silico_perturber.py", line 974, in perturb_data
    self.in_silico_perturb(model,
  File "/home/ubuntu/Geneformer/geneformer/in_silico_perturber.py", line 1052, in in_silico_perturb
    cos_sims_data = quant_cos_sims(model,
  File "/home/ubuntu/Geneformer/geneformer/in_silico_perturber.py", line 444, in quant_cos_sims
    cos_sims += [cos(minibatch_emb, minibatch_comparison).to("cpu")]
  File "/opt/tensorflow/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/tensorflow/lib/python3.10/site-packages/torch/nn/modules/distance.py", line 87, in forward
    return F.cosine_similarity(x1, x2, self.dim, self.eps)
RuntimeError: The size of tensor a (2047) must match the size of tensor b (2046) at non-singleton dimension 1

I've referenced Discussion #85 to help with this issue; however, changing the batch size to 200 still raises the same error. I have also pulled the latest version of Geneformer.

Could I get some help understanding why this error is now being raised? Thank you!
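
For anyone hitting a similar mismatch, a small shape check just before the cosine call can help localize it. The sketch below is not part of Geneformer: `broadcast_mismatches` is a hypothetical helper, and the example shapes (batch of 50, embedding width 256) are assumptions for illustration only. It applies the usual broadcasting rule (sizes must be equal, or one of them must be 1) and reports the dimensions that violate it.

```python
def broadcast_mismatches(shape_a, shape_b):
    """Return (dim, size_a, size_b) for every dimension that cannot be broadcast.

    Hypothetical debugging helper, not a Geneformer or PyTorch function.
    """
    ndim = max(len(shape_a), len(shape_b))
    # Left-pad the shorter shape with 1s, as broadcasting does.
    a = (1,) * (ndim - len(shape_a)) + tuple(shape_a)
    b = (1,) * (ndim - len(shape_b)) + tuple(shape_b)
    # Two sizes are compatible if they are equal or if either one is 1.
    return [
        (dim, x, y)
        for dim, (x, y) in enumerate(zip(a, b))
        if x != y and 1 not in (x, y)
    ]


# Assumed shapes: (batch, sequence length, embedding width).
print(broadcast_mismatches((50, 2047, 256), (50, 2046, 256)))
```

With tensors, the same check could be called as `broadcast_mismatches(tuple(minibatch_emb.shape), tuple(minibatch_comparison.shape))` right before the line that fails in the traceback; an empty list would mean the shapes are compatible.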

Hi there, thanks for bringing this issue up! We've just updated the code to address the issue. Thanks for your interest!

ctheodoris changed discussion status to closed
