License: MIT

πŸ—žοΈ Model description

InstructCell is a multi-modal AI copilot that integrates natural language with single-cell RNA sequencing data, enabling researchers to perform tasks like cell type annotation, pseudo-cell generation, and drug sensitivity prediction through intuitive text commands. By leveraging a specialized multi-modal architecture and our multi-modal single-cell instruction dataset, InstructCell reduces technical barriers and enhances accessibility for single-cell analysis.

Instruct version: this checkpoint generates only the answer, without additional explanatory text, so its outputs are concise and task-specific.

🚀 How to use

The following example demonstrates a basic cell type annotation workflow.

Before running it, set H5AD_PATH and GENE_VOCAB_PATH to your own files:

  • H5AD_PATH: Path to your .h5ad single-cell data file (e.g., H5AD_PATH = "path/to/your/data.h5ad").
  • GENE_VOCAB_PATH: Path to your gene vocabulary file (e.g., GENE_VOCAB_PATH = "path/to/your/gene_vocab.npy").
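The gene vocabulary fixes which genes the model expects and in what order, and the example calls `unify_gene_features` to reconcile your data with it. We have not reproduced that function. The sketch below only illustrates the general idea of aligning one cell's counts to a fixed vocabulary, using a hypothetical helper `align_to_vocab` and assuming that vocabulary genes missing from the cell are filled with zeros and genes outside the vocabulary are dropped. It omits the `force_gene_symbol_uppercase` option.

```python
from typing import Dict, List


def align_to_vocab(cell_counts: Dict[str, float], gene_vocab: List[str]) -> List[float]:
    """Return expression values ordered exactly like gene_vocab.

    Hypothetical illustration only, not the InstructCell implementation:
    - genes in the vocabulary but absent from the cell get 0.0 (assumption)
    - genes measured in the cell but absent from the vocabulary are dropped
    """
    return [float(cell_counts.get(gene, 0.0)) for gene in gene_vocab]


if __name__ == "__main__":
    vocab = ["CD3E", "MS4A1", "LYZ"]                     # toy vocabulary
    cell = {"LYZ": 12.0, "CD3E": 3.0, "MALAT1": 40.0}     # toy cell; MALAT1 is not in vocab
    print(align_to_vocab(cell, vocab))  # [3.0, 0.0, 12.0]
```

Fixing the order this way means position i of every aligned vector always refers to the same gene, which is what a model with a fixed input layout needs.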
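from mmllm.module import InstructCell  # provided by the InstructCell code repository
import anndata
import numpy as np
from scipy import sparse
from utils import unify_gene_features  # helper module from the same repository

# Set these to your own files (see above)
H5AD_PATH = "path/to/your/data.h5ad"
GENE_VOCAB_PATH = "path/to/your/gene_vocab.npy"

# Load the pre-trained InstructCell model from Hugging Face
model = InstructCell.from_pretrained("zjunlp/InstructCell-instruct")

# Load the single-cell data (H5AD format) and the gene vocabulary (NumPy format),
# then unify the data's gene features with the vocabulary
adata = anndata.read_h5ad(H5AD_PATH)
gene_vocab = np.load(GENE_VOCAB_PATH)
adata = unify_gene_features(adata, gene_vocab, force_gene_symbol_uppercase=False)

# Select a random cell and extract its gene counts and metadata
k = np.random.randint(0, adata.n_obs)
x = adata[k, :].X
# .X may be stored as a sparse or a dense matrix; convert either to a dense array
gene_counts = x.toarray() if sparse.issparse(x) else np.asarray(x)
sc_metadata = adata[k, :].obs.iloc[0].to_dict()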

# Prompt template: {species}, {sequencing_method}, and {tissue} are metadata placeholders,
# and {input} marks where the gene expression profile goes
prompt = (
    "Can you help me annotate this single cell from a {species}? "
    "It was sequenced using {sequencing_method} and is derived from {tissue}. "
    "The gene expression profile is {input}. Thanks!"
)

# Generate predictions; do_sample=True enables sampling, so results can vary between runs
for key, value in model.predict(
    prompt,
    gene_counts=gene_counts,
    sc_metadata=sc_metadata,
    do_sample=True,
    top_p=0.95,
    top_k=50,
    max_new_tokens=256,
).items():
    # Print each key-value pair of the returned result
    print(f"{key}: {value}")
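The prompt's `{species}`, `{sequencing_method}`, and `{tissue}` placeholders appear to be filled from `sc_metadata`, that is, from the `.obs` columns of the selected cell. This is our reading of the example, not documented behavior. Under that assumption, a quick check with the standard library's `string.Formatter().parse` can list placeholders your metadata does not cover before you call `model.predict`. The helper `missing_prompt_fields` is hypothetical and not part of InstructCell.

```python
import string
from typing import Dict, Set


def missing_prompt_fields(prompt: str, sc_metadata: Dict[str, object]) -> Set[str]:
    """Return placeholder names in `prompt` that `sc_metadata` does not supply.

    Hypothetical helper. `{input}` is skipped because, in the example, it stands
    for the gene expression profile rather than a metadata field (assumption).
    """
    fields = {
        name
        for _, name, _, _ in string.Formatter().parse(prompt)
        if name  # None after the last literal chunk, "" for a bare "{}"
    }
    return fields - {"input"} - set(sc_metadata)


prompt = (
    "Can you help me annotate this single cell from a {species}? "
    "It was sequenced using {sequencing_method} and is derived from {tissue}. "
    "The gene expression profile is {input}. Thanks!"
)
print(missing_prompt_fields(prompt, {"species": "human", "tissue": "blood"}))
# {'sequencing_method'}
```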

For more detailed explanations and additional examples, please refer to the Jupyter notebook demo.ipynb.
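In the example, `do_sample=True` together with `top_k=50` and `top_p=0.95` enables top-k and nucleus (top-p) sampling. Assuming `model.predict` passes these settings to a standard text-generation routine (we have not checked its internals), each decoding step keeps the `top_k` most probable tokens, then the smallest set of those whose cumulative probability reaches `top_p`, and samples from the renormalized remainder. The minimal pure-Python sketch below treats whole labels as single tokens for readability; `filter_top_k_top_p`, `sample`, and the toy distribution are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def filter_top_k_top_p(probs: Dict[str, float], top_k: int, top_p: float) -> Dict[str, float]:
    """Keep the top_k most likely tokens, then the smallest high-probability
    prefix whose cumulative mass reaches top_p; renormalize what remains."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    kept: List[Tuple[str, float]] = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))  # the most likely token is always kept
        cumulative += p
        if cumulative >= top_p:
            break  # nucleus reached: drop the remaining low-probability tail
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


def sample(probs: Dict[str, float], top_k: int, top_p: float, rng: random.Random) -> str:
    """Draw one token from the filtered, renormalized distribution."""
    filtered = filter_top_k_top_p(probs, top_k, top_p)
    tokens = list(filtered)
    return rng.choices(tokens, weights=[filtered[t] for t in tokens], k=1)[0]


toy = {"T cell": 0.6, "B cell": 0.3, "NK cell": 0.08, "Monocyte": 0.02}
# keeps only "T cell" and "B cell", renormalized to about 0.67 and 0.33
print(filter_top_k_top_p(toy, top_k=50, top_p=0.85))
print(sample(toy, top_k=50, top_p=0.85, rng=random.Random(0)))
```

Because a token is drawn at random from the remaining candidates, repeated runs on the same cell can return different answers.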

🔖 Citation

If you use the code or data, please cite the following paper:

@article{fang2025instructcell,
  title={A Multi-Modal AI Copilot for Single-Cell Analysis with Instruction Following},
  author={Fang, Yin and Deng, Xinle and Liu, Kangwei and Zhang, Ningyu and Qian, Jingyang and Yang, Penghui and Fan, Xiaohui and Chen, Huajun},
  journal={arXiv preprint arXiv:2501.08187},
  year={2025}
}