Model Description

This is a sentence-transformers model. It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks such as clustering or semantic search. The 'Calcu_Disease_Similarity' model is designed to encode two disease terms and compute their semantic similarity. It has been fine-tuned on the disease-related dataset 'MeSHDS' and achieves a high F1 score in distinguishing experimentally validated miRNA-target interactions (MTIs) from predicted MTIs by considering disease similarity.

If you use this model in your research, please cite the following paper:

@article {Chen2024.05.17.594604,
    author = {Chen, Baiming},
    title = {miRTarDS: High-Accuracy Refining Protein-level MicroRNA Target Interactions from Prediction Databases Using Sentence-BERT},
    elocation-id = {2024.05.17.594604},
    year = {2024},
    doi = {10.1101/2024.05.17.594604},
    publisher = {Cold Spring Harbor Laboratory},
    abstract = {MicroRNAs (miRNAs) regulate gene expression by binding to mRNAs, inhibiting translation, or promoting mRNA degradation. miRNAs are of great importance in the development of various diseases. Currently, numerous sequence-based miRNA target prediction tools are available, however, only 1\% of their predictions have been experimentally validated. In this study, we propose a novel approach that leverages disease similarity between miRNAs and genes as a key feature to further refine and screen human sequence-based predicted miRNA target interactions (MTIs). To quantify the semantic similarity of diseases, we fine-tuned the Sentence-BERT model. Our method achieved an F1 score of 0.88 in accurately distinguishing human protein-level experimentally validated MTIs (functional MTIs, validated through western blot or reporter assay) and predicted MTIs. Moreover, this method exhibits exceptional generalizability across different databases. We applied the proposed method to analyze 1,220,904 human MTIs sourced from miRTarbase, miRDB, and miRWalk, encompassing 6,085 genes and 1,261 pre-miRNAs. Notably, we accurately identified 3,883 out of 3,962 MTIs with strong experimental evidence from miRTarbase. This study has the potential to provide valuable insights into the understanding of miRNA-gene regulatory networks and to promote advancements in disease diagnosis, treatment, and drug development.},
    URL = {https://www.biorxiv.org/content/early/2024/12/08/2024.05.17.594604},
    eprint = {https://www.biorxiv.org/content/early/2024/12/08/2024.05.17.594604.full.pdf},
    journal = {bioRxiv}
}

Key Features:

Fine-tuned to compute semantic similarity between disease names.
Achieves an F1 score of 0.88 in distinguishing protein-level experimentally validated MTIs (validated through western blot or reporter assay) from predicted MTIs.
Built for applications in understanding miRNA-gene regulatory networks, disease diagnosis, treatment, and drug discovery.

Full Model Architecture

SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Normalize()
)
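As a rough illustration of what the Pooling (mean over tokens) and Normalize modules listed above do, the sketch below uses plain Python lists and toy 3-dimensional vectors instead of the model's 384 dimensions. The helper names (`mean_pool`, `l2_normalize`, `dot`) and all numeric values are assumptions made for this example; they are not part of the sentence-transformers library.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask value is 1 (padding is ignored)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vector):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy token embeddings (hypothetical values); the last row of tokens_a is padding.
tokens_a = [[1.0, 0.0, 2.0], [3.0, 0.0, 0.0], [9.0, 9.0, 9.0]]
mask_a = [1, 1, 0]
tokens_b = [[0.0, 1.0, 1.0], [2.0, 1.0, 1.0]]
mask_b = [1, 1]

emb_a = l2_normalize(mean_pool(tokens_a, mask_a))
emb_b = l2_normalize(mean_pool(tokens_b, mask_b))

# For unit-length vectors the dot product equals the cosine similarity.
similarity = dot(emb_a, emb_b)
print(round(similarity, 4))
```

Because the final module normalizes every embedding to unit length, a plain dot product between two embeddings from this model can be read as a cosine similarity.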

Usage (Sentence-Transformers)

First, install the sentence-transformers library:

pip install -U sentence-transformers

Then you can use the model like this:

from sentence_transformers import SentenceTransformer, util

# Load the fine-tuned model from the Hugging Face Hub
# (a local path to a downloaded copy of the model also works)
model = SentenceTransformer("Baiming123/Calcu_Disease_Similarity")

# Example usage
disease1 = "lung cancer"
disease2 = "pulmonary fibrosis"

def sts(sentence_a: str, sentence_b: str) -> float:
    """Return the semantic similarity score of two disease terms."""
    emb_a = model.encode(sentence_a)
    emb_b = model.encode(sentence_b)
    # The model ends with a Normalize() layer, so the dot product
    # of two embeddings is their cosine similarity
    return util.dot_score(emb_a, emb_b).item()

similarity = sts(disease1, disease2)
print(similarity)
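A common follow-up is to rank several candidate disease terms against one query term. The sketch below is one possible way to do that. `rank_by_similarity` is a hypothetical helper, not part of the model or the library, and `toy_encode` is only a stand-in encoder (unit-length word counts over a tiny vocabulary) so that the example runs without downloading the model.

```python
import math
from typing import Callable, List, Sequence, Tuple

def rank_by_similarity(
    query: str,
    candidates: Sequence[str],
    encode: Callable[[str], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (candidate, score) pairs sorted from most to least similar."""
    query_emb = encode(query)
    scored = []
    for term in candidates:
        emb = encode(term)
        # Dot product; equals cosine similarity when embeddings are unit length.
        score = sum(q * c for q, c in zip(query_emb, emb))
        scored.append((term, float(score)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Stand-in encoder for demonstration only (assumption, NOT the real model).
VOCAB = ["lung", "cancer", "pulmonary", "fibrosis", "breast"]

def toy_encode(text: str) -> List[float]:
    words = text.lower().split()
    counts = [float(words.count(w)) for w in VOCAB]
    norm = math.sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

ranking = rank_by_similarity(
    "lung cancer",
    ["pulmonary fibrosis", "breast cancer", "lung cancer"],
    toy_encode,
)
for term, score in ranking:
    print(f"{term}: {score:.3f}")
```

With the real model loaded as shown above, passing `model.encode` in place of `toy_encode` should give scores based on the fine-tuned embeddings. The toy encoder only counts shared words and says nothing about the model's actual behavior.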

Additional Information

License

This model is licensed under the CC BY-NC 4.0 International license. If you use this model, please adhere to the license requirements.

Questions or Issues

If you encounter any issues or have any questions while using the model, feel free to reach out to the author for assistance. Thank you for your support and for using this model!
