---
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
datasets:
- Baiming123/MeSHDS
base_model:
- sentence-transformers/multi-qa-MiniLM-L6-cos-v1
---

# Model Description

This is a [sentence-transformers](https://www.SBERT.net) model that maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for tasks such as clustering or semantic search.

The 'Calcu_Disease_Similarity' model encodes two disease terms and computes their **semantic similarity**. It was fine-tuned on the disease-related dataset 'MeSHDS' and, by taking disease similarity into account, achieves a high F1 score in distinguishing experimentally validated miRNA-target interactions (MTIs) from predicted MTIs.

If you use this model in your research, please cite the following paper:
```
@article {Chen2024.05.17.594604,
	author = {Chen, Baiming},
	title = {Refining Protein-Level MicroRNA Target Interactions in Disease from Prediction Databases Using Sentence-BERT},
	elocation-id = {2024.05.17.594604},
	year = {2024},
	doi = {10.1101/2024.05.17.594604},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2024/09/18/2024.05.17.594604},
	eprint = {https://www.biorxiv.org/content/early/2024/09/18/2024.05.17.594604.full.pdf},
	journal = {bioRxiv}
}
```

## Key Features:
- Fine-tuned to compute semantic similarity between disease names.
- Achieves an F1 score of 0.88 in distinguishing MTIs validated experimentally at the protein level (western blot, reporter assay) from predicted MTIs.
- Built for applications in understanding miRNA-gene regulatory networks, disease diagnosis, treatment, and drug discovery.

## Full Model Architecture
```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel 
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False})
  (2): Normalize()
)
```
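
The pooling and normalization stages listed above can be illustrated with a small pure-Python sketch. This is not the library's implementation: the helper names `mean_pool`, `l2_normalize`, and `dot` are hypothetical, and the token vectors are toy 2-dimensional values rather than real 384-dimensional BERT outputs. It shows mean pooling over non-padding tokens followed by L2 normalization, after which the dot product of two embeddings equals their cosine similarity.

```python
import math

def mean_pool(token_embeddings, attention_mask):
    """Average the token vectors whose mask entry is 1 (padding is skipped)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

# Toy token embeddings (assumption: 2 dimensions instead of 384).
tokens = [[3.0, 0.0], [1.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]  # the last token is padding and is ignored

pooled = mean_pool(tokens, mask)   # mean of the first two token vectors
embedding = l2_normalize(pooled)   # unit-length sentence embedding

# For unit-length vectors, the dot product is the cosine similarity,
# so an embedding compared with itself scores (approximately) 1.0.
self_similarity = dot(embedding, embedding)
```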

# Usage (Sentence-Transformers)

```
pip install -U sentence-transformers
```

After installing the library, download all the files from the "Files and versions" section into a folder named 'Calcu_Disease_Similarity'. You can then load the model from that folder and compute disease similarity:

```python
from sentence_transformers import SentenceTransformer, util

# Load the fine-tuned SBERT model.
# Replace 'your/path/to/Calcu_Disease_Similarity' with the actual path to the model folder.
model = SentenceTransformer('your/path/to/Calcu_Disease_Similarity')

def sts(sentence_a: str, sentence_b: str) -> float:
    """Return the similarity score between two disease terms."""
    emb_a = model.encode(sentence_a)
    emb_b = model.encode(sentence_b)
    # The model ends with a Normalize() layer, so the dot product of two
    # embeddings equals their cosine similarity.
    return util.dot_score(emb_a, emb_b).item()

# Example usage
disease1 = "lung cancer"
disease2 = "pulmonary fibrosis"

similarity = sts(disease1, disease2)
print(f"Similarity between {disease1!r} and {disease2!r}: {similarity:.4f}")
```
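
To compare one disease term against several candidates, the pairwise scoring above can be wrapped in a small ranking helper. The following is a minimal sketch: `rank_by_similarity` and its `encode` parameter are hypothetical names, not part of the model or of sentence-transformers. It assumes `encode` returns L2-normalized vectors, and a toy lookup table stands in for `model.encode` so the example runs without the model files.

```python
def rank_by_similarity(query, candidates, encode):
    """Return (candidate, score) pairs sorted from most to least similar.

    `encode` is any callable mapping a string to a unit-length vector;
    with the real model this would be `model.encode`.
    """
    query_emb = encode(query)
    scored = []
    for term in candidates:
        cand_emb = encode(term)
        # Dot product of unit-length vectors equals cosine similarity.
        score = float(sum(q * c for q, c in zip(query_emb, cand_emb)))
        scored.append((term, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy stand-in encoder (assumption: made-up 2-dimensional unit vectors,
# not real model output).
toy_vectors = {
    "term_a": [1.0, 0.0],
    "term_b": [0.6, 0.8],
    "term_c": [0.0, 1.0],
}

ranked = rank_by_similarity("term_a", ["term_c", "term_b"], toy_vectors.__getitem__)
for term, score in ranked:
    print(f"{term}: {score:.2f}")
```

With the model loaded, passing `model.encode` as the `encode` argument should produce the same kind of ranking for actual disease names.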

# Additional Information

## License
This model is licensed under the CC BY-NC 4.0 International license. If you use this model, please adhere to the license requirements.

## Questions or Issues
If you encounter any issues or have any questions while using the model, feel free to reach out to the author for assistance. Thank you for your support and for using this model!