license: mit
SciMult
SciMult is a language model for scientific literature understanding. It is pre-trained on data from (extreme multi-label) paper classification, citation prediction, and literature retrieval tasks via a multi-task contrastive learning framework. For more details, please refer to the paper.
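Below is a minimal sketch of the kind of contrastive objective described above: an InfoNCE-style loss with in-batch negatives. The exact loss formulation, temperature, and negative-sampling strategy used by SciMult follow the paper, so this snippet is illustrative rather than the released training code.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor,
                  pos_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """Contrastive loss with in-batch negatives.
    query_emb, pos_emb: [batch_size, dim]; row i of pos_emb is the positive
    for row i of query_emb, and every other row serves as a negative."""
    scores = query_emb @ pos_emb.T / temperature              # [B, B] similarity matrix
    targets = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, targets)
```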
We release four variants of SciMult here:
scimult_vanilla.ckpt
scimult_moe.ckpt
scimult_moe_pmcpatients_par.ckpt
scimult_moe_pmcpatients_ppr.ckpt
scimult_vanilla.ckpt and scimult_moe.ckpt can be used for various scientific literature understanding tasks. The difference between them is that scimult_vanilla.ckpt adopts a standard 12-layer Transformer architecture (i.e., the same as BERT base), whereas scimult_moe.ckpt adopts a Mixture-of-Experts Transformer architecture with task-specific multi-head attention (MHA) sublayers. Experimental results show that scimult_moe.ckpt generally achieves better performance.
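The sketch below shows one way the vanilla checkpoint might be mapped into a standard BERT backbone and used as a text encoder. The checkpoint key layout, the bert-base-uncased tokenizer/backbone, and the [CLS] pooling are assumptions made for illustration; the SciMult code repository provides the authoritative loading and inference utilities, especially for the MoE variant, whose task-specific attention sublayers do not fit a plain BertModel.

```python
import torch
from transformers import AutoTokenizer, BertModel

# Tokenizer/backbone choice here is an assumption for illustration only.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

# Load the released checkpoint and inspect its structure; the key name below
# is a guess, so verify it against the actual file before relying on this.
state = torch.load("scimult_vanilla.ckpt", map_location="cpu")
weights = state.get("model_dict", state) if isinstance(state, dict) else state
missing, unexpected = encoder.load_state_dict(weights, strict=False)

# Encode a paper by taking the [CLS] token representation as its embedding.
inputs = tokenizer("A paper title and abstract ...", return_tensors="pt",
                   truncation=True, max_length=512)
with torch.no_grad():
    emb = encoder(**inputs).last_hidden_state[:, 0]   # shape [1, 768]
```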
scimult_moe_pmcpatients_par.ckpt and scimult_moe_pmcpatients_ppr.ckpt are initialized from scimult_moe.ckpt and continually pre-trained on the training sets of the PMC-Patients patient-to-article retrieval and patient-to-patient retrieval tasks, respectively. As of December 2023, these two models rank 1st and 2nd in their corresponding tasks, respectively, on the PMC-Patients Leaderboard.
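For these retrieval tasks, the standard bi-encoder usage is to embed the query (e.g., a patient summary) and the candidate articles or patients separately, then rank candidates by dot-product similarity. In the sketch below, the encode() helper is a placeholder for whatever embedding routine the SciMult repository provides, not a released API.

```python
import torch

def encode(texts: list[str]) -> torch.Tensor:
    """Placeholder: return one embedding per input text, shape [len(texts), dim].
    Replace this with the embedding routine from the SciMult code repository."""
    raise NotImplementedError

def rank_candidates(query_text: str, candidate_texts: list[str]) -> list[int]:
    """Rank candidates for a query by dot-product similarity of their embeddings."""
    query = encode([query_text])                 # [1, dim]
    cands = encode(candidate_texts)              # [N, dim]
    scores = (query @ cands.T).squeeze(0)        # [N] relevance scores
    return torch.argsort(scores, descending=True).tolist()  # best-first candidate indices
```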
Pre-training Data
SciMult is pre-trained on the following data:
MAPLE for paper classification
Citation Prediction Triplets for link prediction
SciRepEval-Search for literature retrieval
Citation
If you find SciMult useful in your research, please cite the following paper:
@inproceedings{zhang2023pre,
title={Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding},
author={Zhang, Yu and Cheng, Hao and Shen, Zhihong and Liu, Xiaodong and Wang, Ye-Yi and Gao, Jianfeng},
booktitle={Findings of EMNLP'23},
pages={12259--12275},
year={2023}
}