license: mit
SciMult
SciMult is a language model for scientific literature understanding. It is pre-trained on data from (extreme multi-label) paper classification, citation prediction, and literature retrieval tasks via a multi-task contrastive learning framework. For more details, please refer to the paper.
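Below is a minimal sketch of the kind of contrastive objective described above: an InfoNCE-style loss with in-batch negatives. The exact loss formulation, temperature, and negative-sampling strategy used by SciMult follow the paper, so this snippet is illustrative rather than the released training code.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(query_emb: torch.Tensor,
                  pos_emb: torch.Tensor,
                  temperature: float = 0.05) -> torch.Tensor:
    """Contrastive loss with in-batch negatives.
    query_emb, pos_emb: [batch_size, dim]; row i of pos_emb is the positive
    for row i of query_emb, and every other row serves as a negative."""
    scores = query_emb @ pos_emb.T / temperature              # [B, B] similarity matrix
    targets = torch.arange(scores.size(0), device=scores.device)
    return F.cross_entropy(scores, targets)
```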
We release four variants of SciMult here:
scimult_vanilla.ckpt
scimult_moe.ckpt
scimult_moe_pmcpatients_par.ckpt
scimult_moe_pmcpatients_ppr.ckpt
scimult_vanilla.ckpt and scimult_moe.ckpt can be used for various scientific literature understanding tasks. The difference between them is that scimult_vanilla.ckpt adopts a standard 12-layer Transformer architecture (i.e., the same as BERT base), whereas scimult_moe.ckpt adopts a Mixture-of-Experts Transformer architecture with task-specific multi-head attention (MHA) sublayers. Experimental results show that scimult_moe.ckpt generally achieves better performance.
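The sketch below shows one way the vanilla checkpoint might be mapped into a standard BERT backbone and used as a text encoder. The checkpoint key layout, the bert-base-uncased tokenizer/backbone, and the [CLS] pooling are assumptions made for illustration; the SciMult code repository provides the authoritative loading and inference utilities, especially for the MoE variant, whose task-specific attention sublayers do not fit a plain BertModel.

```python
import torch
from transformers import AutoTokenizer, BertModel

# Tokenizer/backbone choice here is an assumption for illustration only.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = BertModel.from_pretrained("bert-base-uncased")

# Load the released checkpoint and inspect its structure; the key name below
# is a guess, so verify it against the actual file before relying on this.
state = torch.load("scimult_vanilla.ckpt", map_location="cpu")
weights = state.get("model_dict", state) if isinstance(state, dict) else state
missing, unexpected = encoder.load_state_dict(weights, strict=False)

# Encode a paper by taking the [CLS] token representation as its embedding.
inputs = tokenizer("A paper title and abstract ...", return_tensors="pt",
                   truncation=True, max_length=512)
with torch.no_grad():
    emb = encoder(**inputs).last_hidden_state[:, 0]   # shape [1, 768]
```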
scimult_moe_pmcpatients_par.ckpt and scimult_moe_pmcpatients_ppr.ckpt are initialized from scimult_moe.ckpt and continually pre-trained on the training sets of the PMC-Patients patient-to-article retrieval and patient-to-patient retrieval tasks, respectively. As of December 2023, these two models rank 1st and 2nd in their corresponding tasks, respectively, on the PMC-Patients Leaderboard.
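For these retrieval tasks, the standard bi-encoder usage is to embed the query (e.g., a patient summary) and the candidate articles or patients separately, then rank candidates by dot-product similarity. In the sketch below, the encode() helper is a placeholder for whatever embedding routine the SciMult repository provides, not a released API.

```python
import torch

def encode(texts: list[str]) -> torch.Tensor:
    """Placeholder: return one embedding per input text, shape [len(texts), dim].
    Replace this with the embedding routine from the SciMult code repository."""
    raise NotImplementedError

def rank_candidates(query_text: str, candidate_texts: list[str]) -> list[int]:
    """Rank candidates for a query by dot-product similarity of their embeddings."""
    query = encode([query_text])                 # [1, dim]
    cands = encode(candidate_texts)              # [N, dim]
    scores = (query @ cands.T).squeeze(0)        # [N] relevance scores
    return torch.argsort(scores, descending=True).tolist()  # best-first candidate indices
```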
Pre-training Data
SciMult is pre-trained on the following data:
MAPLE for paper classification
Citation Prediction Triplets for link prediction
SciRepEval-Search for literature retrieval
Citation
If you find SciMult useful in your research, please cite the following paper:
@inproceedings{zhang2023pre,
title={Pre-training Multi-task Contrastive Learning Models for Scientific Literature Understanding},
author={Zhang, Yu and Cheng, Hao and Shen, Zhihong and Liu, Xiaodong and Wang, Ye-Yi and Gao, Jianfeng},
booktitle={Findings of EMNLP'23},
pages={12259--12275},
year={2023}
}