language:
- en
pipeline_tag: feature-extraction
tags:
- pytorch
- RoBERTa
Model Card for SynCSE-scratch
Model Details
Model Description
More information needed
Developed by: SJTU-LIT
Shared by [Optional]: SJTU-LIT
Model type: Feature Extraction
Language(s) (NLP): More information needed
License: More information needed
Parent Model: RoBERTa-base
Resources for more information: GitHub Repo, Associated Paper
Uses
Direct Use
This model can be used for the task of feature extraction.
Out-of-Scope Use
The model should not be used to intentionally create hostile or alienating environments for people.
Bias, Risks, and Limitations
Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
Training Data
The model creators note in the GitHub repository:
We use 27.5k generated synthetic samples to train SynCSE-scratch-RoBERTa-base.
Citation
BibTeX:
@article{zhang2023contrastive,
title={Contrastive Learning of Sentence Embeddings from Scratch},
author={Zhang, Junlei and Lan, Zhenzhong and He, Junxian},
journal={arXiv preprint arXiv:2305.15077},
year={2023}
}
Model Card Contact
If you have any questions related to the code or the paper, feel free to email Junlei (zhangjunlei@westlake.edu.cn). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please describe the problem in detail so we can help you better and quicker!
How to Get Started with the Model
Use the code below to get started with the model.
from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and model weights for the checkpoint this card describes.
tokenizer = AutoTokenizer.from_pretrained("sjtu-lit/SynCSE-scratch-RoBERTa-base")
model = AutoModel.from_pretrained("sjtu-lit/SynCSE-scratch-RoBERTa-base")
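Loading the model is only the first step of feature extraction. The following is a minimal sketch of how sentence embeddings might be computed and compared. It rests on assumptions this card does not confirm: that the pooler output serves as the sentence embedding (a common choice for contrastively trained RoBERTa encoders), and that cosine similarity is the comparison metric. The helper names `embed` and `cosine_similarity` are hypothetical and are not part of the SynCSE code.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors given as lists of floats."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def embed(sentences, model_name):
    """Return one embedding (a list of floats) per input sentence.

    Assumption: the pooler output is used as the sentence embedding;
    the model card does not specify the pooling strategy.
    """
    # Imported here so cosine_similarity stays usable without these packages.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()  # disable dropout so repeated calls give the same embeddings

    # Pad to the longest sentence in the batch and truncate overlong inputs.
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():  # inference only, no gradients needed
        outputs = model(**inputs)
    return outputs.pooler_output.tolist()
```

To try it, pass a list of sentences and the checkpoint ID from the snippet above to `embed`, then call `cosine_similarity` on any two of the returned vectors. Semantically closer sentences would be expected to score higher. If the pooler output turns out not to be the intended representation, the first-token hidden state (`outputs.last_hidden_state[:, 0]`) is a common alternative.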