metadata

language:
  - en
pipeline_tag: feature-extraction
tags:
  - pytorch
  - RoBERTa

Model Card for SynCSE-scratch

Model Details

Model Description

More information needed

Developed by: SJTU-LIT
Shared by: SJTU-LIT
Model type: Feature Extraction
Language(s) (NLP): English
License: More information needed
Parent Model: RoBERTa-base
Resources for more information:
  - GitHub Repo
  - Associated Paper

Uses

Direct Use

This model can be used for the task of feature extraction.
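Extracted features are typically compared with a similarity measure. The block below is a minimal sketch of cosine similarity in plain Python; the helper name `cosine_similarity` and the toy three-dimensional vectors are illustrative assumptions and stand in for real embeddings produced by the model.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy vectors standing in for sentence embeddings (hypothetical values).
query = [1.0, 0.0, 1.0]
same_direction = [2.0, 0.0, 2.0]
orthogonal = [0.0, 1.0, 0.0]

print(cosine_similarity(query, same_direction))  # close to 1: same direction
print(cosine_similarity(query, orthogonal))      # 0: no shared direction
```

Values near 1 indicate vectors pointing in nearly the same direction, which for sentence embeddings is usually read as semantic closeness.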

Out-of-Scope Use

The model should not be used to intentionally create hostile or alienating environments for people.

Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

Training Data

The model creators note in the GitHub repository:

We use 27.5k generated synthetic examples to train SynCSE-scratch-RoBERTa-base.

Citation

BibTeX:

@article{zhang2023contrastive,
  title={Contrastive Learning of Sentence Embeddings from Scratch},
  author={Zhang, Junlei and Lan, Zhenzhong and He, Junxian},
  journal={arXiv preprint arXiv:2305.15077},
  year={2023}
}

Model Card Contact

If you have any questions related to the code or the paper, feel free to email Junlei (zhangjunlei@westlake.edu.cn). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please try to specify the problem with details so we can help you better and quicker!

How to Get Started with the Model

Use the code below to get started with the model.


from transformers import AutoTokenizer, AutoModel

# Load the tokenizer and encoder for the SynCSE-scratch checkpoint described in this card.
tokenizer = AutoTokenizer.from_pretrained("sjtu-lit/SynCSE-scratch-RoBERTa-base")
model = AutoModel.from_pretrained("sjtu-lit/SynCSE-scratch-RoBERTa-base")
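
The snippet above only loads the checkpoint. As a minimal sketch of turning sentences into embeddings, the hypothetical helper `embed` below tokenizes a batch, runs the encoder without gradients, and returns one vector per sentence. Taking the first-token hidden state as the sentence embedding is an assumption; this card does not state which pooling the authors use, so check the GitHub repository before relying on it.

```python
def embed(sentences, tokenizer, model):
    """Return one embedding (a list of floats) per input sentence.

    `tokenizer` and `model` are the objects returned by
    AutoTokenizer.from_pretrained and AutoModel.from_pretrained.
    """
    # Imported here so that defining the helper does not itself require PyTorch.
    import torch

    model.eval()  # make sure dropout is disabled for inference
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Assumption: use the first-token hidden state as the sentence embedding.
    return outputs.last_hidden_state[:, 0].tolist()


# Example call (the sentences are illustrative; needs the objects loaded above):
# vectors = embed(["A man is playing a guitar.", "Someone plays an instrument."],
#                 tokenizer, model)
```

The resulting vectors can then be compared with any vector similarity measure, for example cosine similarity.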