---
language:
  - en
pipeline_tag: feature-extraction
tags:
  - pytorch
  - RoBERTa
---

Model Card for SynCSE-scratch

Model Details

Model Description

SynCSE-scratch learns sentence embeddings with contrastive learning on synthetic training data generated by large language models, without requiring any manually collected unlabeled sentences. This checkpoint fine-tunes a RoBERTa-base backbone; see the associated paper (arXiv:2305.15077) for details.

  • Developed by: SJTU-LIT

  • Shared by [Optional]: SJTU-LIT

  • Model type: Feature Extraction

  • Language(s) (NLP): English

  • License: More information needed

  • Parent Model: RoBERTa-base

  • Resources for more information: GitHub Repo, Associated Paper (arXiv:2305.15077)

Uses

Direct Use

This model can be used for the task of feature extraction.
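
As a minimal sketch of what feature extraction with this model could look like (the checkpoint name and the first-token pooling below are illustrative assumptions mirroring common SimCSE-style usage, not an official recipe from the model creators):

import torch
from transformers import AutoTokenizer, AutoModel

# Assumed checkpoint name for illustration; adjust to the checkpoint you actually use.
name = "sjtu-lit/SynCSE-scratch-RoBERTa-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModel.from_pretrained(name)

inputs = tokenizer("A man is playing a guitar.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Take the first-token ([CLS]) hidden state as the sentence feature vector.
features = outputs.last_hidden_state[:, 0]  # shape: (1, 768) for a RoBERTa-base backbone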

Out-of-Scope Use

The model should not be used to intentionally create hostile or alienating environments for people.

Bias, Risks, and Limitations

Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.

Recommendations

Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

Training Data

The model creators note in the GitHub repository:

We use 27.5k generated synthetic examples to train SynCSE-scratch-RoBERTa-base.

Citation

BibTeX:

@article{zhang2023contrastive,
  title={Contrastive Learning of Sentence Embeddings from Scratch},
  author={Zhang, Junlei and Lan, Zhenzhong and He, Junxian},
  journal={arXiv preprint arXiv:2305.15077},
  year={2023}
}

Model Card Contact

If you have any questions related to the code or the paper, feel free to email Junlei (zhangjunlei@westlake.edu.cn). If you encounter any problems when using the code, or want to report a bug, you can open an issue. Please describe the problem in detail so we can help you better and more quickly!

How to Get Started with the Model

Use the code below to get started with the model.

from transformers import AutoTokenizer, AutoModel
# This card describes SynCSE-scratch; load the matching checkpoint.
tokenizer = AutoTokenizer.from_pretrained("sjtu-lit/SynCSE-scratch-RoBERTa-base")
model = AutoModel.from_pretrained("sjtu-lit/SynCSE-scratch-RoBERTa-base")
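
The snippet below is a minimal, hedged sketch of how the loaded model might be used to compare two sentences; the example sentences and the first-token ([CLS]) pooling are assumptions in the spirit of SimCSE-style models, not an official recipe from the model creators.

import torch
import torch.nn.functional as F

sentences = ["A man is playing a guitar.", "Someone is playing an instrument."]
inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Pool each sentence to its first-token ([CLS]) hidden state.
embeddings = outputs.last_hidden_state[:, 0]

# Cosine similarity between the two sentence embeddings (higher means more similar).
similarity = F.cosine_similarity(embeddings[0:1], embeddings[1:2])
print(similarity.item())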