shibing624
committed on
Commit
•
c4e9d21
1
Parent(s):
6177599
Update README.md
README.md
CHANGED
@@ -136,10 +136,34 @@ print(sentence_embeddings)
|
|
136 |
## Full Model Architecture
|
137 |
```
|
138 |
CoSENT(
|
139 |
-
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model:
|
140 |
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_mean_tokens': True})
|
141 |
)
|
142 |
```
|
143 |
## Citing & Authors
|
144 |
This model was trained by [text2vec](https://github.com/shibing624/text2vec).
|
145 |
|
|
|
136 |
## Full Model Architecture
|
137 |
```
|
138 |
CoSENT(
|
139 |
+
(0): Transformer({'max_seq_length': 256, 'do_lower_case': False}) with Transformer model: ErnieModel
|
140 |
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_mean_tokens': True})
|
141 |
)
|
142 |
```
|
143 |
+
|
144 |
+
|
145 |
+
## Intended uses
|
146 |
+
|
147 |
+
Our model is intended to be used as a sentence and short paragraph encoder. Given an input text, it outputs a vector that captures
|
148 |
+
the semantic information. The sentence vector may be used for information retrieval, clustering or sentence similarity tasks.
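
As a rough illustration of the retrieval and similarity use case, the sketch below ranks candidate vectors by cosine similarity to a query vector. It assumes the sentence vectors have already been computed; the toy 3-dimensional vectors stand in for the model's 768-dimensional output, and the helper names `cosine_similarity` and `rank_by_similarity` are hypothetical, not part of text2vec.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, corpus_vecs):
    """Return (index, score) pairs sorted from most to least similar."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(corpus_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors (assumption): real embeddings from this model have 768 dimensions.
query = [1.0, 0.0, 0.0]
corpus = [
    [0.0, 1.0, 0.0],   # orthogonal to the query
    [0.9, 0.1, 0.0],   # nearly parallel to the query
    [-1.0, 0.0, 0.0],  # opposite direction
]

ranking = rank_by_similarity(query, corpus)
for index, score in ranking:
    print(index, round(score, 4))
```

The same scores could feed a clustering step or a similarity threshold; which one fits depends on the task.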
|
149 |
+
|
150 |
+
By default, input text longer than 256 word pieces is truncated.
|
151 |
+
|
152 |
+
|
153 |
+
## Training procedure
|
154 |
+
|
155 |
+
### Pre-training
|
156 |
+
|
157 |
+
We use the pretrained [`nghuyong/ernie-3.0-base-zh`](https://huggingface.co/nghuyong/ernie-3.0-base-zh) model.
|
158 |
+
Please refer to the model card for more detailed information about the pre-training procedure.
|
159 |
+
|
160 |
+
### Fine-tuning
|
161 |
+
|
162 |
+
We fine-tune the model using a contrastive objective. Formally, we compute the cosine similarity of each
|
163 |
+
possible sentence pair in the batch.
|
164 |
+
We then apply a rank loss that compares the similarities of true pairs against those of false pairs.
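
The README does not spell out the loss formula, so the following is only a minimal sketch of one way a pairwise rank loss over cosine scores can look. It assumes the log-sum-exp form commonly associated with CoSENT, binary labels (1 for a true pair, 0 for a false pair), and a hypothetical `scale` of 20; none of these details are confirmed by this commit.

```python
import math


def rank_loss(cos_scores, labels, scale=20.0):
    """Pairwise rank loss over precomputed cosine scores.

    Assumed form (not confirmed by the README):
        log(1 + sum exp(scale * (cos_false - cos_true)))
    summed over every (true pair, false pair) combination.
    `scale` is a hypothetical default.
    """
    terms = [
        scale * (cos_scores[j] - cos_scores[i])
        for i in range(len(labels))
        for j in range(len(labels))
        if labels[i] > labels[j]  # i is a true pair, j is a false pair
    ]
    if not terms:
        return 0.0
    # Numerically stable log(1 + sum(exp(t))): the "1" is exp(0).
    m = max(0.0, max(terms))
    return m + math.log(math.exp(-m) + sum(math.exp(t - m) for t in terms))


# True pair scored above the false pair: the loss is close to zero.
well_ordered = rank_loss([0.9, 0.1], [1, 0])
# True pair scored below the false pair: the loss is large.
mis_ordered = rank_loss([0.1, 0.9], [1, 0])
print(well_ordered, mis_ordered)
```

Under these assumptions the loss only penalizes orderings where a false pair scores close to or above a true pair, which is why it is described as a rank loss and not a regression on the similarity value.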
|
165 |
+
|
166 |
+
|
167 |
## Citing & Authors
|
168 |
This model was trained by [text2vec](https://github.com/shibing624/text2vec).
|
169 |
|