CocoRoF committed on
Commit 4d0f531 · verified · 1 Parent(s): 3a0377c

Upload README.md with huggingface_hub

Files changed (1)
  1. README.md +264 -157
README.md CHANGED
@@ -1,199 +1,306 @@
  ---
- library_name: transformers
- tags: []
  ---

- # Model Card for Model ID
-
- <!-- Provide a quick summary of what the model is/does. -->
-

  ## Model Details

  ### Model Description

- <!-- Provide a longer summary of what this model is. -->
-
- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
- ### Model Sources [optional]
-
- <!-- Provide the basic links for the model. -->
-
- - **Repository:** [More Information Needed]
- - **Paper [optional]:** [More Information Needed]
- - **Demo [optional]:** [More Information Needed]
-
- ## Uses
-
- <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
-
- ### Direct Use
-
- <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
-
- [More Information Needed]
-
- ### Downstream Use [optional]
-
- <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
-
- [More Information Needed]
-
- ### Out-of-Scope Use
-
- <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
-
- [More Information Needed]
-
- ## Bias, Risks, and Limitations
-
- <!-- This section is meant to convey both technical and sociotechnical limitations. -->
-
- [More Information Needed]
-
- ### Recommendations

- <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->

- Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.

- ## How to Get Started with the Model

- Use the code below to get started with the model.

- [More Information Needed]

- ## Training Details

- ### Training Data

- <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->

- [More Information Needed]

- ### Training Procedure

- <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->

- #### Preprocessing [optional]

- [More Information Needed]

- #### Training Hyperparameters

- - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->

- #### Speeds, Sizes, Times [optional]

- <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->

- [More Information Needed]

  ## Evaluation

- <!-- This section describes the evaluation protocols and provides the results. -->
-
- ### Testing Data, Factors & Metrics
-
- #### Testing Data
-
- <!-- This should link to a Dataset Card if possible. -->
-
- [More Information Needed]
-
- #### Factors
-
- <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->

- [More Information Needed]

- #### Metrics

- <!-- These are the evaluation metrics being used, ideally with a description of why. -->

- [More Information Needed]

- ### Results

- [More Information Needed]
-
- #### Summary
-
-
-
- ## Model Examination [optional]
-
- <!-- Relevant interpretability work for the model goes here -->
-
- [More Information Needed]
-
- ## Environmental Impact
-
- <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
-
- Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
-
- - **Hardware Type:** [More Information Needed]
- - **Hours used:** [More Information Needed]
- - **Cloud Provider:** [More Information Needed]
- - **Compute Region:** [More Information Needed]
- - **Carbon Emitted:** [More Information Needed]
-
- ## Technical Specifications [optional]
-
- ### Model Architecture and Objective
-
- [More Information Needed]
-
- ### Compute Infrastructure
-
- [More Information Needed]
-
- #### Hardware
-
- [More Information Needed]
-
- #### Software
-
- [More Information Needed]
-
- ## Citation [optional]
-
- <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
-
- **BibTeX:**
-
- [More Information Needed]
-
- **APA:**
-
- [More Information Needed]
-
- ## Glossary [optional]
-
- <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
-
- [More Information Needed]
-
- ## More Information [optional]
-
- [More Information Needed]

- ## Model Card Authors [optional]

- [More Information Needed]

  ## Model Card Contact

- [More Information Needed]
  ---
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:392702
+ - loss:CosineSimilarityLoss
+ base_model: answerdotai/ModernBERT-base
+ widget:
+ - source_sentence: 우리는 움직이는 동행 우주 정지 좌표계에 비례하여 이동하고 있습니다 ... 약 371km / s에서 별자리 leo 쪽으로. "
+   sentences:
+   - 두 마리의 독수리가 가지에 앉는다.
+   - 다른 물체와는 관련이 없는 '정지'는 없다.
+   - 소녀는 버스의 열린 문 앞에 서 있다.
+ - source_sentence: 숲에는 개들이 있다.
+   sentences:
+   - 양을 보는 아이들.
+   - 여왕의 배우자를 "왕"이라고 부르지 않는 것은 아주 좋은 이유가 있다. 왜냐하면 그들은 왕이 아니기 때문이다.
+   - 개들은 숲속에 혼자 있다.
+ - source_sentence: '첫째, 두 가지 다른 종류의 대시가 있다는 것을 알아야 합니다 : en 대시와 em 대시.'
+   sentences:
+   - 그들은 그 물건들을 집 주변에 두고 가거나 집의 정리를 해칠 의도가 없다.
+   - 세미콜론은 혼자 있을 수 있는 문장에 참여하는데 사용되지만, 그들의 관계를 강조하기 위해 결합됩니다.
+   - 그의 남동생이 지켜보는 동안 집 앞에서 트럼펫을 연주하는 금발의 아이.
+ - source_sentence: 한 여성이 생선 껍질을 벗기고 있다.
+   sentences:
+   - 한 남자가 수영장으로 뛰어들었다.
+   - 한 여성이 프라이팬에 노란 혼합물을 부어 넣고 있다.
+   - 두 마리의 갈색 개가 눈 속에서 서로 놀고 있다.
+ - source_sentence: 버스가 바쁜 길을 따라 운전한다.
+   sentences:
+   - 우리와 같은 태양계가 은하계 밖에서 존재할 수도 있을 것입니다.
+   - 그 여자는 데이트하러 가는 중이다.
+   - 녹색 버스가 도로를 따라 내려간다.
+ pipeline_tag: sentence-similarity
+ library_name: sentence-transformers
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ - pearson_euclidean
+ - spearman_euclidean
+ - pearson_manhattan
+ - spearman_manhattan
+ - pearson_dot
+ - spearman_dot
+ - pearson_max
+ - spearman_max
+ model-index:
+ - name: SentenceTransformer based on answerdotai/ModernBERT-base
+   results:
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: sts dev
+       type: sts_dev
+     metrics:
+     - type: pearson_cosine
+       value: 0.8273878707711191
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.8298080691919564
+       name: Spearman Cosine
+     - type: pearson_euclidean
+       value: 0.8112987734110177
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: 0.8214596205940881
+       name: Spearman Euclidean
+     - type: pearson_manhattan
+       value: 0.8125188338482303
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: 0.8226861322419045
+       name: Spearman Manhattan
+     - type: pearson_dot
+       value: 0.7646820898603437
+       name: Pearson Dot
+     - type: spearman_dot
+       value: 0.7648333772102188
+       name: Spearman Dot
+     - type: pearson_max
+       value: 0.8273878707711191
+       name: Pearson Max
+     - type: spearman_max
+       value: 0.8298080691919564
+       name: Spearman Max
  ---

+ # SentenceTransformer based on answerdotai/ModernBERT-base

+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) on the [korean_nli_dataset](https://huggingface.co/datasets/x2bee/Korean_NLI_dataset) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

  ## Model Details

  ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [answerdotai/ModernBERT-base](https://huggingface.co/answerdotai/ModernBERT-base) <!-- at revision addb15798678d7f76904915cf8045628d402b3ce -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->

+ ### Model Sources

+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

+ ### Full Model Architecture

+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: ModernBertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': True, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Dense({'in_features': 768, 'out_features': 768, 'bias': True, 'activation_function': 'torch.nn.modules.activation.Tanh'})
+ )
+ ```

+ ## Usage

+ ### Direct Usage (Sentence Transformers)

+ First install the Sentence Transformers library:

+ ```bash
+ pip install -U sentence-transformers
+ ```

+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("x2bee/sts_nli_tune_test")
+ # Run inference
+ sentences = [
+     '버스가 바쁜 길을 따라 운전한다.',
+     '녹색 버스가 도로를 따라 내려간다.',
+     '그 여자는 데이트하러 가는 중이다.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # (3, 768)
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # torch.Size([3, 3])
+ ```

+ <!--
+ ### Direct Usage (Transformers)

+ <details><summary>Click to see the direct usage in Transformers</summary>

+ </details>
+ -->

+ <!--
+ ### Downstream Usage (Sentence Transformers)

+ You can finetune this model on your own dataset.

+ <details><summary>Click to expand</summary>

+ </details>
+ -->

+ <!--
+ ### Out-of-Scope Use

+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->

  ## Evaluation

+ ### Metrics

+ #### Semantic Similarity

+ * Dataset: `sts_dev`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)

+ | Metric             | Value      |
+ |:-------------------|:-----------|
+ | pearson_cosine     | 0.8274     |
+ | spearman_cosine    | 0.8298     |
+ | pearson_euclidean  | 0.8113     |
+ | spearman_euclidean | 0.8215     |
+ | pearson_manhattan  | 0.8125     |
+ | spearman_manhattan | 0.8227     |
+ | pearson_dot        | 0.7647     |
+ | spearman_dot       | 0.7648     |
+ | pearson_max        | 0.8274     |
+ | **spearman_max**   | **0.8298** |
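
+ These numbers can be reproduced with the evaluator named above. A minimal sketch; the `sts_dev` column names `text`/`pair`/`label` are taken from the Evaluation Dataset section below, while the split name is an assumption:

+ ```python
+ from datasets import load_dataset
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator, SimilarityFunction
+
+ model = SentenceTransformer("x2bee/sts_nli_tune_test")
+ eval_ds = load_dataset("CocoRoF/sts_dev", split="train")  # assumed split name
+
+ evaluator = EmbeddingSimilarityEvaluator(
+     sentences1=eval_ds["text"],
+     sentences2=eval_ds["pair"],
+     scores=eval_ds["label"],
+     main_similarity=SimilarityFunction.COSINE,
+     name="sts_dev",
+ )
+ print(evaluator(model))  # pearson/spearman scores keyed by metric name
+ ```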

+ <!--
+ ## Bias, Risks and Limitations

+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->

+ <!--
+ ### Recommendations

+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->

+ ## Training Details

+ ### Training Dataset

+ #### korean_nli_dataset

+ * Dataset: [korean_nli_dataset](https://huggingface.co/datasets/x2bee/Korean_NLI_dataset) at [ef305ef](https://huggingface.co/datasets/x2bee/Korean_NLI_dataset/tree/ef305ef8e2d83c6991f30f2322f321efb5a3b9d1)
+ * Size: 392,702 training samples
+ * Columns: <code>sentence1</code>, <code>sentence2</code>, and <code>score</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence1                                                                          | sentence2                                                                         | score                                                          |
+   |:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                            | float                                                          |
+   | details | <ul><li>min: 4 tokens</li><li>mean: 35.7 tokens</li><li>max: 194 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 19.92 tokens</li><li>max: 64 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.48</li><li>max: 1.0</li></ul> |
+ * Samples:
+   | sentence1 | sentence2 | score |
+   |:----------|:----------|:------|
+   | <code>개념적으로 크림 스키밍은 제품과 지리라는 두 가지 기본 차원을 가지고 있다.</code> | <code>제품과 지리학은 크림 스키밍을 작동시키는 것이다.</code> | <code>0.5</code> |
+   | <code>시즌 중에 알고 있는 거 알아? 네 레벨에서 다음 레벨로 잃어버리는 거야 브레이브스가 모팀을 떠올리기로 결정하면 브레이브스가 트리플 A에서 한 남자를 떠올리기로 결정하면 더블 A가 그를 대신하러 올라가고 A 한 명이 그를 대신하러 올라간다.</code> | <code>사람들이 기억하면 다음 수준으로 물건을 잃는다.</code> | <code>1.0</code> |
+   | <code>우리 번호 중 하나가 당신의 지시를 세밀하게 수행할 것이다.</code> | <code>우리 팀의 일원이 당신의 명령을 엄청나게 정확하게 실행할 것이다.</code> | <code>1.0</code> |
+ * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
+   ```json
+   {
+       "loss_fct": "torch.nn.modules.loss.MSELoss"
+   }
+   ```
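
+ For reference, a training run with this loss might look like the sketch below. It combines the dataset and columns listed above with the module stack from the architecture sketch; the split name is an assumption, and the actual hyperparameters (batch size, epochs, learning rate) are not recorded in this card, so none are shown:

+ ```python
+ import torch
+ from datasets import load_dataset
+ from sentence_transformers import SentenceTransformer, SentenceTransformerTrainer, losses, models
+
+ # Module stack as in the Full Model Architecture section
+ word = models.Transformer("answerdotai/ModernBERT-base", max_seq_length=512)
+ pooling = models.Pooling(word.get_word_embedding_dimension(),
+                          pooling_mode_mean_tokens=False,
+                          pooling_mode_mean_sqrt_len_tokens=True)
+ dense = models.Dense(in_features=768, out_features=768, activation_function=torch.nn.Tanh())
+ model = SentenceTransformer(modules=[word, pooling, dense])
+
+ # Two text columns (sentence1, sentence2) plus a "score" label column,
+ # which is the layout CosineSimilarityLoss expects.
+ train_ds = load_dataset("x2bee/Korean_NLI_dataset", split="train")  # assumed split name
+ loss = losses.CosineSimilarityLoss(model)  # default loss_fct is MSELoss, as in the card
+
+ trainer = SentenceTransformerTrainer(model=model, train_dataset=train_ds, loss=loss)
+ trainer.train()
+ ```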

+ ### Evaluation Dataset

+ #### sts_dev

+ * Dataset: [sts_dev](https://huggingface.co/datasets/CocoRoF/sts_dev) at [1de0cdf](https://huggingface.co/datasets/CocoRoF/sts_dev/tree/1de0cdfb2c238786ee61c5765aa60eed4a782371)
+ * Size: 1,500 evaluation samples
+ * Columns: <code>text</code>, <code>pair</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | text                                                                               | pair                                                                               | label                                                          |
+   |:--------|:-------------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:-----------------------------------------------------------------|
+   | type    | string                                                                             | string                                                                             | float                                                          |
+   | details | <ul><li>min: 7 tokens</li><li>mean: 20.38 tokens</li><li>max: 52 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 20.52 tokens</li><li>max: 54 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.42</li><li>max: 1.0</li></ul> |
+ * Samples:
+   | text                                 | pair                                | label             |
+   |:-------------------------------------|:------------------------------------|:------------------|
+   | <code>안전모를 가진 한 남자가 춤을 추고 있다.</code> | <code>안전모를 쓴 한 남자가 춤을 추고 있다.</code> | <code>1.0</code>  |
+   | <code>어린아이가 말을 타고 있다.</code>         | <code>아이가 말을 타고 있다.</code>          | <code>0.95</code> |
+   | <code>한 남자가 뱀에게 쥐를 먹이고 있다.</code>    | <code>남자가 뱀에게 쥐를 먹이고 있다.</code>     | <code>1.0</code>  |
+ * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
+   ```json
+   {
+       "loss_fct": "torch.nn.modules.loss.MSELoss"
+   }
+   ```

+ ### Framework Versions
+ - Python: 3.11.10
+ - Sentence Transformers: 3.3.1
+ - Transformers: 4.48.0
+ - PyTorch: 2.5.1+cu124
+ - Accelerate: 1.2.1
+ - Datasets: 3.2.0
+ - Tokenizers: 0.21.0

+ ## Citation

+ ### BibTeX

+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```

+ <!--
+ ## Glossary

+ *Clearly define terms in order to be accessible across audiences.*
+ -->

+ <!--
+ ## Model Card Authors

+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->

+ <!--
  ## Model Card Contact

+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->