Nessrine9 committed
Commit 9f57238 · verified · 1 Parent(s): d9e0a37

Finetuned model on SNLI

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 384,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,475 @@
+ ---
+ base_model: sentence-transformers/all-MiniLM-L12-v2
+ library_name: sentence-transformers
+ metrics:
+ - pearson_cosine
+ - spearman_cosine
+ - pearson_manhattan
+ - spearman_manhattan
+ - pearson_euclidean
+ - spearman_euclidean
+ - pearson_dot
+ - spearman_dot
+ - pearson_max
+ - spearman_max
+ pipeline_tag: sentence-similarity
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - generated_from_trainer
+ - dataset_size:100000
+ - loss:CosineSimilarityLoss
+ widget:
+ - source_sentence: The church has granite statues of Jesus and the Apostles adorning
+     its porch .
+   sentences:
+   - There were no statues in the church .
+   - L' Afrique du sud et le reste de l' Afrique sont les mêmes .
+   - Tours on foot are a great way to see LA .
+ - source_sentence: Au Centre du réseau routier de la région , Alicante est également
+     une base logique pour les automobilistes et pour les liaisons ferroviaires et
+     ferroviaires .
+   sentences:
+   - Alicante est fréquentée par les automobilistes et les touristes .
+   - Les examinateurs ont passé sept mois à étudier leurs conclusions .
+   - Ferries to the island depart from the central station every 2 hours .
+ - source_sentence: Scheduled to reopen in 2002 or 2003 , the Malibu site will house
+     only the Getty holdings in Greek and Roman antiquities , some of which date as
+     far back as 3000 b.c.
+   sentences:
+   - C' est impossible d' avoir des billets pour les enregistrements télévisés .
+   - The Getty holdings were taken hold of thanks to the researchers ' effort .
+   - After the first of may ends the peak season for ferries .
+ - source_sentence: Une nouvelle recherche relie ces bactéries parodontale aux maladies
+     cardiaques , au diabète , aux bébés à faible poids de naissance , et à d' autres
+     saletés que vous attendez des bactéries qui se déchaînent dans le sang .
+   sentences:
+   - Le prix des actions de Caterpillar a baissé en 1991 quand ils ont fait grève .
+   - Ils agissent comme chaque année est la même .
+   - La recherche indique qu' il n' y a pas de lien entre les bactéries parodontale
+     et les maladies cardiaques ou le diabète .
+ - source_sentence: L' ancien n' est pas une classification juridique qui entraîne
+     une perte automatique de ces droits .
+   sentences:
+   - Some degree of uncertainty is inherent in free-market systems .
+   - Les villes grecques d' Anatolie ont été exclues de l' appartenance à la Confédération
+     Delian .
+   - Ils voulaient plaider pour les personnes âgées .
+ model-index:
+ - name: SentenceTransformer based on sentence-transformers/all-MiniLM-L12-v2
+   results:
+   - task:
+       type: semantic-similarity
+       name: Semantic Similarity
+     dataset:
+       name: snli dev
+       type: snli-dev
+     metrics:
+     - type: pearson_cosine
+       value: 0.35421287329686374
+       name: Pearson Cosine
+     - type: spearman_cosine
+       value: 0.3592670991851331
+       name: Spearman Cosine
+     - type: pearson_manhattan
+       value: 0.34936411192844985
+       name: Pearson Manhattan
+     - type: spearman_manhattan
+       value: 0.3583327923327215
+       name: Spearman Manhattan
+     - type: pearson_euclidean
+       value: 0.34982920048695176
+       name: Pearson Euclidean
+     - type: spearman_euclidean
+       value: 0.35926709915022625
+       name: Spearman Euclidean
+     - type: pearson_dot
+       value: 0.3542128787197555
+       name: Pearson Dot
+     - type: spearman_dot
+       value: 0.35926727022169175
+       name: Spearman Dot
+     - type: pearson_max
+       value: 0.3542128787197555
+       name: Pearson Max
+     - type: spearman_max
+       value: 0.35926727022169175
+       name: Spearman Max
+ ---
+
+ # SentenceTransformer based on sentence-transformers/all-MiniLM-L12-v2
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [sentence-transformers/all-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2). It maps sentences & paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [sentence-transformers/all-MiniLM-L12-v2](https://huggingface.co/sentence-transformers/all-MiniLM-L12-v2) <!-- at revision 30ce63ae64e71b9199b3d2eae9de99f64a26eedc -->
+ - **Maximum Sequence Length:** 128 tokens
+ - **Output Dimensionality:** 384 dimensions
+ - **Similarity Function:** Cosine Similarity
+ <!-- - **Training Dataset:** Unknown -->
+ <!-- - **Language:** Unknown -->
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 128, 'do_lower_case': False}) with Transformer model: BertModel
+   (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+   (2): Normalize()
+ )
+ ```
+
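+ The Pooling module applies mean pooling over token embeddings, and the final Normalize module L2-normalizes the result. As an illustration only, here is a minimal sketch of the equivalent computation in plain `transformers`, using the model id from the usage section below:
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+ from transformers import AutoModel, AutoTokenizer
+
+ model_id = "Nessrine9/finetuned-snli-MiniLM-L12-v2-100k-en-fr"
+ tokenizer = AutoTokenizer.from_pretrained(model_id)
+ encoder = AutoModel.from_pretrained(model_id)
+
+ encoded = tokenizer(
+     ["Tours on foot are a great way to see LA ."],
+     padding=True, truncation=True, max_length=128, return_tensors="pt",
+ )
+ with torch.no_grad():
+     token_embeddings = encoder(**encoded).last_hidden_state  # (batch, seq_len, 384)
+
+ # Mean pooling: average the token embeddings, ignoring padding positions.
+ mask = encoded["attention_mask"].unsqueeze(-1).float()
+ sentence_embeddings = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1).clamp(min=1e-9)
+
+ # The (2): Normalize() module L2-normalizes the sentence embeddings.
+ sentence_embeddings = F.normalize(sentence_embeddings, p=2, dim=1)
+ print(sentence_embeddings.shape)  # torch.Size([1, 384])
+ ```
+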
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference.
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("Nessrine9/finetuned-snli-MiniLM-L12-v2-100k-en-fr")
+ # Run inference
+ sentences = [
+     "L' ancien n' est pas une classification juridique qui entraîne une perte automatique de ces droits .",
+     'Ils voulaient plaider pour les personnes âgées .',
+     "Les villes grecques d' Anatolie ont été exclues de l' appartenance à la Confédération Delian .",
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 384]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+
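+ Because the model's final module is `Normalize()`, the returned embeddings are unit-length, so cosine similarity and dot product give identical scores (consistent with the matching cosine and dot metrics reported below). As a small illustration, a query-vs-corpus ranking sketch; the query sentence is made up, while the corpus lines are taken from the widget samples above:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("Nessrine9/finetuned-snli-MiniLM-L12-v2-100k-en-fr")
+
+ # Embeddings are unit-length (final Normalize() module), so cosine similarity
+ # equals the dot product here.
+ query_emb = model.encode(["Are there statues in the church ?"])
+ corpus = [
+     "There were no statues in the church .",
+     "Tours on foot are a great way to see LA .",
+     "Ferries to the island depart from the central station every 2 hours .",
+ ]
+ corpus_emb = model.encode(corpus)
+
+ scores = model.similarity(query_emb, corpus_emb)  # tensor of shape [1, 3]
+ best = scores.argmax().item()
+ print(corpus[best], float(scores[0, best]))
+ ```
+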
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Semantic Similarity
+ * Dataset: `snli-dev`
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
+
+ | Metric             | Value      |
+ |:-------------------|:-----------|
+ | pearson_cosine     | 0.3542     |
+ | spearman_cosine    | 0.3593     |
+ | pearson_manhattan  | 0.3494     |
+ | spearman_manhattan | 0.3583     |
+ | pearson_euclidean  | 0.3498     |
+ | spearman_euclidean | 0.3593     |
+ | pearson_dot        | 0.3542     |
+ | spearman_dot       | 0.3593     |
+ | pearson_max        | 0.3542     |
+ | **spearman_max**   | **0.3593** |
+
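+ For reference, a minimal sketch of how such an evaluation can be reproduced with the same evaluator. The sentence pairs and labels below are placeholders; the actual snli-dev pairs are not shipped in this repository:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import EmbeddingSimilarityEvaluator
+
+ model = SentenceTransformer("Nessrine9/finetuned-snli-MiniLM-L12-v2-100k-en-fr")
+
+ # Placeholder dev pairs with float similarity labels in [0, 1].
+ sentences1 = ["A man is playing a guitar .", "A child runs in the park .", "The sky is blue ."]
+ sentences2 = ["Someone plays an instrument .", "Un enfant court dans le parc .", "It is raining heavily ."]
+ labels = [0.9, 0.8, 0.1]
+
+ evaluator = EmbeddingSimilarityEvaluator(sentences1, sentences2, labels, name="snli-dev")
+ results = evaluator(model)
+ print(results)  # keys like 'snli-dev_pearson_cosine', 'snli-dev_spearman_cosine', ...
+ ```
+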
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### Unnamed Dataset
+
+ (Per the commit message and the `snli-dev` evaluator, this is presumably a bilingual English/French set derived from SNLI.)
+
+ * Size: 100,000 training samples
+ * Columns: <code>sentence_0</code>, <code>sentence_1</code>, and <code>label</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | sentence_0 | sentence_1 | label |
+   |:--------|:-----------|:-----------|:------|
+   | type    | string     | string     | float |
+   | details | <ul><li>min: 5 tokens</li><li>mean: 34.31 tokens</li><li>max: 128 tokens</li></ul> | <ul><li>min: 3 tokens</li><li>mean: 18.24 tokens</li><li>max: 51 tokens</li></ul> | <ul><li>min: 0.0</li><li>mean: 0.5</li><li>max: 1.0</li></ul> |
+ * Samples:
+   | sentence_0 | sentence_1 | label |
+   |:-----------|:-----------|:------|
+   | <code>We 're off ! "</code> | <code>We 're not headed off .</code> | <code>1.0</code> |
+   | <code>Il y en a eu un ici récemment qui me vient à l' esprit que c' est à propos d' une femme que c' est ridicule je veux dire que c' est presque euh ce serait drôle si ce n' était pas si triste je veux dire cette femme cette femme est sortie et a engagé quelqu' un à</code> | <code>Cette femme a engagé quelqu' un récemment pour le faire et s' est fait prendre immédiatement .</code> | <code>0.5</code> |
+   | <code>Gentilello a précisé qu' il n' avait pas critiqué le processus d' examen par les pairs , mais que les panels qui examinent les interventions en matière d' alcool dans l' eds devraient inclure des représentants de la médecine d' urgence .</code> | <code>Gentilello S' est ensuite battu avec un psychiatre sur le parking .</code> | <code>0.5</code> |
+ * Loss: [<code>CosineSimilarityLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#cosinesimilarityloss) with these parameters:
+   ```json
+   {
+       "loss_fct": "torch.nn.modules.loss.MSELoss"
+   }
+   ```
+
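+ `CosineSimilarityLoss` with an `MSELoss` criterion fits the cosine similarity of the two sentence embeddings to the float label. Conceptually it computes something like the sketch below (an illustration of the idea, not the library's exact implementation):
+
+ ```python
+ import torch
+ import torch.nn.functional as F
+
+ def cosine_similarity_loss(emb1: torch.Tensor, emb2: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
+     """MSE between the cosine similarity of each pair and its float label."""
+     cos_sim = F.cosine_similarity(emb1, emb2, dim=1)
+     return F.mse_loss(cos_sim, labels)  # the torch.nn MSELoss configured above
+
+ # Toy check with random 384-dim embeddings and the 0.0 / 0.5 / 1.0 labels used here:
+ emb1, emb2 = torch.randn(3, 384), torch.randn(3, 384)
+ print(cosine_similarity_loss(emb1, emb2, torch.tensor([1.0, 0.5, 0.0])))
+ ```
+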
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `num_train_epochs`: 4
+ - `fp16`: True
+ - `multi_dataset_batch_sampler`: round_robin
+
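+ As an illustration, a minimal sketch of launching a comparable run with the `SentenceTransformerTrainer` API and the non-default hyperparameters above. The dataset rows, output directory, and eval split are placeholders, not the actual 100k training set:
+
+ ```python
+ from datasets import Dataset
+ from sentence_transformers import (
+     SentenceTransformer,
+     SentenceTransformerTrainer,
+     SentenceTransformerTrainingArguments,
+ )
+ from sentence_transformers.losses import CosineSimilarityLoss
+
+ model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")
+
+ # Placeholder data; the real run used 100,000 (sentence_0, sentence_1, label) rows.
+ pairs = {
+     "sentence_0": ["We 're off ! \"", "A man plays a guitar ."],
+     "sentence_1": ["We 're not headed off .", "Someone plays an instrument ."],
+     "label": [1.0, 0.5],
+ }
+ train_dataset = Dataset.from_dict(pairs)
+ eval_dataset = Dataset.from_dict(pairs)  # placeholder eval split
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="finetuned-snli-MiniLM-L12-v2",   # hypothetical path
+     eval_strategy="steps",
+     per_device_train_batch_size=16,
+     per_device_eval_batch_size=16,
+     num_train_epochs=4,
+     fp16=True,
+     multi_dataset_batch_sampler="round_robin",   # only relevant with multiple datasets
+ )
+
+ trainer = SentenceTransformerTrainer(
+     model=model,
+     args=args,
+     train_dataset=train_dataset,
+     eval_dataset=eval_dataset,
+     loss=CosineSimilarityLoss(model),
+ )
+ trainer.train()
+ ```
+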
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: True
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `torch_empty_cache_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1
+ - `num_train_epochs`: 4
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.0
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `restore_callback_states_from_checkpoint`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: False
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_eval_metrics`: False
+ - `eval_on_start`: False
+ - `eval_use_gather_object`: False
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: round_robin
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step  | Training Loss | snli-dev_spearman_max |
+ |:-----:|:-----:|:-------------:|:---------------------:|
+ | 0.08  | 500   | 0.1948        | 0.0484                |
+ | 0.16  | 1000  | 0.1752        | 0.1177                |
+ | 0.24  | 1500  | 0.1727        | 0.1136                |
+ | 0.32  | 2000  | 0.1668        | 0.2050                |
+ | 0.4   | 2500  | 0.1673        | 0.2227                |
+ | 0.48  | 3000  | 0.1651        | 0.1760                |
+ | 0.56  | 3500  | 0.1619        | 0.2195                |
+ | 0.64  | 4000  | 0.1625        | 0.2308                |
+ | 0.72  | 4500  | 0.1563        | 0.2405                |
+ | 0.8   | 5000  | 0.1598        | 0.2773                |
+ | 0.88  | 5500  | 0.1589        | 0.2359                |
+ | 0.96  | 6000  | 0.1587        | 0.2084                |
+ | 1.0   | 6250  | -             | 0.2615                |
+ | 1.04  | 6500  | 0.158         | 0.2958                |
+ | 1.12  | 7000  | 0.1557        | 0.2887                |
+ | 1.2   | 7500  | 0.1544        | 0.2960                |
+ | 1.28  | 8000  | 0.1535        | 0.2977                |
+ | 1.36  | 8500  | 0.1559        | 0.2546                |
+ | 1.44  | 9000  | 0.1518        | 0.3201                |
+ | 1.52  | 9500  | 0.1551        | 0.2894                |
+ | 1.6   | 10000 | 0.149         | 0.2981                |
+ | 1.68  | 10500 | 0.152         | 0.3140                |
+ | 1.76  | 11000 | 0.1484        | 0.3056                |
+ | 1.84  | 11500 | 0.1497        | 0.3051                |
+ | 1.92  | 12000 | 0.1522        | 0.2893                |
+ | 2.0   | 12500 | 0.1503        | 0.2944                |
+ | 2.08  | 13000 | 0.1496        | 0.3039                |
+ | 2.16  | 13500 | 0.1462        | 0.3314                |
+ | 2.24  | 14000 | 0.1505        | 0.2470                |
+ | 2.32  | 14500 | 0.1457        | 0.3081                |
+ | 2.4   | 15000 | 0.1478        | 0.3204                |
+ | 2.48  | 15500 | 0.1464        | 0.3248                |
+ | 2.56  | 16000 | 0.1442        | 0.3360                |
+ | 2.64  | 16500 | 0.1437        | 0.3418                |
+ | 2.72  | 17000 | 0.1416        | 0.3496                |
+ | 2.8   | 17500 | 0.1434        | 0.3283                |
+ | 2.88  | 18000 | 0.146         | 0.3246                |
+ | 2.96  | 18500 | 0.1448        | 0.3352                |
+ | 3.0   | 18750 | -             | 0.3248                |
+ | 3.04  | 19000 | 0.1445        | 0.3394                |
+ | 3.12  | 19500 | 0.1423        | 0.3430                |
+ | 3.2   | 20000 | 0.1415        | 0.3410                |
+ | 3.28  | 20500 | 0.1411        | 0.3367                |
+ | 3.36  | 21000 | 0.1445        | 0.3497                |
+ | 3.44  | 21500 | 0.1383        | 0.3640                |
+ | 3.52  | 22000 | 0.1408        | 0.3497                |
+ | 3.6   | 22500 | 0.1374        | 0.3452                |
+ | 3.68  | 23000 | 0.1401        | 0.3519                |
+ | 3.76  | 23500 | 0.137         | 0.3582                |
+ | 3.84  | 24000 | 0.1393        | 0.3610                |
+ | 3.92  | 24500 | 0.1408        | 0.3575                |
+ | 4.0   | 25000 | 0.1388        | 0.3593                |
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.2.1
+ - Transformers: 4.44.2
+ - PyTorch: 2.5.0+cu121
+ - Accelerate: 0.34.2
+ - Datasets: 3.0.2
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "_name_or_path": "sentence-transformers/all-MiniLM-L12-v2",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "gradient_checkpointing": false,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "torch_dtype": "float32",
+   "transformers_version": "4.44.2",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.2.1",
+     "transformers": "4.44.2",
+     "pytorch": "2.5.0+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:4ee41106a5685eceaf6c5c288b847ec2168e743c8554e8521c0094618524e672
+ size 133462128
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 128,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,37 @@
+ {
+   "cls_token": {
+     "content": "[CLS]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "[MASK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "[PAD]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "[SEP]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "[UNK]",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer_config.json ADDED
@@ -0,0 +1,64 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_basic_tokenize": true,
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "max_length": 128,
+   "model_max_length": 128,
+   "never_split": null,
+   "pad_to_multiple_of": null,
+   "pad_token": "[PAD]",
+   "pad_token_type_id": 0,
+   "padding_side": "right",
+   "sep_token": "[SEP]",
+   "stride": 0,
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "BertTokenizer",
+   "truncation_side": "right",
+   "truncation_strategy": "longest_first",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff