ayoubkirouane committed
Commit ed20068
1 Parent(s): 1810a4c

Update README.md

Files changed (1):
  README.md +0 -484

README.md CHANGED
@@ -10,489 +10,5 @@ tags:
- dataset_size:100K<n<1M
- loss:MultipleNegativesRankingLoss
base_model: microsoft/mpnet-base
metrics:
- cosine_accuracy
- dot_accuracy
- manhattan_accuracy
- euclidean_accuracy
- max_accuracy
widget:
- source_sentence: A woman sings.
  sentences:
  - The woman is singing.
  - A woman sits outside.
  - Two men are sleeping.
- source_sentence: The boy scowls
  sentences:
  - An insecure boy.
  - A person is climbing.
  - Two women are sleeping.
- source_sentence: There's a dock
  sentences:
  - There is people outside.
  - two women are outside
  - a boy sleeps on the couch
- source_sentence: A bird flying.
  sentences:
  - an eagle flies
  - The girl is outdoors.
  - Two men are sleeping.
- source_sentence: a baby smiling
  sentences:
  - The boy is smiling
  - The girl is standing.
  - Two men are in a kayak.
pipeline_tag: sentence-similarity
model-index:
- name: MPNet base trained on AllNLI triplets
  results:
  - task:
      type: triplet
      name: Triplet
    dataset:
      name: all nli dev
      type: all-nli-dev
    metrics:
    - type: cosine_accuracy
      value: 0.85
      name: Cosine Accuracy
    - type: dot_accuracy
      value: 0.155
      name: Dot Accuracy
    - type: manhattan_accuracy
      value: 0.848
      name: Manhattan Accuracy
    - type: euclidean_accuracy
      value: 0.846
      name: Euclidean Accuracy
    - type: max_accuracy
      value: 0.85
      name: Max Accuracy
  - task:
      type: triplet
      name: Triplet
    dataset:
      name: all nli test
      type: all-nli-test
    metrics:
    - type: cosine_accuracy
      value: 0.946
      name: Cosine Accuracy
    - type: dot_accuracy
      value: 0.05
      name: Dot Accuracy
    - type: manhattan_accuracy
      value: 0.942
      name: Manhattan Accuracy
    - type: euclidean_accuracy
      value: 0.942
      name: Euclidean Accuracy
    - type: max_accuracy
      value: 0.946
      name: Max Accuracy
---
# MPNet base trained on AllNLI triplets

This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base) on the [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [microsoft/mpnet-base](https://huggingface.co/microsoft/mpnet-base) <!-- at revision 6996ce1e91bd2a9c7d7f61daec37463394f73f09 -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 768 dimensions
- **Similarity Function:** Cosine Similarity
- **Training Dataset:**
    - [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli)
- **Language:** en
- **License:** apache-2.0
### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
  (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
```
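The `Pooling` module above is configured for mean pooling (`pooling_mode_mean_tokens: True`): the sentence embedding is the attention-mask-weighted average of the token embeddings. As a rough sketch of that step only (not part of the original card, and using the *base* MPNet checkpoint directly rather than this fine-tuned model):

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Sketch of the mean pooling the Pooling module performs; assumes the plain
# base model for illustration, not this repository's fine-tuned weights.
tokenizer = AutoTokenizer.from_pretrained("microsoft/mpnet-base")
encoder = AutoModel.from_pretrained("microsoft/mpnet-base")

batch = tokenizer(["a baby smiling"], return_tensors="pt", padding=True, truncation=True)
with torch.no_grad():
    token_embeddings = encoder(**batch).last_hidden_state  # (batch, seq_len, 768)

# Average only over real tokens; padding positions are masked out.
mask = batch["attention_mask"].unsqueeze(-1).float()       # (batch, seq_len, 1)
sentence_embedding = (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)  # torch.Size([1, 768])
```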
## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("ayoubkirouane/Mpnet-base-ALL-NLI")
# Run inference
sentences = [
    'a baby smiling',
    'The boy is smiling',
    'The girl is standing.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
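Since `model.similarity` returns cosine similarities, the semantic-search use case mentioned above follows directly from this snippet. A small sketch, with a made-up query and corpus (not from the original card):

```python
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ayoubkirouane/Mpnet-base-ALL-NLI")

# Hypothetical corpus and query, purely for illustration.
corpus = [
    "A man is eating food.",
    "A woman is playing violin.",
    "Two kids wash their hands.",
]
query = "children are washing up"

corpus_embeddings = model.encode(corpus)
query_embedding = model.encode([query])

# Cosine similarity of the query against every corpus entry: shape (1, 3).
scores = model.similarity(query_embedding, corpus_embeddings)
best = scores.argmax().item()
print(corpus[best], float(scores[0, best]))
```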
<!--
### Direct Usage (Transformers)

<details><summary>Click to see the direct usage in Transformers</summary>

</details>
-->

<!--
### Downstream Usage (Sentence Transformers)

You can finetune this model on your own dataset.

<details><summary>Click to expand</summary>

</details>
-->

<!--
### Out-of-Scope Use

*List how the model may foreseeably be misused and address what users ought not to do with the model.*
-->
## Evaluation

### Metrics

#### Triplet
* Dataset: `all-nli-dev`
* Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric             | Value    |
|:-------------------|:---------|
| cosine_accuracy    | 0.85     |
| dot_accuracy       | 0.155    |
| manhattan_accuracy | 0.848    |
| euclidean_accuracy | 0.846    |
| **max_accuracy**   | **0.85** |

#### Triplet
* Dataset: `all-nli-test`
* Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)

| Metric             | Value     |
|:-------------------|:----------|
| cosine_accuracy    | 0.946     |
| dot_accuracy       | 0.05      |
| manhattan_accuracy | 0.942     |
| euclidean_accuracy | 0.942     |
| **max_accuracy**   | **0.946** |
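Triplet accuracy is simply the fraction of (anchor, positive, negative) triplets for which the anchor is more similar to its positive than to its negative under the given similarity function. A sketch of the cosine variant with toy triplets (the reported numbers come from the held-out AllNLI triplets scored by `TripletEvaluator`, not from this snippet):

```python
import torch.nn.functional as F
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("ayoubkirouane/Mpnet-base-ALL-NLI")

# Toy triplets for illustration only.
anchors   = ["A woman sings.", "a baby smiling"]
positives = ["The woman is singing.", "The boy is smiling"]
negatives = ["Two men are sleeping.", "Two men are in a kayak."]

a = model.encode(anchors, convert_to_tensor=True)
p = model.encode(positives, convert_to_tensor=True)
n = model.encode(negatives, convert_to_tensor=True)

# cosine_accuracy: fraction of triplets where the anchor is closer
# to its positive than to its negative.
accuracy = (F.cosine_similarity(a, p) > F.cosine_similarity(a, n)).float().mean()
print(accuracy.item())
```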
<!--
## Bias, Risks and Limitations

*What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
-->

<!--
### Recommendations

*What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
-->
## Training Details

### Training Dataset

#### sentence-transformers/all-nli

* Dataset: [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
* Size: 100,000 training samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive | negative |
  |:--------|:-------|:---------|:---------|
  | type    | string | string   | string   |
  | details | <ul><li>min: 7 tokens</li><li>mean: 10.46 tokens</li><li>max: 46 tokens</li></ul> | <ul><li>min: 6 tokens</li><li>mean: 12.81 tokens</li><li>max: 40 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 13.4 tokens</li><li>max: 50 tokens</li></ul> |
* Samples:
  | anchor | positive | negative |
  |:-------|:---------|:---------|
  | <code>A person on a horse jumps over a broken down airplane.</code> | <code>A person is outdoors, on a horse.</code> | <code>A person is at a diner, ordering an omelette.</code> |
  | <code>Children smiling and waving at camera</code> | <code>There are children present</code> | <code>The kids are frowning</code> |
  | <code>A boy is jumping on skateboard in the middle of a red bridge.</code> | <code>The boy does a skateboarding trick.</code> | <code>The boy skates down the sidewalk.</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
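MultipleNegativesRankingLoss treats every other positive in the batch as an in-batch negative for a given anchor: it builds the anchor-vs-candidate cosine similarity matrix, scales it by 20, and applies cross-entropy with the diagonal as the target (with an explicit `negative` column, as in this dataset, those embeddings are appended as extra candidates). A standalone sketch of that computation, not the library's internal code:

```python
import torch
import torch.nn.functional as F

def mnrl_sketch(anchor_emb: torch.Tensor, positive_emb: torch.Tensor, scale: float = 20.0) -> torch.Tensor:
    """Sketch of MultipleNegativesRankingLoss with in-batch negatives.

    anchor_emb, positive_emb: (batch, dim) embeddings; row i of positive_emb
    is the positive for row i of anchor_emb, and every other row acts as a negative.
    """
    # Cosine similarity matrix between every anchor and every candidate.
    sims = F.cosine_similarity(anchor_emb.unsqueeze(1), positive_emb.unsqueeze(0), dim=-1)
    # Cross-entropy pulls each anchor toward its own positive (the diagonal).
    labels = torch.arange(anchor_emb.size(0))
    return F.cross_entropy(sims * scale, labels)

# Example with random embeddings standing in for model outputs.
loss = mnrl_sketch(torch.randn(16, 768), torch.randn(16, 768))
print(loss)
```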
### Evaluation Dataset

#### sentence-transformers/all-nli

* Dataset: [sentence-transformers/all-nli](https://huggingface.co/datasets/sentence-transformers/all-nli) at [d482672](https://huggingface.co/datasets/sentence-transformers/all-nli/tree/d482672c8e74ce18da116f430137434ba2e52fab)
* Size: 1,000 evaluation samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive | negative |
  |:--------|:-------|:---------|:---------|
  | type    | string | string   | string   |
  | details | <ul><li>min: 6 tokens</li><li>mean: 17.95 tokens</li><li>max: 63 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 9.78 tokens</li><li>max: 29 tokens</li></ul> | <ul><li>min: 5 tokens</li><li>mean: 10.35 tokens</li><li>max: 29 tokens</li></ul> |
* Samples:
  | anchor | positive | negative |
  |:-------|:---------|:---------|
  | <code>Two women are embracing while holding to go packages.</code> | <code>Two woman are holding packages.</code> | <code>The men are fighting outside a deli.</code> |
  | <code>Two young children in blue jerseys, one with the number 9 and one with the number 2 are standing on wooden steps in a bathroom and washing their hands in a sink.</code> | <code>Two kids in numbered jerseys wash their hands.</code> | <code>Two kids in jackets walk to school.</code> |
  | <code>A man selling donuts to a customer during a world exhibition event held in the city of Angeles</code> | <code>A man selling donuts to a customer.</code> | <code>A woman drinks her coffee in a small cafe.</code> |
* Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
  ```json
  {
      "scale": 20.0,
      "similarity_fct": "cos_sim"
  }
  ```
### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `num_train_epochs`: 1
- `warmup_ratio`: 0.1
- `fp16`: True
- `batch_sampler`: no_duplicates
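Under sentence-transformers 3.0 these non-default values map onto `SentenceTransformerTrainingArguments`. A condensed sketch of a training script consistent with this card; the output directory, dataset slices, and logging cadence are assumptions, not values stated in the card:

```python
from datasets import load_dataset
from sentence_transformers import (
    SentenceTransformer,
    SentenceTransformerTrainer,
    SentenceTransformerTrainingArguments,
)
from sentence_transformers.losses import MultipleNegativesRankingLoss
from sentence_transformers.training_args import BatchSamplers

model = SentenceTransformer("microsoft/mpnet-base")
# Assumed slices matching the stated sizes: 100,000 train / 1,000 eval triplets.
train_dataset = load_dataset("sentence-transformers/all-nli", "triplet", split="train[:100000]")
eval_dataset = load_dataset("sentence-transformers/all-nli", "triplet", split="dev[:1000]")
loss = MultipleNegativesRankingLoss(model)

args = SentenceTransformerTrainingArguments(
    output_dir="mpnet-base-all-nli",  # assumption: not stated in the card
    num_train_epochs=1,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    # MNRL uses in-batch negatives, so avoid duplicate sentences per batch.
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)

trainer = SentenceTransformerTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    loss=loss,
)
trainer.train()
```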
#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 1
- `eval_accumulation_steps`: None
- `learning_rate`: 5e-05
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 1
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.1
- `warmup_steps`: 0
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: True
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`: 
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
### Training Logs
| Epoch | Step | Training Loss | loss   | all-nli-dev_max_accuracy | all-nli-test_max_accuracy |
|:-----:|:----:|:-------------:|:------:|:------------------------:|:-------------------------:|
| 0     | 0    | -             | -      | 0.636                    | -                         |
| 0.032 | 100  | 2.6736        | 0.8660 | 0.881                    | -                         |
| 0.064 | 200  | 1.0541        | 0.9318 | 0.866                    | -                         |
| 0.096 | 300  | 1.1691        | 1.0155 | 0.876                    | -                         |
| 0.128 | 400  | 1.2233        | 1.3754 | 0.85                     | -                         |
| 0.032 | 100  | 1.5484        | 0.9666 | -                        | -                         |
| 0.064 | 200  | 0.5988        | 0.8912 | -                        | -                         |
| 0.096 | 300  | 0.4046        | 1.0413 | -                        | -                         |
| 0.128 | 400  | 0.2979        | 1.2470 | -                        | -                         |
| 0.16  | 500  | 1.1653        | 1.2219 | -                        | -                         |
| 0.192 | 600  | 1.1348        | 1.1751 | -                        | -                         |
| 0.224 | 700  | 1.2606        | 1.2407 | -                        | -                         |
| 0.256 | 800  | 1.083         | 1.1729 | -                        | -                         |
| 0.288 | 900  | 1.0435        | 1.1577 | -                        | -                         |
| 0.32  | 1000 | 0.9209        | 1.0593 | -                        | -                         |
| 0.352 | 1100 | 1.0499        | 1.0049 | -                        | -                         |
| 0.384 | 1200 | 1.194         | 1.1318 | -                        | -                         |
| 0.416 | 1300 | 1.2979        | 1.0062 | -                        | -                         |
| 0.448 | 1400 | 1.2356        | 1.0485 | -                        | -                         |
| 0.48  | 1500 | 1.0414        | 0.8570 | -                        | -                         |
| 0.512 | 1600 | 0.8688        | 0.8401 | -                        | -                         |
| 0.544 | 1700 | 0.8349        | 0.7505 | -                        | -                         |
| 0.576 | 1800 | 0.8965        | 0.7833 | -                        | -                         |
| 0.608 | 1900 | 0.9347        | 0.7959 | -                        | -                         |
| 0.64  | 2000 | 1.0194        | 0.6819 | -                        | -                         |
| 0.672 | 2100 | 0.928         | 0.6060 | -                        | -                         |
| 0.704 | 2200 | 0.9087        | 0.5785 | -                        | -                         |
| 0.736 | 2300 | 0.8015        | 0.5598 | -                        | -                         |
| 0.768 | 2400 | 0.7945        | 0.5644 | -                        | -                         |
| 0.8   | 2500 | 0.8071        | 0.5606 | -                        | -                         |
| 0.832 | 2600 | 0.7321        | 0.5724 | -                        | -                         |
| 0.864 | 2700 | 0.7732        | 0.5478 | -                        | -                         |
| 0.896 | 2800 | 0.8436        | 0.5054 | -                        | -                         |
| 0.928 | 2900 | 0.9542        | 0.4962 | -                        | -                         |
| 0.96  | 3000 | 0.6193        | 0.5048 | -                        | -                         |
| 0.992 | 3100 | 0.0198        | 0.5503 | -                        | -                         |
| 1.0   | 3125 | -             | -      | -                        | 0.946                     |

### Framework Versions
- Python: 3.10.13
- Sentence Transformers: 3.0.0
- Transformers: 4.41.1
- PyTorch: 2.1.2
- Accelerate: 0.30.1
- Datasets: 2.19.1
- Tokenizers: 0.19.1
## Citation

### BibTeX

#### Sentence Transformers
```bibtex
@inproceedings{reimers-2019-sentence-bert,
    title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
    author = "Reimers, Nils and Gurevych, Iryna",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = "11",
    year = "2019",
    publisher = "Association for Computational Linguistics",
    url = "https://arxiv.org/abs/1908.10084",
}
```

#### MultipleNegativesRankingLoss
```bibtex
@misc{henderson2017efficient,
    title={Efficient Natural Language Response Suggestion for Smart Reply},
    author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
    year={2017},
    eprint={1705.00652},
    archivePrefix={arXiv},
    primaryClass={cs.CL}
}
```
<!--
## Glossary

*Clearly define terms in order to be accessible across audiences.*
-->

<!--
## Model Card Authors

*Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
-->

<!--
## Model Card Contact

*Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
-->
14