tomaarsen (HF staff) committed
Commit 76e127f
1 Parent(s): 7a28294

Add new SentenceTransformer model.

1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 768,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
README.md ADDED
@@ -0,0 +1,485 @@
+ ---
+ language:
+ - en
+ library_name: sentence-transformers
+ tags:
+ - sentence-transformers
+ - sentence-similarity
+ - feature-extraction
+ - loss:TripletLoss
+ base_model: distilbert/distilbert-base-uncased
+ metrics:
+ - cosine_accuracy
+ - dot_accuracy
+ - manhattan_accuracy
+ - euclidean_accuracy
+ - max_accuracy
+ widget:
+ - source_sentence: All charts rank the top 100.
+   sentences:
+   - 'There are two primary charts: Gaon Album Chart and Gaon Digital Chart.'
+   - 'Regional Preferente de Cataluña (3): 1999-00, 2002-03, 2008-09.'
+   - Kyūsaku was born in Fukuoka city, Fukuoka prefecture as Sugiyama Naoki.
+ - source_sentence: Valley of the Giants (2004) .
+   sentences:
+   - '"That Girl" (by Hayley) (2001) - AUS: No. 53 [REF].'
+   - Nuangola Outlet is situated just south of Penobscot Knob [REF].
+   - Like Sir John Moore, the Craufurd family originated from Ayrshire.
+ - source_sentence: Flanagan is located at [REF].
+   sentences:
+   - Sharpes is located at (28.441281, -80.761019) [REF].
+   - His father was Gallus Jacob Baumgartner, a prominent statesman.
+   - He served terms on the city council in 1654, 1660 and 1666.
+ - source_sentence: Fox Sports 1 Purple Bel-Air .
+   sentences:
+   - Victory 93.7 The Victory 93.7 FM-WTKB ATWOOD-MILAN .
+   - Greenwood & Batley also made a number of Coke oven locomotives.
+   - Oltmans was born into a wealthy family with roots in the Dutch East Indies.
+ - source_sentence: 'Points awarded in the final: .'
+   sentences:
+   - Points awarded in the final:[REF] .
+   - Bishop Ludden recently implemented an innovative House Program.
+   - Douglas Wheelock was born in Binghamton, New York to Olin and Margaret Wheelock.
+ pipeline_tag: sentence-similarity
+ co2_eq_emissions:
+   emissions: 3.4895934031398
+   energy_consumed: 0.008977554535710646
+   source: codecarbon
+   training_type: fine-tuning
+   on_cloud: false
+   cpu_model: 13th Gen Intel(R) Core(TM) i7-13700K
+   ram_total_size: 31.777088165283203
+   hours_used: 0.045
+   hardware_used: 1 x NVIDIA GeForce RTX 3090
+ model-index:
+ - name: SentenceTransformer based on distilbert/distilbert-base-uncased
+   results:
+   - task:
+       type: triplet
+       name: Triplet
+     dataset:
+       name: wikipedia sections dev
+       type: wikipedia-sections-dev
+     metrics:
+     - type: cosine_accuracy
+       value: 0.733
+       name: Cosine Accuracy
+     - type: dot_accuracy
+       value: 0.269
+       name: Dot Accuracy
+     - type: manhattan_accuracy
+       value: 0.726
+       name: Manhattan Accuracy
+     - type: euclidean_accuracy
+       value: 0.727
+       name: Euclidean Accuracy
+     - type: max_accuracy
+       value: 0.733
+       name: Max Accuracy
+   - task:
+       type: triplet
+       name: Triplet
+     dataset:
+       name: wikipedia sections test
+       type: wikipedia-sections-test
+     metrics:
+     - type: cosine_accuracy
+       value: 0.7
+       name: Cosine Accuracy
+     - type: dot_accuracy
+       value: 0.306
+       name: Dot Accuracy
+     - type: manhattan_accuracy
+       value: 0.706
+       name: Manhattan Accuracy
+     - type: euclidean_accuracy
+       value: 0.708
+       name: Euclidean Accuracy
+     - type: max_accuracy
+       value: 0.708
+       name: Max Accuracy
+ ---
+
+ # SentenceTransformer based on distilbert/distilbert-base-uncased
+
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) on the [sentence-transformers/wikipedia-sections](https://huggingface.co/datasets/sentence-transformers/wikipedia-sections) dataset. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
+
+ ## Model Details
+
+ ### Model Description
+ - **Model Type:** Sentence Transformer
+ - **Base model:** [distilbert/distilbert-base-uncased](https://huggingface.co/distilbert/distilbert-base-uncased) <!-- at revision 6cdc0aad91f5ae2e6712e91bc7b65d1cf5c05411 -->
+ - **Maximum Sequence Length:** 512 tokens
+ - **Output Dimensionality:** 768 dimensions
+ - **Similarity Function:** Cosine Similarity
+ - **Training Dataset:**
+   - [sentence-transformers/wikipedia-sections](https://huggingface.co/datasets/sentence-transformers/wikipedia-sections)
+ - **Language:** en
+ <!-- - **License:** Unknown -->
+
+ ### Model Sources
+
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
+
+ ### Full Model Architecture
+
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: DistilBertModel
+   (1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
+
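+ The `Pooling` module above performs masked mean pooling: DistilBERT's token embeddings are averaged, with padding positions excluded. As a rough illustration (not part of this repository), the equivalent computation with plain `transformers` could look like this:
+
+ ```python
+ import torch
+ from transformers import AutoModel, AutoTokenizer
+
+ # Sketch of the mean-pooling step; loading this repo via AutoModel is an assumption here.
+ name = "tomaarsen/distilbert-base-uncased-wikipedia-sections-triplet"
+ tokenizer = AutoTokenizer.from_pretrained(name)
+ model = AutoModel.from_pretrained(name)
+
+ encoded = tokenizer(["Points awarded in the final: ."], padding=True, return_tensors="pt")
+ with torch.no_grad():
+     token_embeddings = model(**encoded).last_hidden_state  # (batch, seq_len, 768)
+
+ # Mean pooling: sum token embeddings, masking out padding, then divide by the token count.
+ mask = encoded["attention_mask"].unsqueeze(-1).float()     # (batch, seq_len, 1)
+ embedding = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
+ print(embedding.shape)                                     # torch.Size([1, 768])
+ ```
+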
+ ## Usage
+
+ ### Direct Usage (Sentence Transformers)
+
+ First install the Sentence Transformers library:
+
+ ```bash
+ pip install -U sentence-transformers
+ ```
+
+ Then you can load this model and run inference:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("tomaarsen/distilbert-base-uncased-wikipedia-sections-triplet")
+ # Run inference
+ sentences = [
+     'Points awarded in the final: .',
+     'Points awarded in the final:[REF] .',
+     'Bishop Ludden recently implemented an innovative House Program.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 768]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
+
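+ Beyond pairwise scores, the same two calls support a small semantic search loop. A minimal sketch, with a made-up query and corpus for illustration:
+
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("tomaarsen/distilbert-base-uncased-wikipedia-sections-triplet")
+
+ # Hypothetical corpus; any list of sentences works.
+ corpus = [
+     "Sharpes is located at (28.441281, -80.761019) [REF].",
+     "He served terms on the city council in 1654, 1660 and 1666.",
+     "Greenwood & Batley also made a number of Coke oven locomotives.",
+ ]
+ query = "Flanagan is located at [REF]."
+
+ corpus_embeddings = model.encode(corpus)
+ query_embedding = model.encode([query])
+
+ # Rank corpus sentences by cosine similarity to the query.
+ scores = model.similarity(query_embedding, corpus_embeddings)[0]
+ best = scores.argmax().item()
+ print(corpus[best], scores[best].item())
+ ```
+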
+ <!--
+ ### Direct Usage (Transformers)
+
+ <details><summary>Click to see the direct usage in Transformers</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Downstream Usage (Sentence Transformers)
+
+ You can finetune this model on your own dataset.
+
+ <details><summary>Click to expand</summary>
+
+ </details>
+ -->
+
+ <!--
+ ### Out-of-Scope Use
+
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
+ -->
+
+ ## Evaluation
+
+ ### Metrics
+
+ #### Triplet
+ * Dataset: `wikipedia-sections-dev`
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric             | Value     |
+ |:-------------------|:----------|
+ | cosine_accuracy    | 0.733     |
+ | dot_accuracy       | 0.269     |
+ | manhattan_accuracy | 0.726     |
+ | euclidean_accuracy | 0.727     |
+ | **max_accuracy**   | **0.733** |
+
+ #### Triplet
+ * Dataset: `wikipedia-sections-test`
+ * Evaluated with [<code>TripletEvaluator</code>](https://sbert.net/docs/package_reference/evaluation.html#sentence_transformers.evaluation.TripletEvaluator)
+
+ | Metric             | Value     |
+ |:-------------------|:----------|
+ | cosine_accuracy    | 0.7       |
+ | dot_accuracy       | 0.306     |
+ | manhattan_accuracy | 0.706     |
+ | euclidean_accuracy | 0.708     |
+ | **max_accuracy**   | **0.708** |
+
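+ Each accuracy is the fraction of triplets for which the anchor lies closer to the positive than to the negative under that distance. A minimal sketch of re-running this evaluation; the `"triplet"` subset name and `"test"` split are assumptions about the dataset layout, while the anchor/positive/negative columns are confirmed below:
+
+ ```python
+ from datasets import load_dataset
+ from sentence_transformers import SentenceTransformer
+ from sentence_transformers.evaluation import TripletEvaluator
+
+ model = SentenceTransformer("tomaarsen/distilbert-base-uncased-wikipedia-sections-triplet")
+
+ # Assumed subset/split names for the wikipedia-sections dataset.
+ test = load_dataset("sentence-transformers/wikipedia-sections", "triplet", split="test")
+
+ evaluator = TripletEvaluator(
+     anchors=test["anchor"],
+     positives=test["positive"],
+     negatives=test["negative"],
+     name="wikipedia-sections-test",
+ )
+ print(evaluator(model))  # dict of accuracies, e.g. {'wikipedia-sections-test_cosine_accuracy': ...}
+ ```
+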
+ <!--
+ ## Bias, Risks and Limitations
+
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
+ -->
+
+ <!--
+ ### Recommendations
+
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
+ -->
+
+ ## Training Details
+
+ ### Training Dataset
+
+ #### sentence-transformers/wikipedia-sections
+
+ * Dataset: [sentence-transformers/wikipedia-sections](https://huggingface.co/datasets/sentence-transformers/wikipedia-sections) at [576bb61](https://huggingface.co/datasets/sentence-transformers/wikipedia-sections/tree/576bb61f0fc9ebc728b742f91bd5c81cb7d92c71)
+ * Size: 10,000 training samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | anchor | positive | negative |
+   |:--------|:-------|:---------|:---------|
+   | type    | string | string   | string   |
+   | details | <ul><li>min: 7 tokens</li><li>mean: 31.65 tokens</li><li>max: 72 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 31.54 tokens</li><li>max: 91 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 31.52 tokens</li><li>max: 150 tokens</li></ul> |
+ * Samples:
+   | anchor | positive | negative |
+   |:-------|:---------|:---------|
+   | <code>Bailey was educated at Ipswich School (1972-79) and at the College of St Hild and St Bede University of Durham (1979-82), where he obtained a first-class degree in Economic history.</code> | <code>He won the Cricket Society's Wetherell Award in 1979 for the best public school all-rounder and played for the NCA Young Cricketers in 1980 [REF].</code> | <code>Bailey was a Fellow of Gonville and Caius College, Cambridge, between 1986 and 1996, lecturing in history and working as Admissions' Tutor.</code> |
+   | <code>The record design and production was done by Ivan Stančić Piko and the cover was chosen to be "The Red Nude" act by Amedeo Modigliani.</code> | <code>VIS Idoli was also released as a double cassette EP with Film's Live in Kulušić EP entitled Zajedno.</code> | <code>Promotional video was recorded for "Devojko mala" as the TV stations already broadcast the video for "Malena" and "Zašto su danas devojke ljute", which had its TV premiere on the 1981 New Year's Eve as part of Rokenroler show.</code> |
+   | <code>Promotional video was recorded for "Devojko mala" as the TV stations already broadcast the video for "Malena" and "Zašto su danas devojke ljute", which had its TV premiere on the 1981 New Year's Eve as part of Rokenroler show.</code> | <code>"Dok dobuje kiša (u ritmu tam-tama)" and "Malena" appeared on Vlada Divljan's 1996 live album Odbrana i zaštita.</code> | <code>The record design and production was done by Ivan Stančić Piko and the cover was chosen to be "The Red Nude" act by Amedeo Modigliani.</code> |
+ * Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/losses.html#tripletloss) with these parameters:
+   ```json
+   {
+       "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
+       "triplet_margin": 5
+   }
+   ```
+
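+ With `TripletDistanceMetric.EUCLIDEAN` and a margin of 5, this loss minimizes `max(‖a − p‖ − ‖a − n‖ + 5, 0)` per triplet, pulling each anchor toward its positive and away from its negative. A minimal sketch of reproducing this fine-tune with the v3 trainer API and the non-default hyperparameters listed further below; the output directory and the dataset subset/split names are assumptions:
+
+ ```python
+ from datasets import load_dataset
+ from sentence_transformers import (
+     SentenceTransformer,
+     SentenceTransformerTrainer,
+     SentenceTransformerTrainingArguments,
+ )
+ from sentence_transformers.losses import TripletDistanceMetric, TripletLoss
+
+ # Loading a plain transformer creates the Transformer + Mean Pooling stack shown above.
+ model = SentenceTransformer("distilbert/distilbert-base-uncased")
+
+ # Assumed subset/split names for the wikipedia-sections dataset.
+ dataset = load_dataset("sentence-transformers/wikipedia-sections", "triplet")
+ loss = TripletLoss(model, distance_metric=TripletDistanceMetric.EUCLIDEAN, triplet_margin=5)
+
+ args = SentenceTransformerTrainingArguments(
+     output_dir="outputs",  # assumed; not stated in the card
+     num_train_epochs=1,
+     per_device_train_batch_size=16,
+     per_device_eval_batch_size=16,
+     warmup_ratio=0.1,
+     fp16=True,
+     eval_strategy="steps",
+ )
+ trainer = SentenceTransformerTrainer(
+     model=model,
+     args=args,
+     train_dataset=dataset["train"],
+     eval_dataset=dataset["validation"],  # split name assumed
+     loss=loss,
+ )
+ trainer.train()
+ ```
+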
+ ### Evaluation Dataset
+
+ #### sentence-transformers/wikipedia-sections
+
+ * Dataset: [sentence-transformers/wikipedia-sections](https://huggingface.co/datasets/sentence-transformers/wikipedia-sections) at [576bb61](https://huggingface.co/datasets/sentence-transformers/wikipedia-sections/tree/576bb61f0fc9ebc728b742f91bd5c81cb7d92c71)
+ * Size: 1,000 evaluation samples
+ * Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
+ * Approximate statistics based on the first 1000 samples:
+   |         | anchor | positive | negative |
+   |:--------|:-------|:---------|:---------|
+   | type    | string | string   | string   |
+   | details | <ul><li>min: 9 tokens</li><li>mean: 29.99 tokens</li><li>max: 77 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 31.02 tokens</li><li>max: 88 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 30.75 tokens</li><li>max: 80 tokens</li></ul> |
+ * Samples:
+   | anchor | positive | negative |
+   |:-------|:---------|:---------|
+   | <code>Modern airforces have become dependent on airborne radars typically carried by converted airliners and transport aircraft such as the E-3 Sentry and A-50 'Mainstay'.</code> | <code>In late 2003, the missile was offered again on the export market as the 172S-1 [REF].</code> | <code>The mockup shown in 1993 had a strong resemblance to the Buk airframe, but since the Indians became involved there have been some changes.</code> |
+   | <code>In May 2005 it was reported that there were two versions, with and without a rocket booster, with ranges of 400 km and 300 km respectively [REF].</code> | <code>Guidance is by inertial navigation until the missile is close enough to the target to use active radar for terminal homing [REF].</code> | <code>The missile resurfaced as the KS-172 in 1999,[REF] as part of a new export-led strategy[REF] whereby foreign investment in a -range export model[REF] would ultimately fund a version for the Russian airforce [REF].</code> |
+   | <code>Morris was selected in the sixth round of the 2012 NFL Draft with the 173rd overall pick by the Washington Redskins [REF].</code> | <code>The day before the season opener, coach Mike Shanahan announced that Morris would be the starting running back.</code> | <code>Despite being able to afford a new car, he still drives his 1991 Mazda 626, which he nicknamed "Bentley" [REF].</code> |
+ * Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/losses.html#tripletloss) with these parameters:
+   ```json
+   {
+       "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
+       "triplet_margin": 5
+   }
+   ```
+
+ ### Training Hyperparameters
+ #### Non-Default Hyperparameters
+
+ - `eval_strategy`: steps
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `num_train_epochs`: 1
+ - `warmup_ratio`: 0.1
+ - `fp16`: True
+
+ #### All Hyperparameters
+ <details><summary>Click to expand</summary>
+
+ - `overwrite_output_dir`: False
+ - `do_predict`: False
+ - `eval_strategy`: steps
+ - `prediction_loss_only`: False
+ - `per_device_train_batch_size`: 16
+ - `per_device_eval_batch_size`: 16
+ - `per_gpu_train_batch_size`: None
+ - `per_gpu_eval_batch_size`: None
+ - `gradient_accumulation_steps`: 1
+ - `eval_accumulation_steps`: None
+ - `learning_rate`: 5e-05
+ - `weight_decay`: 0.0
+ - `adam_beta1`: 0.9
+ - `adam_beta2`: 0.999
+ - `adam_epsilon`: 1e-08
+ - `max_grad_norm`: 1.0
+ - `num_train_epochs`: 1
+ - `max_steps`: -1
+ - `lr_scheduler_type`: linear
+ - `lr_scheduler_kwargs`: {}
+ - `warmup_ratio`: 0.1
+ - `warmup_steps`: 0
+ - `log_level`: passive
+ - `log_level_replica`: warning
+ - `log_on_each_node`: True
+ - `logging_nan_inf_filter`: True
+ - `save_safetensors`: True
+ - `save_on_each_node`: False
+ - `save_only_model`: False
+ - `no_cuda`: False
+ - `use_cpu`: False
+ - `use_mps_device`: False
+ - `seed`: 42
+ - `data_seed`: None
+ - `jit_mode_eval`: False
+ - `use_ipex`: False
+ - `bf16`: False
+ - `fp16`: True
+ - `fp16_opt_level`: O1
+ - `half_precision_backend`: auto
+ - `bf16_full_eval`: False
+ - `fp16_full_eval`: False
+ - `tf32`: None
+ - `local_rank`: 0
+ - `ddp_backend`: None
+ - `tpu_num_cores`: None
+ - `tpu_metrics_debug`: False
+ - `debug`: []
+ - `dataloader_drop_last`: False
+ - `dataloader_num_workers`: 0
+ - `dataloader_prefetch_factor`: None
+ - `past_index`: -1
+ - `disable_tqdm`: False
+ - `remove_unused_columns`: True
+ - `label_names`: None
+ - `load_best_model_at_end`: False
+ - `ignore_data_skip`: False
+ - `fsdp`: []
+ - `fsdp_min_num_params`: 0
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
+ - `fsdp_transformer_layer_cls_to_wrap`: None
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
+ - `deepspeed`: None
+ - `label_smoothing_factor`: 0.0
+ - `optim`: adamw_torch
+ - `optim_args`: None
+ - `adafactor`: False
+ - `group_by_length`: False
+ - `length_column_name`: length
+ - `ddp_find_unused_parameters`: None
+ - `ddp_bucket_cap_mb`: None
+ - `ddp_broadcast_buffers`: None
+ - `dataloader_pin_memory`: True
+ - `dataloader_persistent_workers`: False
+ - `skip_memory_metrics`: True
+ - `use_legacy_prediction_loop`: False
+ - `push_to_hub`: False
+ - `resume_from_checkpoint`: None
+ - `hub_model_id`: None
+ - `hub_strategy`: every_save
+ - `hub_private_repo`: False
+ - `hub_always_push`: False
+ - `gradient_checkpointing`: False
+ - `gradient_checkpointing_kwargs`: None
+ - `include_inputs_for_metrics`: False
+ - `eval_do_concat_batches`: True
+ - `fp16_backend`: auto
+ - `push_to_hub_model_id`: None
+ - `push_to_hub_organization`: None
+ - `mp_parameters`:
+ - `auto_find_batch_size`: False
+ - `full_determinism`: False
+ - `torchdynamo`: None
+ - `ray_scope`: last
+ - `ddp_timeout`: 1800
+ - `torch_compile`: False
+ - `torch_compile_backend`: None
+ - `torch_compile_mode`: None
+ - `dispatch_batches`: None
+ - `split_batches`: None
+ - `include_tokens_per_second`: False
+ - `include_num_input_tokens_seen`: False
+ - `neftune_noise_alpha`: None
+ - `optim_target_modules`: None
+ - `batch_sampler`: batch_sampler
+ - `multi_dataset_batch_sampler`: proportional
+
+ </details>
+
+ ### Training Logs
+ | Epoch | Step | Training Loss | Validation Loss | wikipedia-sections-dev_max_accuracy | wikipedia-sections-test_max_accuracy |
+ |:-----:|:----:|:-------------:|:---------------:|:-----------------------------------:|:------------------------------------:|
+ | 0.16  | 100  | 3.8017        | 3.4221          | 0.698                               | -                                    |
+ | 0.32  | 200  | 3.0703        | 3.3261          | 0.717                               | -                                    |
+ | 0.48  | 300  | 2.9683        | 3.2490          | 0.728                               | -                                    |
+ | 0.64  | 400  | 2.7731        | 3.2340          | 0.733                               | -                                    |
+ | 0.8   | 500  | 2.9689        | 3.1583          | 0.737                               | -                                    |
+ | 0.96  | 600  | 2.8955        | 3.1480          | 0.733                               | -                                    |
+ | 1.0   | 625  | -             | -               | -                                   | 0.708                                |
+
+ ### Environmental Impact
+ Carbon emissions were measured using [CodeCarbon](https://github.com/mlco2/codecarbon).
+ - **Energy Consumed**: 0.009 kWh
+ - **Carbon Emitted**: 0.003 kg of CO2
+ - **Hours Used**: 0.045 hours
+
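+ These figures come from a CodeCarbon tracker wrapped around the training run. A minimal sketch of that measurement pattern, with a placeholder where the training loop would go:
+
+ ```python
+ from codecarbon import EmissionsTracker
+
+ tracker = EmissionsTracker()  # logs to emissions.csv by default
+ tracker.start()
+ try:
+     pass  # placeholder: the actual training loop would run here
+ finally:
+     emissions_kg = tracker.stop()  # emissions for the tracked span, in kg CO2-eq
+ print(f"{emissions_kg:.3f} kg CO2-eq")
+ ```
+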
+ ### Training Hardware
+ - **On Cloud**: No
+ - **GPU Model**: 1 x NVIDIA GeForce RTX 3090
+ - **CPU Model**: 13th Gen Intel(R) Core(TM) i7-13700K
+ - **RAM Size**: 31.78 GB
+
+ ### Framework Versions
+ - Python: 3.11.6
+ - Sentence Transformers: 3.0.0.dev0
+ - Transformers: 4.41.0.dev0
+ - PyTorch: 2.3.0+cu121
+ - Accelerate: 0.26.1
+ - Datasets: 2.18.0
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### TripletLoss
+ ```bibtex
+ @misc{hermans2017defense,
+     title={In Defense of the Triplet Loss for Person Re-Identification},
+     author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
+     year={2017},
+     eprint={1703.07737},
+     archivePrefix={arXiv},
+     primaryClass={cs.CV}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,24 @@
+ {
+   "_name_or_path": "distilbert-base-uncased",
+   "activation": "gelu",
+   "architectures": [
+     "DistilBertModel"
+   ],
+   "attention_dropout": 0.1,
+   "dim": 768,
+   "dropout": 0.1,
+   "hidden_dim": 3072,
+   "initializer_range": 0.02,
+   "max_position_embeddings": 512,
+   "model_type": "distilbert",
+   "n_heads": 12,
+   "n_layers": 6,
+   "pad_token_id": 0,
+   "qa_dropout": 0.1,
+   "seq_classif_dropout": 0.2,
+   "sinusoidal_pos_embds": false,
+   "tie_weights_": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.41.0.dev0",
+   "vocab_size": 30522
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.0.dev0",
+     "transformers": "4.41.0.dev0",
+     "pytorch": "2.3.0+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a985c446b3479442ff89f4e7e82a1393809d2288eec64a61cc2580b84040ce4a
+ size 265462608
modules.json ADDED
@@ -0,0 +1,14 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,7 @@
+ {
+   "cls_token": "[CLS]",
+   "mask_token": "[MASK]",
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "unk_token": "[UNK]"
+ }
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
tokenizer_config.json ADDED
@@ -0,0 +1,55 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "[PAD]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "100": {
+       "content": "[UNK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "101": {
+       "content": "[CLS]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "102": {
+       "content": "[SEP]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "103": {
+       "content": "[MASK]",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "[CLS]",
+   "do_lower_case": true,
+   "mask_token": "[MASK]",
+   "model_max_length": 1000000000000000019884624838656,
+   "pad_token": "[PAD]",
+   "sep_token": "[SEP]",
+   "strip_accents": null,
+   "tokenize_chinese_chars": true,
+   "tokenizer_class": "DistilBertTokenizer",
+   "unk_token": "[UNK]"
+ }
vocab.txt ADDED
The diff for this file is too large to render. See raw diff