Tekkla committed on
Commit 4ec0a37
1 parent: 7b59a1f

Add new SentenceTransformer model.
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
{
    "word_embedding_dimension": 384,
    "pooling_mode_cls_token": false,
    "pooling_mode_mean_tokens": true,
    "pooling_mode_max_tokens": false,
    "pooling_mode_mean_sqrt_len_tokens": false,
    "pooling_mode_weightedmean_tokens": false,
    "pooling_mode_lasttoken": false,
    "include_prompt": true
}
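The config above enables only mean pooling over the 384-dimensional token embeddings. As an illustrative sketch of what mean pooling computes (plain Python with hypothetical toy vectors; the real Pooling module in sentence-transformers operates on batched PyTorch tensors):

```python
def mean_pooling(token_embeddings, attention_mask):
    """Average the embeddings of non-padding tokens.

    token_embeddings: list of per-token vectors (one per position).
    attention_mask: list of 0/1 flags, 1 for real tokens, 0 for padding.
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    count = sum(attention_mask)
    for vec, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i in range(dim):
                summed[i] += vec[i]
    return [s / count for s in summed]

# Two real tokens and one padding position; the pad vector is ignored.
emb = mean_pooling([[1.0, 3.0], [3.0, 5.0], [9.0, 9.0]], [1, 1, 0])
# emb == [2.0, 4.0]
```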
README.md ADDED
@@ -0,0 +1,772 @@
---
base_model: intfloat/multilingual-e5-small
datasets: []
language: []
library_name: sentence-transformers
pipeline_tag: sentence-similarity
tags:
- sentence-transformers
- sentence-similarity
- feature-extraction
- generated_from_trainer
- dataset_size:24034
- loss:TripletLoss
widget:
- source_sentence: შედეგების მიღების შემდეგ, გინგრიჩმა სანტურუმი აქო, თუმცა მკაცრი იყო რომნის მიმართ, რომლის გამოც აიოვას ეთერში გინგრიჩის წინააღმდეგ ნეგატიური სარეკლამო კამპანია წარიმართა.
  sentences:
  - In the VET sector, 320 managers completed a training needs assessment within the EU-funded project "Technical Assistance to VET and Employment Reforms in Georgia" (EUVEGE), with the purpose to enhance VET manager competencies.
  - უბრალო ორიგამი არის ორიგამი შეზღუდვებით, რაც ნიშნავს, რომ ერთ ჯერზე მხოლოდ ერთი გადაკეცვაა დასაშვები, დაუშვებელია უფრო რთული გადაკეცვები, როგორიცაა უკან გადაკეცვა და ყველა გადაკეცვას აქვს პირდაპირ მიმართული მდებარეობა.
  - After the results came in, Gingrich lauded Santorum, but had tough words for Romney, on whose behalf negative campaign advertisements were aired in Iowa against Gingrich.
- source_sentence: ეს საკითხი აშკარად უფრო დეტალურ განხილვას იმსახურებს.
  sentences:
  - The special advisor appointed by the World Federation for Medical Education took part in this assessment to prepare relevant recommendations for the purpose of bringing the quality assurance system in Georgia in line with the requirements set by the World Federation.
  - This subject clearly deserves a fuller discussion.
  - The September 11 hijackers visited the World Trade Center a number of times, going up with the throngs of tourists to the observation deck.
- source_sentence: უმეტეს შემთხვევაში, ჩართულნი არიან ადამიანები, ვინც შინაურ ფრინველებთან მუშაობენ, მაგრამ ფრინველებზე დამკვირვებლებისთვისაც არსებობს გარკვეული რისკი.
  sentences:
  - Most have involved people who work with poultry, but there is also some risk to birdwatchers.
  - Hipparion fauna is of major importance for dating the Neogene fossil-bearing sediments.
  - შესაბამის პროცედურებს საფრანგეთის საარჩევნო კანონმდებლობა საკმაოდ მკაცრად ასახავს.
- source_sentence: აქვეა, თუმცა მიმალულია ვიწრო, ერთმანეთში გადახლართული პეკინური ქუჩები და ეზოები, სავსე მოღიმარი, გულღია და ყურადღებიანი ხალხით.
  sentences:
  - Side by side with them, almost hard to glimpse, still exists the web of small streets and yards of the old city full of smiling, honest and considerate people.
  - It did so with a sixty-thousand-troop Implementation Force (IFOR), which was followed about a year later by a somewhat smaller Stabilization Force (SFOR).
  - Inhibition of glutamate dehydrogenase by benzoquinones in maize seedlings.
- source_sentence: ლიგანდების კოორდინაციული ბუნება შესწავლილია ინფრაწითელი სპექტროსკოპიული და რენტგენოგრაფიული მეთოდებით.
  sentences:
  - La corrélation entre la pathologie du cerveau et le comportement soutient les scientifiques dans leurs recherches.
  - The Applicants argued that declaration of unconstitutionality of a normative act by the Constitutional Court shall be followed by efficient legal consequences.
  - The coordination character of cyanate ion has been studied by the methods of infrared spectra and X-ray.
---

# SentenceTransformer based on intfloat/multilingual-e5-small

This is a [sentence-transformers](https://www.SBERT.net) model fine-tuned from [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small). It maps sentences and paragraphs to a 384-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.

## Model Details

### Model Description
- **Model Type:** Sentence Transformer
- **Base model:** [intfloat/multilingual-e5-small](https://huggingface.co/intfloat/multilingual-e5-small) <!-- at revision 0a68dcd3dad5b4962a78daa930087728292b241d -->
- **Maximum Sequence Length:** 512 tokens
- **Output Dimensionality:** 384 dimensions
- **Similarity Function:** Cosine Similarity
<!-- - **Training Dataset:** Unknown -->
<!-- - **Language:** Unknown -->
<!-- - **License:** Unknown -->

### Model Sources

- **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
- **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
- **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)

### Full Model Architecture

```
SentenceTransformer(
  (0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: BertModel
  (1): Pooling({'word_embedding_dimension': 384, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
  (2): Normalize()
)
```

## Usage

### Direct Usage (Sentence Transformers)

First install the Sentence Transformers library:

```bash
pip install -U sentence-transformers
```

Then you can load this model and run inference.
```python
from sentence_transformers import SentenceTransformer

# Download from the 🤗 Hub
model = SentenceTransformer("Tekkla/TripletLoss_flores_kaen")
# Run inference
sentences = [
    'ლიგანდების კოორდინაციული ბუნება შესწავლილია ინფრაწითელი სპექტროსკოპიული და რენტგენოგრაფიული მეთოდებით.',
    'The coordination character of cyanate ion has been studied by the methods of infrared spectra and X-ray.',
    'The Applicants argued that declaration of unconstitutionality of a normative act by the Constitutional Court shall be followed by efficient legal consequences.',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 384]

# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
```
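Because the final `Normalize()` module L2-normalizes every embedding, cosine similarity between two embeddings reduces to a plain dot product. A small sketch of that identity (pure Python with hypothetical 3-d vectors standing in for the model's 384-d output):

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Hypothetical embeddings: a and b point the same way, c is orthogonal.
a = l2_normalize([3.0, 4.0, 0.0])
b = l2_normalize([3.0, 4.0, 0.0])
c = l2_normalize([0.0, 0.0, 5.0])

print(round(dot(a, b), 6))  # 1.0  (identical direction)
print(round(dot(a, c), 6))  # 0.0  (orthogonal)
```

For unit vectors the dot product and cosine similarity coincide, which is why normalized sentence embeddings can be compared with fast dot-product indexes.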

## Training Details

### Training Dataset

#### Unnamed Dataset

* Size: 24,034 training samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive | negative |
  |:--------|:-------|:---------|:---------|
  | type    | string | string | string |
  | details | <ul><li>min: 7 tokens</li><li>mean: 39.79 tokens</li><li>max: 170 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 32.92 tokens</li><li>max: 133 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 36.72 tokens</li><li>max: 154 tokens</li></ul> |
* Samples:
  | anchor | positive | negative |
  |:-------|:---------|:---------|
  | <code>1979 წელს ის პირობით გაათავისუფლეს.</code> | <code>He was released on licence in 1979.</code> | <code>ფსიქოზის გავრცელების ხარისხი აჩვენებს წრფივ კორელაციას ურბანიზაციის ხარისხთან.</code> |
  | <code>ვეტერინარულ კონტროლს დაქვემდებარებული საქონლის ექსპორტისას - სერტიფიკატის წარდგენა სავალდებულოა მხოლოდ:</code> | <code>When exporting the goods subject to veterinary control - it is mandatory to provide a certificate only:</code> | <code>The Role of Terrestrial Mollusks in Propagation of Trematodes in Urban Environment.</code> |
  | <code>ბელა, ხომ კარგად ხარ?</code> | <code>– Bella, are you okay?</code> | <code>• to gain feedback on leading questions;</code> |
* Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
  ```json
  {
      "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
      "triplet_margin": 5
  }
  ```

### Evaluation Dataset

#### Unnamed Dataset

* Size: 3,005 evaluation samples
* Columns: <code>anchor</code>, <code>positive</code>, and <code>negative</code>
* Approximate statistics based on the first 1000 samples:
  |         | anchor | positive | negative |
  |:--------|:-------|:---------|:---------|
  | type    | string | string | string |
  | details | <ul><li>min: 8 tokens</li><li>mean: 38.7 tokens</li><li>max: 138 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 31.89 tokens</li><li>max: 96 tokens</li></ul> | <ul><li>min: 8 tokens</li><li>mean: 36.32 tokens</li><li>max: 95 tokens</li></ul> |
* Samples:
  | anchor | positive | negative |
  |:-------|:---------|:---------|
  | <code>3. თუ გადასახადის გადამხდელი იღებს ან მას უფლება აქვს, მიიღოს შემოსავალი პროცენტის სახით ან ქონების იჯარით გადაცემით, შემოსავალი სავალო ვალდებულების ან იჯარის ხელშეკრულების ვადის გასვლის მომენტში მიღებულად ითვლება.</code> | <code>3. If a taxpayer earns or has the right to earn income in the form of interest or from leasing property, the income shall be deemed to have been obtained at the moment when the debt obligation or lease agreement expires.</code> | <code>In, Cd და Bi დაცილება ანიონიტ AB–17-ის OH′-ფორმაზე დალექვითი ქრომატოგრაფიის მეთოდით.</code> |
  | <code>პროფესიონალიზმის მაღალი ხარისხი ნიშნავს, რომ ჟურნალისტიკა, როგორც ინსტიტუტი, დიფერენცირებულია და სხვა ინსტიტუტებისგან განსხვავებული პრაქტიკა აქვს, მათ შორის, პოლიტიკის ჩათვლით.</code> | <code>A high degree of professionalization of journalism means that journalism is differentiated as an institution and form of practice from other institutions and forms of practice – including politics.</code> | <code>ჯანმრთელობის დაცვა და სოციალური დახმარება, კომუნალური, სოციალური და პერსონალური მომსახურების გაწევა.</code> |
  | <code>ამგვარად, მსგავს შემთხვევებში შეიძლება საჭირო იყოს დამატებითი ფრაზები, რათა თავიდან იქნეს აცილებული ისე წარმოჩენა, თითქოს მარწმუნებელ ანგარიშში ნაგულისხმევია, რომ პრაქტიკოსის პასუხისმგებლობა გამოთქმულ დასკვნაზე შემცირებულია ექსპერტის ჩართულობის გამო.</code> | <code>Therefore, additional wording may be needed in such cases to prevent the assurance report implying that the practitioner’s responsibility for the conclusion expressed is reduced because of the involvement of the expert.</code> | <code>სმენის პროთეზირება მრგვალი სარკმლის ეკრანირებისათვის ფოროვანი ელასტომერის და მეტალის ფირფიტის გამოყენებით.</code> |
* Loss: [<code>TripletLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#tripletloss) with these parameters:
  ```json
  {
      "distance_metric": "TripletDistanceMetric.EUCLIDEAN",
      "triplet_margin": 5
  }
  ```
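With Euclidean distance and a margin of 5, TripletLoss penalizes an anchor whenever its positive is not at least `triplet_margin` closer than its negative. A minimal sketch of the per-triplet objective (pure Python with toy 2-d vectors; the actual sentence-transformers implementation works on batched tensors):

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=5.0):
    # loss = max(d(anchor, positive) - d(anchor, negative) + margin, 0)
    return max(euclidean(anchor, positive) - euclidean(anchor, negative) + margin, 0.0)

# Toy embeddings: the positive is close, the negative is far.
anchor, pos, neg = [0.0, 0.0], [1.0, 0.0], [10.0, 0.0]
print(triplet_loss(anchor, pos, neg))  # 1 - 10 + 5 = -4, clamped to 0.0
```

Once the negative is more than `margin` farther from the anchor than the positive, the loss is zero and the triplet stops contributing gradient.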

### Training Hyperparameters
#### Non-Default Hyperparameters

- `eval_strategy`: steps
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `gradient_accumulation_steps`: 2
- `learning_rate`: 0.0001
- `num_train_epochs`: 10
- `warmup_steps`: 1000
- `batch_sampler`: no_duplicates

#### All Hyperparameters
<details><summary>Click to expand</summary>

- `overwrite_output_dir`: False
- `do_predict`: False
- `eval_strategy`: steps
- `prediction_loss_only`: True
- `per_device_train_batch_size`: 16
- `per_device_eval_batch_size`: 16
- `per_gpu_train_batch_size`: None
- `per_gpu_eval_batch_size`: None
- `gradient_accumulation_steps`: 2
- `eval_accumulation_steps`: None
- `learning_rate`: 0.0001
- `weight_decay`: 0.0
- `adam_beta1`: 0.9
- `adam_beta2`: 0.999
- `adam_epsilon`: 1e-08
- `max_grad_norm`: 1.0
- `num_train_epochs`: 10
- `max_steps`: -1
- `lr_scheduler_type`: linear
- `lr_scheduler_kwargs`: {}
- `warmup_ratio`: 0.0
- `warmup_steps`: 1000
- `log_level`: passive
- `log_level_replica`: warning
- `log_on_each_node`: True
- `logging_nan_inf_filter`: True
- `save_safetensors`: True
- `save_on_each_node`: False
- `save_only_model`: False
- `restore_callback_states_from_checkpoint`: False
- `no_cuda`: False
- `use_cpu`: False
- `use_mps_device`: False
- `seed`: 42
- `data_seed`: None
- `jit_mode_eval`: False
- `use_ipex`: False
- `bf16`: False
- `fp16`: False
- `fp16_opt_level`: O1
- `half_precision_backend`: auto
- `bf16_full_eval`: False
- `fp16_full_eval`: False
- `tf32`: None
- `local_rank`: 0
- `ddp_backend`: None
- `tpu_num_cores`: None
- `tpu_metrics_debug`: False
- `debug`: []
- `dataloader_drop_last`: False
- `dataloader_num_workers`: 0
- `dataloader_prefetch_factor`: None
- `past_index`: -1
- `disable_tqdm`: False
- `remove_unused_columns`: True
- `label_names`: None
- `load_best_model_at_end`: False
- `ignore_data_skip`: False
- `fsdp`: []
- `fsdp_min_num_params`: 0
- `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- `fsdp_transformer_layer_cls_to_wrap`: None
- `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- `deepspeed`: None
- `label_smoothing_factor`: 0.0
- `optim`: adamw_torch
- `optim_args`: None
- `adafactor`: False
- `group_by_length`: False
- `length_column_name`: length
- `ddp_find_unused_parameters`: None
- `ddp_bucket_cap_mb`: None
- `ddp_broadcast_buffers`: False
- `dataloader_pin_memory`: True
- `dataloader_persistent_workers`: False
- `skip_memory_metrics`: True
- `use_legacy_prediction_loop`: False
- `push_to_hub`: False
- `resume_from_checkpoint`: None
- `hub_model_id`: None
- `hub_strategy`: every_save
- `hub_private_repo`: False
- `hub_always_push`: False
- `gradient_checkpointing`: False
- `gradient_checkpointing_kwargs`: None
- `include_inputs_for_metrics`: False
- `eval_do_concat_batches`: True
- `fp16_backend`: auto
- `push_to_hub_model_id`: None
- `push_to_hub_organization`: None
- `mp_parameters`:
- `auto_find_batch_size`: False
- `full_determinism`: False
- `torchdynamo`: None
- `ray_scope`: last
- `ddp_timeout`: 1800
- `torch_compile`: False
- `torch_compile_backend`: None
- `torch_compile_mode`: None
- `dispatch_batches`: None
- `split_batches`: None
- `include_tokens_per_second`: False
- `include_num_input_tokens_seen`: False
- `neftune_noise_alpha`: None
- `optim_target_modules`: None
- `batch_eval_metrics`: False
- `eval_on_start`: False
- `batch_sampler`: no_duplicates
- `multi_dataset_batch_sampler`: proportional

</details>
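The non-default values above can be expressed with the trainer's argument class. A hedged sketch, assuming the sentence-transformers v3 training API (`SentenceTransformerTrainingArguments`, `BatchSamplers`); `output_dir` is a placeholder, not a path from this repository:

```python
from sentence_transformers.training_args import (
    SentenceTransformerTrainingArguments,
    BatchSamplers,
)

# Mirror the non-default hyperparameters listed in this card.
args = SentenceTransformerTrainingArguments(
    output_dir="output",  # placeholder path
    eval_strategy="steps",
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    gradient_accumulation_steps=2,
    learning_rate=1e-4,
    num_train_epochs=10,
    warmup_steps=1000,
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)
```

The `no_duplicates` sampler matters for triplet-style data: it keeps duplicate texts out of the same batch so they cannot act as accidental in-batch negatives.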

### Training Logs
<details><summary>Click to expand</summary>

| Epoch | Step | Training Loss | loss |
|:------:|:----:|:-------------:|:------:|
| 0.0133 | 10 | 4.7952 | - |
| 0.0266 | 20 | 4.7856 | - |
| 0.0399 | 30 | 4.7634 | - |
| 0.0532 | 40 | 4.7186 | - |
| 0.0665 | 50 | 4.6771 | - |
| 0.0798 | 60 | 4.6085 | - |
| 0.0931 | 70 | 4.4944 | - |
| 0.1065 | 80 | 4.3714 | - |
| 0.1198 | 90 | 4.2601 | - |
| 0.1331 | 100 | 4.2006 | 4.1392 |
| 0.1464 | 110 | 4.1937 | - |
| 0.1597 | 120 | 4.1503 | - |
| 0.1730 | 130 | 4.1355 | - |
| 0.1863 | 140 | 4.1164 | - |
| 0.1996 | 150 | 4.0822 | - |
| 0.2129 | 160 | 4.0613 | - |
| 0.2262 | 170 | 4.0549 | - |
| 0.2395 | 180 | 4.0938 | - |
| 0.2528 | 190 | 3.9957 | - |
| 0.2661 | 200 | 4.0573 | 3.9721 |
| 0.2794 | 210 | 4.0657 | - |
| 0.2927 | 220 | 4.0191 | - |
| 0.3061 | 230 | 4.0222 | - |
| 0.3194 | 240 | 4.0265 | - |
| 0.3327 | 250 | 4.0407 | - |
| 0.3460 | 260 | 3.997 | - |
| 0.3593 | 270 | 3.9782 | - |
| 0.3726 | 280 | 3.9818 | - |
| 0.3859 | 290 | 3.9965 | - |
| 0.3992 | 300 | 3.989 | 3.9337 |
| 0.4125 | 310 | 3.9439 | - |
| 0.4258 | 320 | 4.0057 | - |
| 0.4391 | 330 | 3.9681 | - |
| 0.4524 | 340 | 3.9903 | - |
| 0.4657 | 350 | 3.9816 | - |
| 0.4790 | 360 | 3.9776 | - |
| 0.4923 | 370 | 3.9555 | - |
| 0.5057 | 380 | 3.9927 | - |
| 0.5190 | 390 | 3.9753 | - |
| 0.5323 | 400 | 3.9917 | 3.9099 |
| 0.5456 | 410 | 3.9693 | - |
| 0.5589 | 420 | 3.9546 | - |
| 0.5722 | 430 | 3.9701 | - |
| 0.5855 | 440 | 3.9558 | - |
| 0.5988 | 450 | 3.9677 | - |
| 0.6121 | 460 | 3.953 | - |
| 0.6254 | 470 | 3.9279 | - |
| 0.6387 | 480 | 3.982 | - |
| 0.6520 | 490 | 3.9113 | - |
| 0.6653 | 500 | 3.9419 | 3.8756 |
| 0.6786 | 510 | 3.8882 | - |
| 0.6919 | 520 | 3.9268 | - |
| 0.7053 | 530 | 3.9446 | - |
| 0.7186 | 540 | 3.8975 | - |
| 0.7319 | 550 | 3.939 | - |
| 0.7452 | 560 | 3.9551 | - |
| 0.7585 | 570 | 3.931 | - |
| 0.7718 | 580 | 3.9403 | - |
| 0.7851 | 590 | 3.9375 | - |
| 0.7984 | 600 | 3.9305 | 3.8727 |
| 0.8117 | 610 | 3.9354 | - |
| 0.8250 | 620 | 3.9104 | - |
| 0.8383 | 630 | 3.9487 | - |
| 0.8516 | 640 | 3.9716 | - |
| 0.8649 | 650 | 3.9227 | - |
| 0.8782 | 660 | 3.9487 | - |
| 0.8916 | 670 | 3.9278 | - |
| 0.9049 | 680 | 3.9275 | - |
| 0.9182 | 690 | 3.9496 | - |
| 0.9315 | 700 | 3.9178 | 3.8614 |
| 0.9448 | 710 | 3.9015 | - |
| 0.9581 | 720 | 3.984 | - |
| 0.9714 | 730 | 3.917 | - |
| 0.9847 | 740 | 3.9371 | - |
| 0.9980 | 750 | 3.9106 | - |
| 1.0113 | 760 | 3.892 | - |
| 1.0246 | 770 | 3.8854 | - |
| 1.0379 | 780 | 3.9142 | - |
| 1.0512 | 790 | 3.9096 | - |
| 1.0645 | 800 | 3.9099 | 3.8635 |
| 1.0778 | 810 | 3.9599 | - |
| 1.0912 | 820 | 3.9025 | - |
| 1.1045 | 830 | 3.888 | - |
| 1.1178 | 840 | 3.8837 | - |
| 1.1311 | 850 | 3.9253 | - |
| 1.1444 | 860 | 3.9419 | - |
| 1.1577 | 870 | 3.8841 | - |
| 1.1710 | 880 | 3.9644 | - |
| 1.1843 | 890 | 3.9211 | - |
| 1.1976 | 900 | 3.9088 | 3.8651 |
| 1.2109 | 910 | 3.9024 | - |
| 1.2242 | 920 | 3.9129 | - |
| 1.2375 | 930 | 4.0027 | - |
| 1.2508 | 940 | 3.9038 | - |
| 1.2641 | 950 | 3.8736 | - |
| 1.2774 | 960 | 3.9454 | - |
| 1.2908 | 970 | 3.9104 | - |
| 1.3041 | 980 | 3.9552 | - |
| 1.3174 | 990 | 3.9194 | - |
| 1.3307 | 1000 | 3.9635 | 3.8888 |
| 1.3440 | 1010 | 3.8538 | - |
| 1.3573 | 1020 | 3.8927 | - |
| 1.3706 | 1030 | 3.8978 | - |
| 1.3839 | 1040 | 3.9293 | - |
| 1.3972 | 1050 | 3.8962 | - |
| 1.4105 | 1060 | 3.8857 | - |
| 1.4238 | 1070 | 3.9146 | - |
| 1.4371 | 1080 | 3.8997 | - |
| 1.4504 | 1090 | 3.9347 | - |
| 1.4637 | 1100 | 3.9239 | 3.8753 |
| 1.4770 | 1110 | 3.9165 | - |
| 1.4904 | 1120 | 3.8733 | - |
| 1.5037 | 1130 | 3.8981 | - |
| 1.5170 | 1140 | 3.8948 | - |
| 1.5303 | 1150 | 3.9131 | - |
| 1.5436 | 1160 | 3.8931 | - |
| 1.5569 | 1170 | 3.9122 | - |
| 1.5702 | 1180 | 3.8837 | - |
| 1.5835 | 1190 | 3.8917 | - |
| 1.5968 | 1200 | 3.9078 | 3.9019 |
| 1.6101 | 1210 | 3.9066 | - |
| 1.6234 | 1220 | 3.911 | - |
| 1.6367 | 1230 | 3.9278 | - |
| 1.6500 | 1240 | 3.8323 | - |
| 1.6633 | 1250 | 3.8966 | - |
| 1.6766 | 1260 | 3.9212 | - |
| 1.6900 | 1270 | 3.8609 | - |
| 1.7033 | 1280 | 3.8928 | - |
| 1.7166 | 1290 | 3.8495 | - |
| 1.7299 | 1300 | 3.8748 | 3.8766 |
| 1.7432 | 1310 | 3.9214 | - |
| 1.7565 | 1320 | 3.8944 | - |
| 1.7698 | 1330 | 3.9011 | - |
| 1.7831 | 1340 | 3.8986 | - |
| 1.7964 | 1350 | 3.8911 | - |
| 1.8097 | 1360 | 3.8789 | - |
| 1.8230 | 1370 | 3.8749 | - |
| 1.8363 | 1380 | 3.8835 | - |
| 1.8496 | 1390 | 3.9067 | - |
| 1.8629 | 1400 | 3.9141 | 3.8553 |
| 1.8762 | 1410 | 3.9095 | - |
| 1.8896 | 1420 | 3.8742 | - |
| 1.9029 | 1430 | 3.8965 | - |
| 1.9162 | 1440 | 3.91 | - |
| 1.9295 | 1450 | 3.8745 | - |
| 1.9428 | 1460 | 3.8642 | - |
| 1.9561 | 1470 | 3.9136 | - |
| 1.9694 | 1480 | 3.8681 | - |
| 1.9827 | 1490 | 3.8942 | - |
| 1.9960 | 1500 | 3.8332 | 3.8629 |
| 2.0093 | 1510 | 3.8361 | - |
| 2.0226 | 1520 | 3.872 | - |
| 2.0359 | 1530 | 3.8742 | - |
| 2.0492 | 1540 | 3.8621 | - |
| 2.0625 | 1550 | 3.8804 | - |
| 2.0758 | 1560 | 3.8928 | - |
| 2.0892 | 1570 | 3.8203 | - |
| 2.1025 | 1580 | 3.7907 | - |
| 2.1158 | 1590 | 3.85 | - |
| 2.1291 | 1600 | 3.823 | 3.8559 |
| 2.1424 | 1610 | 3.8706 | - |
| 2.1557 | 1620 | 3.8681 | - |
| 2.1690 | 1630 | 3.8459 | - |
| 2.1823 | 1640 | 3.8592 | - |
| 2.1956 | 1650 | 3.8635 | - |
| 2.2089 | 1660 | 3.8668 | - |
| 2.2222 | 1670 | 3.8677 | - |
| 2.2355 | 1680 | 3.8798 | - |
| 2.2488 | 1690 | 3.8385 | - |
| 2.2621 | 1700 | 3.8293 | 3.8560 |
| 2.2754 | 1710 | 3.8508 | - |
| 2.2888 | 1720 | 3.8703 | - |
| 2.3021 | 1730 | 3.8749 | - |
| 2.3154 | 1740 | 3.8837 | - |
| 2.3287 | 1750 | 3.8855 | - |
| 2.3420 | 1760 | 3.8291 | - |
| 2.3553 | 1770 | 3.8449 | - |
| 2.3686 | 1780 | 3.8325 | - |
| 2.3819 | 1790 | 3.8719 | - |
| 2.3952 | 1800 | 3.8141 | 3.8731 |
| 2.4085 | 1810 | 3.8325 | - |
| 2.4218 | 1820 | 3.8812 | - |
| 2.4351 | 1830 | 3.8565 | - |
| 2.4484 | 1840 | 3.8644 | - |
| 2.4617 | 1850 | 3.8812 | - |
| 2.4750 | 1860 | 3.869 | - |
| 2.4884 | 1870 | 3.8284 | - |
| 2.5017 | 1880 | 3.8615 | - |
| 2.5150 | 1890 | 3.8223 | - |
| 2.5283 | 1900 | 3.8676 | 3.8441 |
| 2.5416 | 1910 | 3.8528 | - |
| 2.5549 | 1920 | 3.8715 | - |
| 2.5682 | 1930 | 3.856 | - |
| 2.5815 | 1940 | 3.8192 | - |
| 2.5948 | 1950 | 3.8814 | - |
| 2.6081 | 1960 | 3.8194 | - |
| 2.6214 | 1970 | 3.8343 | - |
| 2.6347 | 1980 | 3.846 | - |
| 2.6480 | 1990 | 3.8926 | - |
| 2.6613 | 2000 | 3.8404 | 3.8484 |
| 2.6747 | 2010 | 3.816 | - |
| 2.6880 | 2020 | 3.8457 | - |
| 2.7013 | 2030 | 3.8496 | - |
| 2.7146 | 2040 | 3.8099 | - |
| 2.7279 | 2050 | 3.8689 | - |
| 2.7412 | 2060 | 3.849 | - |
| 2.7545 | 2070 | 3.8404 | - |
| 2.7678 | 2080 | 3.8555 | - |
| 2.7811 | 2090 | 3.878 | - |
| 2.7944 | 2100 | 3.8175 | 3.8656 |
| 2.8077 | 2110 | 3.8551 | - |
| 2.8210 | 2120 | 3.8031 | - |
| 2.8343 | 2130 | 3.8679 | - |
| 2.8476 | 2140 | 3.8591 | - |
| 2.8609 | 2150 | 3.8395 | - |
| 2.8743 | 2160 | 3.8368 | - |
| 2.8876 | 2170 | 3.8351 | - |
| 2.9009 | 2180 | 3.8646 | - |
| 2.9142 | 2190 | 3.8841 | - |
| 2.9275 | 2200 | 3.8473 | 3.8684 |
| 2.9408 | 2210 | 3.8345 | - |
| 2.9541 | 2220 | 3.845 | - |
| 2.9674 | 2230 | 3.8374 | - |
| 2.9807 | 2240 | 3.8252 | - |
| 2.9940 | 2250 | 3.7778 | - |
| 3.0073 | 2260 | 3.7963 | - |
| 3.0206 | 2270 | 3.8533 | - |
| 3.0339 | 2280 | 3.8338 | - |
| 3.0472 | 2290 | 3.8037 | - |
| 3.0605 | 2300 | 3.789 | 3.8640 |
| 3.0739 | 2310 | 3.8344 | - |
| 3.0872 | 2320 | 3.8114 | - |
| 3.1005 | 2330 | 3.7935 | - |
| 3.1138 | 2340 | 3.7721 | - |
| 3.1271 | 2350 | 3.8016 | - |
| 3.1404 | 2360 | 3.8206 | - |
| 3.1537 | 2370 | 3.8103 | - |
| 3.1670 | 2380 | 3.8053 | - |
| 3.1803 | 2390 | 3.8356 | - |
| 3.1936 | 2400 | 3.8245 | 3.8609 |
| 3.2069 | 2410 | 3.8099 | - |
| 3.2202 | 2420 | 3.8413 | - |
| 3.2335 | 2430 | 3.8133 | - |
| 3.2468 | 2440 | 3.8218 | - |
| 3.2601 | 2450 | 3.8258 | - |
| 3.2735 | 2460 | 3.7975 | - |
| 3.2868 | 2470 | 3.8513 | - |
| 3.3001 | 2480 | 3.7996 | - |
| 3.3134 | 2490 | 3.8503 | - |
| 3.3267 | 2500 | 3.7947 | 3.8511 |
| 3.3400 | 2510 | 3.7984 | - |
| 3.3533 | 2520 | 3.8075 | - |
| 3.3666 | 2530 | 3.8049 | - |
| 3.3799 | 2540 | 3.8186 | - |
| 3.3932 | 2550 | 3.7944 | - |
| 3.4065 | 2560 | 3.8104 | - |
| 3.4198 | 2570 | 3.817 | - |
| 3.4331 | 2580 | 3.8052 | - |
| 3.4464 | 2590 | 3.8233 | - |
| 3.4597 | 2600 | 3.8671 | 3.8738 |
| 3.4731 | 2610 | 3.824 | - |
| 3.4864 | 2620 | 3.8215 | - |
| 3.4997 | 2630 | 3.8113 | - |
| 3.5130 | 2640 | 3.7831 | - |
| 3.5263 | 2650 | 3.8616 | - |
| 3.5396 | 2660 | 3.8325 | - |
| 3.5529 | 2670 | 3.8189 | - |
| 3.5662 | 2680 | 3.865 | - |
| 3.5795 | 2690 | 3.7572 | - |
| 3.5928 | 2700 | 3.8308 | 3.8531 |
| 3.6061 | 2710 | 3.7959 | - |
| 3.6194 | 2720 | 3.8129 | - |
| 3.6327 | 2730 | 3.8402 | - |
| 3.6460 | 2740 | 3.8114 | - |
| 3.6593 | 2750 | 3.7955 | - |
| 3.6727 | 2760 | 3.8054 | - |
| 3.6860 | 2770 | 3.7986 | - |
| 3.6993 | 2780 | 3.7911 | - |
| 3.7126 | 2790 | 3.8203 | - |
| 3.7259 | 2800 | 3.7763 | 3.8455 |
| 3.7392 | 2810 | 3.8178 | - |
| 3.7525 | 2820 | 3.8654 | - |
| 3.7658 | 2830 | 3.8132 | - |
| 3.7791 | 2840 | 3.8255 | - |
| 3.7924 | 2850 | 3.7809 | - |
| 3.8057 | 2860 | 3.8175 | - |
| 3.8190 | 2870 | 3.7677 | - |
| 3.8323 | 2880 | 3.8271 | - |
| 3.8456 | 2890 | 3.8145 | - |
| 3.8589 | 2900 | 3.8025 | 3.8522 |
| 3.8723 | 2910 | 3.787 | - |
| 3.8856 | 2920 | 3.8068 | - |
| 3.8989 | 2930 | 3.8305 | - |
| 3.9122 | 2940 | 3.849 | - |
| 3.9255 | 2950 | 3.7765 | - |
| 3.9388 | 2960 | 3.8451 | - |
| 3.9521 | 2970 | 3.8468 | - |
| 3.9654 | 2980 | 3.8188 | - |
| 3.9787 | 2990 | 3.7912 | - |
| 3.9920 | 3000 | 3.7558 | 3.8499 |
| 4.0053 | 3010 | 3.7498 | - |
| 4.0186 | 3020 | 3.8196 | - |
| 4.0319 | 3030 | 3.8121 | - |
| 4.0452 | 3040 | 3.7971 | - |
| 4.0585 | 3050 | 3.7756 | - |
| 4.0719 | 3060 | 3.7782 | - |
| 4.0852 | 3070 | 3.7915 | - |
| 4.0985 | 3080 | 3.782 | - |
| 4.1118 | 3090 | 3.7506 | - |
| 4.1251 | 3100 | 3.782 | 3.8648 |
| 4.1384 | 3110 | 3.7541 | - |
| 4.1517 | 3120 | 3.8093 | - |
| 4.1650 | 3130 | 3.7708 | - |
| 4.1783 | 3140 | 3.8064 | - |
| 4.1916 | 3150 | 3.7941 | - |
| 4.2049 | 3160 | 3.7623 | - |
| 4.2182 | 3170 | 3.8032 | - |
| 4.2315 | 3180 | 3.7828 | - |
| 4.2448 | 3190 | 3.8005 | - |
| 4.2582 | 3200 | 3.7736 | 3.8566 |
| 4.2715 | 3210 | 3.7538 | - |
| 4.2848 | 3220 | 3.8005 | - |
| 4.2981 | 3230 | 3.7946 | - |
| 4.3114 | 3240 | 3.8061 | - |
| 4.3247 | 3250 | 3.7911 | - |
| 4.3380 | 3260 | 3.7947 | - |
| 4.3513 | 3270 | 3.7622 | - |
| 4.3646 | 3280 | 3.7866 | - |
| 4.3779 | 3290 | 3.7812 | - |
| 4.3912 | 3300 | 3.7575 | 3.8530 |
| 4.4045 | 3310 | 3.7578 | - |
| 4.4178 | 3320 | 3.7521 | - |
| 4.4311 | 3330 | 3.7863 | - |
| 4.4444 | 3340 | 3.7835 | - |
| 4.4578 | 3350 | 3.8357 | - |
| 4.4711 | 3360 | 3.796 | - |
| 4.4844 | 3370 | 3.7951 | - |
| 4.4977 | 3380 | 3.7668 | - |
| 4.5110 | 3390 | 3.7735 | - |
| 4.5243 | 3400 | 3.7996 | 3.8634 |
| 4.5376 | 3410 | 3.7848 | - |
| 4.5509 | 3420 | 3.7763 | - |
| 4.5642 | 3430 | 3.7953 | - |
| 4.5775 | 3440 | 3.7485 | - |
| 4.5908 | 3450 | 3.793 | - |
| 4.6041 | 3460 | 3.7641 | - |
| 4.6174 | 3470 | 3.7535 | - |
| 4.6307 | 3480 | 3.7975 | - |
| 4.6440 | 3490 | 3.81 | - |
| 4.6574 | 3500 | 3.7288 | 3.8684 |
| 4.6707 | 3510 | 3.8165 | - |
| 4.6840 | 3520 | 3.7747 | - |
| 4.6973 | 3530 | 3.7402 | - |
| 4.7106 | 3540 | 3.7528 | - |
| 4.7239 | 3550 | 3.7532 | - |
| 4.7372 | 3560 | 3.7766 | - |
| 4.7505 | 3570 | 3.8459 | - |
| 4.7638 | 3580 | 3.785 | - |
| 4.7771 | 3590 | 3.8026 | - |
| 4.7904 | 3600 | 3.7801 | 3.8470 |
| 4.8037 | 3610 | 3.7737 | - |
| 4.8170 | 3620 | 3.7665 | - |
| 4.8303 | 3630 | 3.8046 | - |
| 4.8436 | 3640 | 3.757 | - |
| 4.8570 | 3650 | 3.7978 | - |
| 4.8703 | 3660 | 3.779 | - |
| 4.8836 | 3670 | 3.7528 | 3.8492 |
715
+
+ </details>
+
+ ### Framework Versions
+ - Python: 3.10.12
+ - Sentence Transformers: 3.0.1
+ - Transformers: 4.42.4
+ - PyTorch: 2.3.1+cu121
+ - Accelerate: 0.32.1
+ - Datasets: 2.20.0
+ - Tokenizers: 0.19.1
+
+ ## Citation
+
+ ### BibTeX
+
+ #### Sentence Transformers
+ ```bibtex
+ @inproceedings{reimers-2019-sentence-bert,
+     title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
+     author = "Reimers, Nils and Gurevych, Iryna",
+     booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
+     month = "11",
+     year = "2019",
+     publisher = "Association for Computational Linguistics",
+     url = "https://arxiv.org/abs/1908.10084",
+ }
+ ```
+
+ #### TripletLoss
+ ```bibtex
+ @misc{hermans2017defense,
+     title={In Defense of the Triplet Loss for Person Re-Identification},
+     author={Alexander Hermans and Lucas Beyer and Bastian Leibe},
+     year={2017},
+     eprint={1703.07737},
+     archivePrefix={arXiv},
+     primaryClass={cs.CV}
+ }
+ ```
+
+ <!--
+ ## Glossary
+
+ *Clearly define terms in order to be accessible across audiences.*
+ -->
+
+ <!--
+ ## Model Card Authors
+
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
+ -->
+
+ <!--
+ ## Model Card Contact
+
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
+ -->
config.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "_name_or_path": "intfloat/multilingual-e5-small",
+   "architectures": [
+     "BertModel"
+   ],
+   "attention_probs_dropout_prob": 0.1,
+   "classifier_dropout": null,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.1,
+   "hidden_size": 384,
+   "initializer_range": 0.02,
+   "intermediate_size": 1536,
+   "layer_norm_eps": 1e-12,
+   "max_position_embeddings": 512,
+   "model_type": "bert",
+   "num_attention_heads": 12,
+   "num_hidden_layers": 12,
+   "pad_token_id": 0,
+   "position_embedding_type": "absolute",
+   "tokenizer_class": "XLMRobertaTokenizer",
+   "torch_dtype": "float32",
+   "transformers_version": "4.42.4",
+   "type_vocab_size": 2,
+   "use_cache": true,
+   "vocab_size": 250037
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "__version__": {
+     "sentence_transformers": "3.0.1",
+     "transformers": "4.42.4",
+     "pytorch": "2.3.1+cu121"
+   },
+   "prompts": {},
+   "default_prompt_name": null,
+   "similarity_fn_name": null
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:66162adc8641bc5285be81e5eeeac12f6b0b2aa34eeb9f5488b05f73c17b0cf9
+ size 470637416
modules.json ADDED
@@ -0,0 +1,20 @@
+ [
+   {
+     "idx": 0,
+     "name": "0",
+     "path": "",
+     "type": "sentence_transformers.models.Transformer"
+   },
+   {
+     "idx": 1,
+     "name": "1",
+     "path": "1_Pooling",
+     "type": "sentence_transformers.models.Pooling"
+   },
+   {
+     "idx": 2,
+     "name": "2",
+     "path": "2_Normalize",
+     "type": "sentence_transformers.models.Normalize"
+   }
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "max_seq_length": 512,
+   "do_lower_case": false
+ }
sentencepiece.bpe.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:cfc8146abe2a0488e9e2a0c56de7952f7c11ab059eca145a0a727afce0db2865
+ size 5069051
special_tokens_map.json ADDED
@@ -0,0 +1,51 @@
+ {
+   "bos_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "cls_token": {
+     "content": "<s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "eos_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "mask_token": {
+     "content": "<mask>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "pad_token": {
+     "content": "<pad>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "sep_token": {
+     "content": "</s>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   },
+   "unk_token": {
+     "content": "<unk>",
+     "lstrip": false,
+     "normalized": false,
+     "rstrip": false,
+     "single_word": false
+   }
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ef04f2b385d1514f500e779207ace0f53e30895ce37563179e29f4022d28ca38
+ size 17083053
tokenizer_config.json ADDED
@@ -0,0 +1,55 @@
+ {
+   "added_tokens_decoder": {
+     "0": {
+       "content": "<s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "1": {
+       "content": "<pad>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "2": {
+       "content": "</s>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "3": {
+       "content": "<unk>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     },
+     "250001": {
+       "content": "<mask>",
+       "lstrip": false,
+       "normalized": false,
+       "rstrip": false,
+       "single_word": false,
+       "special": true
+     }
+   },
+   "bos_token": "<s>",
+   "clean_up_tokenization_spaces": true,
+   "cls_token": "<s>",
+   "eos_token": "</s>",
+   "mask_token": "<mask>",
+   "model_max_length": 512,
+   "pad_token": "<pad>",
+   "sep_token": "</s>",
+   "sp_model_kwargs": {},
+   "tokenizer_class": "XLMRobertaTokenizer",
+   "unk_token": "<unk>"
+ }