AlexWortega committed on
Commit
f49662b
1 Parent(s): 257d42e

Add new SentenceTransformer model

.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ tokenizer.json filter=lfs diff=lfs merge=lfs -text
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 896,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": true
+ }
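
The pooling config above enables only `pooling_mode_mean_tokens`, so each sentence vector is the average of its token vectors. A minimal pure-Python sketch of that idea follows, assuming padding positions are marked with 0 in an attention mask; the function name `mean_pool` and the toy 2-dimensional vectors are illustrative and not taken from the repository.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors whose mask entry is 1, ignoring padding."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, m in zip(token_embeddings, attention_mask) if m == 1]
    if not kept:
        # No real tokens: return a zero vector rather than dividing by zero.
        return [0.0] * dim
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Two real tokens plus one padding token (hypothetical values).
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
pooled = mean_pool(tokens, mask)
print(pooled)  # [2.0, 3.0]
```

In the real model the token vectors would have 896 entries, matching `word_embedding_dimension`.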
README.md ADDED
@@ -0,0 +1,831 @@
1
+ ---
2
+ tags:
3
+ - sentence-transformers
4
+ - sentence-similarity
5
+ - feature-extraction
6
+ - generated_from_trainer
7
+ - dataset_size:1077240
8
+ - loss:MultipleNegativesRankingLoss
9
+ base_model: Qwen/Qwen2.5-0.5B-Instruct
10
+ widget:
11
+ - source_sentence: Who is the father of philosophy?
12
+ sentences:
13
+ - 'Charles Sanders Peirce
14
+
15
+ Charles Sanders Peirce (/pɜːrs/[9] "purse"; 10September 1839 – 19April 1914) was
16
+ an American philosopher, logician, mathematician, and scientist who is sometimes
17
+ known as "the father of pragmatism". He was educated as a chemist and employed
18
+ as a scientist for 30 years. Today he is appreciated largely for his contributions
19
+ to logic, mathematics, philosophy, scientific methodology, and semiotics, and
20
+ for his founding of pragmatism.'
21
+ - 'Georg Wilhelm Friedrich Hegel
22
+
23
+ According to Hegel, "Heraclitus is the one who first declared the nature of the
24
+ infinite and first grasped nature as in itself infinite, that is, its essence
25
+ as process. The origin of philosophy is to be dated from Heraclitus. His is the
26
+ persistent Idea that is the same in all philosophers up to the present day, as
27
+ it was the Idea of Plato and Aristotle". For Hegel, Heraclitus''s great achievements
28
+ were to have understood the nature of the infinite, which for Hegel includes understanding
29
+ the inherent contradictoriness and negativity of reality; and to have grasped
30
+ that reality is becoming or process and that "being" and "nothingness" are mere
31
+ empty abstractions. According to Hegel, Heraclitus''s "obscurity" comes from his
32
+ being a true (in Hegel''s terms "speculative") philosopher who grasped the ultimate
33
+ philosophical truth and therefore expressed himself in a way that goes beyond
34
+ the abstract and limited nature of common sense and is difficult to grasp by those
35
+ who operate within common sense. Hegel asserted that in Heraclitus he had an antecedent
36
+ for his logic: "[...] there is no proposition of Heraclitus which I have not adopted
37
+ in my logic".'
38
+ - 'History of nuclear weapons
39
+
40
+ The notion of using a fission weapon to ignite a process of nuclear fusion can
41
+ be dated back to 1942. At the first major theoretical conference on the development
42
+ of an atomic bomb hosted by J. Robert Oppenheimer at the University of California,
43
+ Berkeley, participant Edward Teller directed the majority of the discussion towards
44
+ Enrico Fermi''s idea of a "Super" bomb that would use the same reactions that
45
+ powered the Sun itself.'
46
+ - source_sentence: When was Father's Day first celebrated in America?
47
+ sentences:
48
+ - 'Father''s Day (United States)
49
+
50
+ Father''s Day was founded in Spokane, Washington at the YMCA in 1910 by Sonora
51
+ Smart Dodd, who was born in Arkansas.[4] Its first celebration was in the Spokane
52
+ YMCA on June 19, 1910.[4][5] Her father, the Civil War veteran William Jackson
53
+ Smart, was a single parent who raised his six children there.[4] After hearing
54
+ a sermon about Jarvis'' Mother''s Day at Central Methodist Episcopal Church in
55
+ 1909, she told her pastor that fathers should have a similar holiday honoring
56
+ them.[4][6] Although she initially suggested June 5, her father''s birthday, the
57
+ pastors did not have enough time to prepare their sermons, and the celebration
58
+ was deferred to the third Sunday of June.[7][8]'
59
+ - 'Father''s Day
60
+
61
+ In [[Peru]], Father''s Day is celebrated on the third Sunday of June and is not
62
+ a public holiday. People usually give a present to their fathers and spend time
63
+ with him mostly during a family meal.'
64
+ - 'Sacramento River
65
+
66
+ The Sacramento and its wide natural floodplain were once abundant in fish and
67
+ other aquatic creatures, notably one of the southernmost large runs of chinook
68
+ salmon in North America. For about 12,000 years, humans have depended on the vast
69
+ natural resources of the watershed, which had one of the densest Native American
70
+ populations in California. The river has provided a route for trade and travel
71
+ since ancient times. Hundreds of tribes sharing regional customs and traditions
72
+ inhabited the Sacramento Valley, first coming into contact with European explorers
73
+ in the late 1700s. The Spanish explorer Gabriel Moraga named the river Rio de
74
+ los Sacramentos in 1808, later shortened and anglicized into Sacramento.'
75
+ - source_sentence: What is the population of Austria in 2018?
76
+ sentences:
77
+ - 'Utah State Capitol
78
+
79
+ The Utah State Capitol is the house of government for the U.S. state of Utah.
80
+ The building houses the chambers and offices of the Utah State Legislature, the
81
+ offices of the Governor, Lieutenant Governor, Attorney General, the State Auditor
82
+ and their staffs. The capitol is the main building of the Utah State Capitol Complex,
83
+ which is located on Capitol Hill, overlooking downtown Salt Lake City.'
84
+ - 'Same-sex marriage in Austria
85
+
86
+ A September 2018 poll for "Österreich" found that 74% of Austrians supported same-sex
87
+ marriage and 26% were against.'
88
+ - 'Demographics of Austria
89
+
90
+ Population 8,793,370 (July 2018 est.) country comparison to the world: 96th'
91
+ - source_sentence: What language family is Malay?
92
+ sentences:
93
+ - 'Malay language
94
+
95
+ Malay is a member of the Austronesian family of languages, which includes languages
96
+ from Southeast Asia and the Pacific Ocean, with a smaller number in continental
97
+ Asia. Malagasy, a geographic outlier spoken in Madagascar in the Indian Ocean,
98
+ is also a member of this language family. Although each language of the family
99
+ is mutually unintelligible, their similarities are rather striking. Many roots
100
+ have come virtually unchanged from their common ancestor, Proto-Austronesian language.
101
+ There are many cognates found in the languages'' words for kinship, health, body
102
+ parts and common animals. Numbers, especially, show remarkable similarities.'
103
+ - 'Filipinos of Malay descent
104
+
105
+ In the Philippines, there is misconception and often mixing between the two definitions.
106
+ Filipinos consider Malays as being the natives of the Philippines, Indonesia,
107
+ Malaysia and Brunei. Consequently, Filipinos consider themselves Malay when in
108
+ reality, they are referring to the Malay Race. Filipinos in Singapore also prefer
109
+ to be considered Malay, but their desire to be labeled as part of the ethnic group
110
+ was rejected by the Singaporean government. Paradoxically, a minor percentage
111
+ of Filipinos prefer the Spanish influence and may associate themselves with being
112
+ Hispanic, and have made no realistic attempts to promote and/or revive the Malay
113
+ language in the Philippines.'
114
+ - 'Preferred provider organization
115
+
116
+ In health insurance in the United States, a preferred provider organization (PPO),
117
+ sometimes referred to as a participating provider organization or preferred provider
118
+ option, is a managed care organization of medical doctors, hospitals, and other
119
+ health care providers who have agreed with an insurer or a third-party administrator
120
+ to provide health care at reduced rates to the insurer''s or administrator''s
121
+ clients.'
122
+ - source_sentence: When was ABC formed?
123
+ sentences:
124
+ - 'American Broadcasting Company
125
+
126
+ ABC launched as a radio network on October 12, 1943, serving as the successor
127
+ to the NBC Blue Network, which had been purchased by Edward J. Noble. It extended
128
+ its operations to television in 1948, following in the footsteps of established
129
+ broadcast networks CBS and NBC. In the mid-1950s, ABC merged with United Paramount
130
+ Theatres, a chain of movie theaters that formerly operated as a subsidiary of
131
+ Paramount Pictures. Leonard Goldenson, who had been the head of UPT, made the
132
+ new television network profitable by helping develop and greenlight many successful
133
+ series. In the 1980s, after purchasing an 80% interest in cable sports channel
134
+ ESPN, the network''s corporate parent, American Broadcasting Companies, Inc.,
135
+ merged with Capital Cities Communications, owner of several print publications,
136
+ and television and radio stations. In 1996, most of Capital Cities/ABC''s assets
137
+ were purchased by The Walt Disney Company.'
138
+ - 'Roman concrete
139
+
140
+ Roman concrete, also called opus caementicium, was a material used in construction
141
+ during the late Roman Republic until the fading of the Roman Empire. Roman concrete
142
+ was based on a hydraulic-setting cement. Recently, it has been found that it materially
143
+ differs in several ways from modern concrete which is based on Portland cement.
144
+ Roman concrete is durable due to its incorporation of volcanic ash, which prevents
145
+ cracks from spreading. By the middle of the 1st century, the material was used
146
+ frequently, often brick-faced, although variations in aggregate allowed different
147
+ arrangements of materials. Further innovative developments in the material, called
148
+ the Concrete Revolution, contributed to structurally complicated forms, such as
149
+ the Pantheon dome, the world''s largest and oldest unreinforced concrete dome.[1]'
150
+ - 'Americans Battling Communism
151
+
152
+ Americans Battling Communism, Inc. (ABC) was an anti-communist organization created
153
+ following an October 1947 speech by Pennsylvania Judge Blair Gunther that called
154
+ for an "ABC movement" to educate America about communism. Chartered in November
155
+ 1947 by Harry Alan Sherman, a local lawyer active in various anti-communist organizations,
156
+ the group took part in such activities as blacklisting by disclosing the names
157
+ of people suspected of being communists. Its members included local judges and
158
+ lawyers active in the McCarthy-era prosecution of communists.'
159
+ pipeline_tag: sentence-similarity
160
+ library_name: sentence-transformers
161
+ metrics:
162
+ - pearson_cosine
163
+ - spearman_cosine
164
+ model-index:
165
+ - name: SentenceTransformer based on Qwen/Qwen2.5-0.5B-Instruct
166
+ results:
167
+ - task:
168
+ type: semantic-similarity
169
+ name: Semantic Similarity
170
+ dataset:
171
+ name: sts dev 896
172
+ type: sts-dev-896
173
+ metrics:
174
+ - type: pearson_cosine
175
+ value: 0.7512795462804751
176
+ name: Pearson Cosine
177
+ - type: spearman_cosine
178
+ value: 0.7602862030369626
179
+ name: Spearman Cosine
180
+ - task:
181
+ type: semantic-similarity
182
+ name: Semantic Similarity
183
+ dataset:
184
+ name: sts dev 768
185
+ type: sts-dev-768
186
+ metrics:
187
+ - type: pearson_cosine
188
+ value: 0.7504358517848402
189
+ name: Pearson Cosine
190
+ - type: spearman_cosine
191
+ value: 0.7590404004512833
192
+ name: Spearman Cosine
193
+ ---
194
+
195
+ # SentenceTransformer based on Qwen/Qwen2.5-0.5B-Instruct
196
+
197
+ This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct). It maps sentences & paragraphs to an 896-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
198
+
199
+ ## Model Details
200
+
201
+ ### Model Description
202
+ - **Model Type:** Sentence Transformer
203
+ - **Base model:** [Qwen/Qwen2.5-0.5B-Instruct](https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct) <!-- at revision 7ae557604adf67be50417f59c2c2f167def9a775 -->
204
+ - **Maximum Sequence Length:** 1024 tokens
205
+ - **Output Dimensionality:** 896 dimensions
206
+ - **Similarity Function:** Cosine Similarity
207
+ <!-- - **Training Dataset:** Unknown -->
208
+ <!-- - **Language:** Unknown -->
209
+ <!-- - **License:** Unknown -->
210
+
211
+ ### Model Sources
212
+
213
+ - **Documentation:** [Sentence Transformers Documentation](https://sbert.net)
214
+ - **Repository:** [Sentence Transformers on GitHub](https://github.com/UKPLab/sentence-transformers)
215
+ - **Hugging Face:** [Sentence Transformers on Hugging Face](https://huggingface.co/models?library=sentence-transformers)
216
+
217
+ ### Full Model Architecture
218
+
219
+ ```
+ SentenceTransformer(
+   (0): Transformer({'max_seq_length': 1024, 'do_lower_case': False}) with Transformer model: Qwen2Model
+   (1): Pooling({'word_embedding_dimension': 896, 'pooling_mode_cls_token': False, 'pooling_mode_mean_tokens': True, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
+ )
+ ```
225
+
226
+ ## Usage
227
+
228
+ ### Direct Usage (Sentence Transformers)
229
+
230
+ First install the Sentence Transformers library:
231
+
232
+ ```bash
+ pip install -U sentence-transformers
+ ```
235
+
236
+ Then you can load this model and run inference.
237
+ ```python
+ from sentence_transformers import SentenceTransformer
+
+ # Download from the 🤗 Hub
+ model = SentenceTransformer("AlexWortega/qwen3k")
+ # Run inference
+ sentences = [
+     'When was ABC formed?',
+     "American Broadcasting Company\nABC launched as a radio network on October 12, 1943, serving as the successor to the NBC Blue Network, which had been purchased by Edward J. Noble. It extended its operations to television in 1948, following in the footsteps of established broadcast networks CBS and NBC. In the mid-1950s, ABC merged with United Paramount Theatres, a chain of movie theaters that formerly operated as a subsidiary of Paramount Pictures. Leonard Goldenson, who had been the head of UPT, made the new television network profitable by helping develop and greenlight many successful series. In the 1980s, after purchasing an 80% interest in cable sports channel ESPN, the network's corporate parent, American Broadcasting Companies, Inc., merged with Capital Cities Communications, owner of several print publications, and television and radio stations. In 1996, most of Capital Cities/ABC's assets were purchased by The Walt Disney Company.",
+     'Americans Battling Communism\nAmericans Battling Communism, Inc. (ABC) was an anti-communist organization created following an October 1947 speech by Pennsylvania Judge Blair Gunther that called for an "ABC movement" to educate America about communism. Chartered in November 1947 by Harry Alan Sherman, a local lawyer active in various anti-communist organizations, the group took part in such activities as blacklisting by disclosing the names of people suspected of being communists. Its members included local judges and lawyers active in the McCarthy-era prosecution of communists.',
+ ]
+ embeddings = model.encode(sentences)
+ print(embeddings.shape)
+ # [3, 896]
+
+ # Get the similarity scores for the embeddings
+ similarities = model.similarity(embeddings, embeddings)
+ print(similarities.shape)
+ # [3, 3]
+ ```
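
The card lists cosine similarity as the model's similarity function, so ranking passages for a query comes down to sorting by the cosine of the angle between embedding vectors. The sketch below illustrates this in plain Python on toy 2-dimensional vectors; the vectors and the names `cosine`, `query`, and `passages` are hypothetical stand-ins for real 896-dimensional embeddings.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy embeddings standing in for model outputs (hypothetical values).
query = [1.0, 0.0]
passages: Dict[str, List[float]] = {
    "unrelated": [0.0, 1.0],
    "relevant": [0.9, 0.1],
}

# Sort passage names from most to least similar to the query.
ranked = sorted(passages, key=lambda name: cosine(query, passages[name]), reverse=True)
print(ranked)  # ['relevant', 'unrelated']
```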
257
+
258
+ <!--
259
+ ### Direct Usage (Transformers)
260
+
261
+ <details><summary>Click to see the direct usage in Transformers</summary>
262
+
263
+ </details>
264
+ -->
265
+
266
+ <!--
267
+ ### Downstream Usage (Sentence Transformers)
268
+
269
+ You can finetune this model on your own dataset.
270
+
271
+ <details><summary>Click to expand</summary>
272
+
273
+ </details>
274
+ -->
275
+
276
+ <!--
277
+ ### Out-of-Scope Use
278
+
279
+ *List how the model may foreseeably be misused and address what users ought not to do with the model.*
280
+ -->
281
+
282
+ ## Evaluation
283
+
284
+ ### Metrics
285
+
286
+ #### Semantic Similarity
287
+
288
+ * Datasets: `sts-dev-896` and `sts-dev-768`
289
+ * Evaluated with [<code>EmbeddingSimilarityEvaluator</code>](https://sbert.net/docs/package_reference/sentence_transformer/evaluation.html#sentence_transformers.evaluation.EmbeddingSimilarityEvaluator)
290
+
291
+ | Metric | sts-dev-896 | sts-dev-768 |
292
+ |:--------------------|:------------|:------------|
293
+ | pearson_cosine | 0.7513 | 0.7504 |
294
+ | **spearman_cosine** | **0.7603** | **0.7590** |
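
The two metrics in the table measure different things: Pearson checks for a linear relation between predicted cosine scores and gold labels, while Spearman only checks that the ordering agrees. A small pure-Python sketch of both follows; the `gold` and `pred` values are made up for illustration and are not results from this model.

```python
import math
from typing import List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation: covariance divided by the product of spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values: List[float]) -> List[float]:
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman correlation is Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical gold similarity labels and predicted cosine scores.
gold = [0.0, 1.0, 2.0, 3.0, 5.0]
pred = [0.10, 0.20, 0.30, 0.40, 0.95]

pearson_score = pearson(gold, pred)
spearman_score = spearman(gold, pred)
print(round(pearson_score, 4), round(spearman_score, 4))
```

Because the predictions preserve the gold ordering exactly, Spearman is 1 here even though the relation is not perfectly linear and Pearson is lower.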
295
+
296
+ <!--
297
+ ## Bias, Risks and Limitations
298
+
299
+ *What are the known or foreseeable issues stemming from this model? You could also flag here known failure cases or weaknesses of the model.*
300
+ -->
301
+
302
+ <!--
303
+ ### Recommendations
304
+
305
+ *What are recommendations with respect to the foreseeable issues? For example, filtering explicit content.*
306
+ -->
307
+
308
+ ## Training Details
309
+
310
+ ### Training Dataset
311
+
312
+ #### Unnamed Dataset
313
+
314
+
315
+ * Size: 1,077,240 training samples
316
+ * Columns: <code>query</code>, <code>response</code>, and <code>negative</code>
317
+ * Approximate statistics based on the first 1000 samples:
318
+ | | query | response | negative |
319
+ |:--------|:---------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------|:------------------------------------------------------------------------------------|
320
+ | type | string | string | string |
321
+ | details | <ul><li>min: 4 tokens</li><li>mean: 8.76 tokens</li><li>max: 26 tokens</li></ul> | <ul><li>min: 23 tokens</li><li>mean: 141.88 tokens</li><li>max: 532 tokens</li></ul> | <ul><li>min: 4 tokens</li><li>mean: 134.02 tokens</li><li>max: 472 tokens</li></ul> |
322
+ * Samples:
323
+ | query | response | negative |
324
+ |:--------------------------------------------------|:---------|:---------|
325
+ | <code>Was there a year 0?</code> | <code>Year zero<br>Year zero does not exist in the anno Domini system usually used to number years in the Gregorian calendar and in its predecessor, the Julian calendar. In this system, the year 1 BC is followed by AD 1. However, there is a year zero in astronomical year numbering (where it coincides with the Julian year 1 BC) and in ISO 8601:2004 (where it coincides with the Gregorian year 1 BC) as well as in all Buddhist and Hindu calendars.</code> | <code>504<br>Year 504 (DIV) was a leap year starting on Thursday (link will display the full calendar) of the Julian calendar. At the time, it was known as the Year of the Consulship of Nicomachus without colleague (or, less frequently, year 1257 "Ab urbe condita"). The denomination 504 for this year has been used since the early medieval period, when the Anno Domini calendar era became the prevalent method in Europe for naming years.</code> |
326
+ | <code>When is the dialectical method used?</code> | <code>Dialectic<br>Dialectic or dialectics (Greek: διαλεκτική, dialektikḗ; related to dialogue), also known as the dialectical method, is at base a discourse between two or more people holding different points of view about a subject but wishing to establish the truth through reasoned arguments. Dialectic resembles debate, but the concept excludes subjective elements such as emotional appeal and the modern pejorative sense of rhetoric.[1][2] Dialectic may be contrasted with the didactic method, wherein one side of the conversation teaches the other. Dialectic is alternatively known as minor logic, as opposed to major logic or critique.</code> | <code>Derek Bentley case<br>Another factor in the posthumous defence was that a "confession" recorded by Bentley, which was claimed by the prosecution to be a "verbatim record of dictated monologue", was shown by forensic linguistics methods to have been largely edited by policemen. Linguist Malcolm Coulthard showed that certain patterns, such as the frequency of the word "then" and the grammatical use of "then" after the grammatical subject ("I then" rather than "then I"), were not consistent with Bentley's use of language (his idiolect), as evidenced in court testimony. These patterns fit better the recorded testimony of the policemen involved. This is one of the earliest uses of forensic linguistics on record.</code> |
327
+ | <code>What do Grasshoppers eat?</code> | <code>Grasshopper<br>Grasshoppers are plant-eaters, with a few species at times becoming serious pests of cereals, vegetables and pasture, especially when they swarm in their millions as locusts and destroy crops over wide areas. They protect themselves from predators by camouflage; when detected, many species attempt to startle the predator with a brilliantly-coloured wing-flash while jumping and (if adult) launching themselves into the air, usually flying for only a short distance. Other species such as the rainbow grasshopper have warning coloration which deters predators. Grasshoppers are affected by parasites and various diseases, and many predatory creatures feed on both nymphs and adults. The eggs are the subject of attack by parasitoids and predators.</code> | <code>Groundhog<br>Very often the dens of groundhogs provide homes for other animals including skunks, red foxes, and cottontail rabbits. The fox and skunk feed upon field mice, grasshoppers, beetles and other creatures that destroy farm crops. In aiding these animals, the groundhog indirectly helps the farmer. In addition to providing homes for itself and other animals, the groundhog aids in soil improvement by bringing subsoil to the surface. The groundhog is also a valuable game animal and is considered a difficult sport when hunted in a fair manner. In some parts of Appalachia, they are eaten.</code> |
328
+ * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
329
+ ```json
+ {
+   "scale": 20.0,
+   "similarity_fct": "cos_sim"
+ }
+ ```
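
To make the loss parameters concrete, here is a rough pure-Python sketch of the idea behind `MultipleNegativesRankingLoss` with `scale` 20.0 and cosine similarity. Each query is scored against every positive and every hard negative in the batch, and a cross-entropy term rewards ranking its own positive first. This is a simplified illustration, not the library's implementation; the function `mnrl` and the toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cos_sim(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def mnrl(queries: List[Vector], positives: List[Vector],
         negatives: List[Vector], scale: float = 20.0) -> float:
    """Mean cross-entropy where query i must pick positive i among all candidates."""
    candidates = positives + negatives  # in-batch positives, then hard negatives
    total = 0.0
    for i, query in enumerate(queries):
        logits = [scale * cos_sim(query, cand) for cand in candidates]
        top = max(logits)
        # Numerically stable log-sum-exp over all candidates.
        log_z = top + math.log(sum(math.exp(value - top) for value in logits))
        total += log_z - logits[i]
    return total / len(queries)


queries = [[1.0, 0.0], [0.0, 1.0]]
negatives = [[-1.0, 0.0], [0.0, -1.0]]

aligned_loss = mnrl(queries, [[1.0, 0.0], [0.0, 1.0]], negatives)
swapped_loss = mnrl(queries, [[0.0, 1.0], [1.0, 0.0]], negatives)
print(aligned_loss, swapped_loss)
```

The loss is near zero when every query matches its own positive and large when the positives are swapped, which is the behaviour the training objective relies on.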
335
+
336
+ ### Training Hyperparameters
337
+ #### Non-Default Hyperparameters
338
+
339
+ - `eval_strategy`: steps
340
+ - `per_device_train_batch_size`: 12
341
+ - `per_device_eval_batch_size`: 12
342
+ - `gradient_accumulation_steps`: 4
343
+ - `num_train_epochs`: 1
344
+ - `warmup_ratio`: 0.3
345
+ - `bf16`: True
346
+ - `batch_sampler`: no_duplicates
347
+
348
+ #### All Hyperparameters
349
+ <details><summary>Click to expand</summary>
350
+
351
+ - `overwrite_output_dir`: False
352
+ - `do_predict`: False
353
+ - `eval_strategy`: steps
354
+ - `prediction_loss_only`: True
355
+ - `per_device_train_batch_size`: 12
356
+ - `per_device_eval_batch_size`: 12
357
+ - `per_gpu_train_batch_size`: None
358
+ - `per_gpu_eval_batch_size`: None
359
+ - `gradient_accumulation_steps`: 4
360
+ - `eval_accumulation_steps`: None
361
+ - `torch_empty_cache_steps`: None
362
+ - `learning_rate`: 5e-05
363
+ - `weight_decay`: 0.0
364
+ - `adam_beta1`: 0.9
365
+ - `adam_beta2`: 0.999
366
+ - `adam_epsilon`: 1e-08
367
+ - `max_grad_norm`: 1.0
368
+ - `num_train_epochs`: 1
369
+ - `max_steps`: -1
370
+ - `lr_scheduler_type`: linear
371
+ - `lr_scheduler_kwargs`: {}
372
+ - `warmup_ratio`: 0.3
373
+ - `warmup_steps`: 0
374
+ - `log_level`: passive
375
+ - `log_level_replica`: warning
376
+ - `log_on_each_node`: True
377
+ - `logging_nan_inf_filter`: True
378
+ - `save_safetensors`: True
379
+ - `save_on_each_node`: False
380
+ - `save_only_model`: False
381
+ - `restore_callback_states_from_checkpoint`: False
382
+ - `no_cuda`: False
383
+ - `use_cpu`: False
384
+ - `use_mps_device`: False
385
+ - `seed`: 42
386
+ - `data_seed`: None
387
+ - `jit_mode_eval`: False
388
+ - `use_ipex`: False
389
+ - `bf16`: True
390
+ - `fp16`: False
391
+ - `fp16_opt_level`: O1
392
+ - `half_precision_backend`: auto
393
+ - `bf16_full_eval`: False
394
+ - `fp16_full_eval`: False
395
+ - `tf32`: None
396
+ - `local_rank`: 0
397
+ - `ddp_backend`: None
398
+ - `tpu_num_cores`: None
399
+ - `tpu_metrics_debug`: False
400
+ - `debug`: []
401
+ - `dataloader_drop_last`: False
402
+ - `dataloader_num_workers`: 0
403
+ - `dataloader_prefetch_factor`: None
404
+ - `past_index`: -1
405
+ - `disable_tqdm`: False
406
+ - `remove_unused_columns`: True
407
+ - `label_names`: None
408
+ - `load_best_model_at_end`: False
409
+ - `ignore_data_skip`: False
410
+ - `fsdp`: []
411
+ - `fsdp_min_num_params`: 0
412
+ - `fsdp_config`: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
413
+ - `fsdp_transformer_layer_cls_to_wrap`: None
414
+ - `accelerator_config`: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
415
+ - `deepspeed`: None
416
+ - `label_smoothing_factor`: 0.0
417
+ - `optim`: adamw_torch
418
+ - `optim_args`: None
419
+ - `adafactor`: False
420
+ - `group_by_length`: False
421
+ - `length_column_name`: length
422
+ - `ddp_find_unused_parameters`: None
423
+ - `ddp_bucket_cap_mb`: None
424
+ - `ddp_broadcast_buffers`: False
425
+ - `dataloader_pin_memory`: True
426
+ - `dataloader_persistent_workers`: False
427
+ - `skip_memory_metrics`: True
428
+ - `use_legacy_prediction_loop`: False
429
+ - `push_to_hub`: False
430
+ - `resume_from_checkpoint`: None
431
+ - `hub_model_id`: None
432
+ - `hub_strategy`: every_save
433
+ - `hub_private_repo`: False
434
+ - `hub_always_push`: False
435
+ - `gradient_checkpointing`: False
436
+ - `gradient_checkpointing_kwargs`: None
437
+ - `include_inputs_for_metrics`: False
438
+ - `include_for_metrics`: []
439
+ - `eval_do_concat_batches`: True
440
+ - `fp16_backend`: auto
441
+ - `push_to_hub_model_id`: None
442
+ - `push_to_hub_organization`: None
443
+ - `mp_parameters`:
444
+ - `auto_find_batch_size`: False
445
+ - `full_determinism`: False
446
+ - `torchdynamo`: None
447
+ - `ray_scope`: last
448
+ - `ddp_timeout`: 1800
449
+ - `torch_compile`: False
450
+ - `torch_compile_backend`: None
451
+ - `torch_compile_mode`: None
452
+ - `dispatch_batches`: None
453
+ - `split_batches`: None
454
+ - `include_tokens_per_second`: False
455
+ - `include_num_input_tokens_seen`: False
456
+ - `neftune_noise_alpha`: None
457
+ - `optim_target_modules`: None
458
+ - `batch_eval_metrics`: False
459
+ - `eval_on_start`: False
460
+ - `use_liger_kernel`: False
461
+ - `eval_use_gather_object`: False
462
+ - `average_tokens_across_devices`: False
463
+ - `prompts`: None
464
+ - `batch_sampler`: no_duplicates
465
+ - `multi_dataset_batch_sampler`: proportional
466
+
467
+ </details>
468
+
469
+ ### Training Logs
470
+ <details><summary>Click to expand</summary>
471
+
472
+ | Epoch | Step | Training Loss | sts-dev-896_spearman_cosine | sts-dev-768_spearman_cosine |
473
+ |:------:|:----:|:-------------:|:---------------------------:|:---------------------------:|
474
+ | 0.0004 | 10 | 2.2049 | - | - |
475
+ | 0.0009 | 20 | 2.3168 | - | - |
476
+ | 0.0013 | 30 | 2.3544 | - | - |
477
+ | 0.0018 | 40 | 2.2519 | - | - |
478
+ | 0.0022 | 50 | 2.1809 | - | - |
479
+ | 0.0027 | 60 | 2.1572 | - | - |
480
+ | 0.0031 | 70 | 2.1855 | - | - |
481
+ | 0.0036 | 80 | 2.5887 | - | - |
482
+ | 0.0040 | 90 | 2.883 | - | - |
483
+ | 0.0045 | 100 | 2.8557 | - | - |
484
+ | 0.0049 | 110 | 2.9356 | - | - |
485
+ | 0.0053 | 120 | 2.8833 | - | - |
486
+ | 0.0058 | 130 | 2.8394 | - | - |
487
+ | 0.0062 | 140 | 2.923 | - | - |
488
+ | 0.0067 | 150 | 2.8191 | - | - |
489
+ | 0.0071 | 160 | 2.8658 | - | - |
490
+ | 0.0076 | 170 | 2.8252 | - | - |
491
+ | 0.0080 | 180 | 2.8312 | - | - |
492
+ | 0.0085 | 190 | 2.7761 | - | - |
493
+ | 0.0089 | 200 | 2.7193 | - | - |
494
+ | 0.0094 | 210 | 2.724 | - | - |
495
+ | 0.0098 | 220 | 2.7484 | - | - |
496
+ | 0.0102 | 230 | 2.7262 | - | - |
497
+ | 0.0107 | 240 | 2.6964 | - | - |
498
+ | 0.0111 | 250 | 2.6676 | - | - |
499
+ | 0.0116 | 260 | 2.6715 | - | - |
500
+ | 0.0120 | 270 | 2.6145 | - | - |
501
+ | 0.0125 | 280 | 2.6191 | - | - |
502
+ | 0.0129 | 290 | 1.9812 | - | - |
503
+ | 0.0134 | 300 | 1.6413 | - | - |
504
+ | 0.0138 | 310 | 1.6126 | - | - |
505
+ | 0.0143 | 320 | 1.3599 | - | - |
506
+ | 0.0147 | 330 | 1.2996 | - | - |
507
+ | 0.0151 | 340 | 1.2654 | - | - |
508
+ | 0.0156 | 350 | 1.9409 | - | - |
509
+ | 0.0160 | 360 | 2.1287 | - | - |
510
+ | 0.0165 | 370 | 1.8442 | - | - |
511
+ | 0.0169 | 380 | 1.6837 | - | - |
512
+ | 0.0174 | 390 | 1.5489 | - | - |
513
+ | 0.0178 | 400 | 1.4382 | - | - |
514
+ | 0.0183 | 410 | 1.4848 | - | - |
515
+ | 0.0187 | 420 | 1.3481 | - | - |
516
+ | 0.0192 | 430 | 1.3467 | - | - |
517
+ | 0.0196 | 440 | 1.3977 | - | - |
518
+ | 0.0201 | 450 | 1.26 | - | - |
519
+ | 0.0205 | 460 | 1.2412 | - | - |
520
+ | 0.0209 | 470 | 1.316 | - | - |
521
+ | 0.0214 | 480 | 1.3501 | - | - |
522
+ | 0.0218 | 490 | 1.2246 | - | - |
523
+ | 0.0223 | 500 | 1.2271 | - | - |
524
+ | 0.0227 | 510 | 1.1871 | - | - |
525
+ | 0.0232 | 520 | 1.1685 | - | - |
526
+ | 0.0236 | 530 | 1.1624 | - | - |
527
+ | 0.0241 | 540 | 1.1911 | - | - |
528
+ | 0.0245 | 550 | 1.1978 | - | - |
529
+ | 0.0250 | 560 | 1.1228 | - | - |
530
+ | 0.0254 | 570 | 1.1091 | - | - |
531
+ | 0.0258 | 580 | 1.1433 | - | - |
532
+ | 0.0263 | 590 | 1.0638 | - | - |
533
+ | 0.0267 | 600 | 1.0515 | - | - |
534
+ | 0.0272 | 610 | 1.175 | - | - |
535
+ | 0.0276 | 620 | 1.0943 | - | - |
536
+ | 0.0281 | 630 | 1.1226 | - | - |
537
+ | 0.0285 | 640 | 0.9871 | - | - |
538
+ | 0.0290 | 650 | 1.0171 | - | - |
539
+ | 0.0294 | 660 | 1.0169 | - | - |
540
+ | 0.0299 | 670 | 0.9643 | - | - |
541
+ | 0.0303 | 680 | 0.9563 | - | - |
542
+ | 0.0307 | 690 | 0.9841 | - | - |
543
+ | 0.0312 | 700 | 1.0349 | - | - |
544
+ | 0.0316 | 710 | 0.8958 | - | - |
545
+ | 0.0321 | 720 | 0.9225 | - | - |
546
+ | 0.0325 | 730 | 0.842 | - | - |
547
+ | 0.0330 | 740 | 0.9104 | - | - |
548
+ | 0.0334 | 750 | 0.8927 | - | - |
549
+ | 0.0339 | 760 | 0.8508 | - | - |
550
+ | 0.0343 | 770 | 0.8835 | - | - |
551
+ | 0.0348 | 780 | 0.9531 | - | - |
552
+ | 0.0352 | 790 | 0.926 | - | - |
553
+ | 0.0356 | 800 | 0.8718 | - | - |
554
+ | 0.0361 | 810 | 0.8261 | - | - |
555
+ | 0.0365 | 820 | 0.8169 | - | - |
556
+ | 0.0370 | 830 | 0.8525 | - | - |
557
+ | 0.0374 | 840 | 0.8504 | - | - |
558
+ | 0.0379 | 850 | 0.7625 | - | - |
559
+ | 0.0383 | 860 | 0.8259 | - | - |
560
+ | 0.0388 | 870 | 0.7558 | - | - |
561
+ | 0.0392 | 880 | 0.7898 | - | - |
562
+ | 0.0397 | 890 | 0.7694 | - | - |
563
+ | 0.0401 | 900 | 0.7429 | - | - |
564
+ | 0.0405 | 910 | 0.6666 | - | - |
565
+ | 0.0410 | 920 | 0.7407 | - | - |
566
+ | 0.0414 | 930 | 0.6665 | - | - |
567
+ | 0.0419 | 940 | 0.7597 | - | - |
568
+ | 0.0423 | 950 | 0.7035 | - | - |
569
+ | 0.0428 | 960 | 0.7166 | - | - |
570
+ | 0.0432 | 970 | 0.6889 | - | - |
571
+ | 0.0437 | 980 | 0.7541 | - | - |
572
+ | 0.0441 | 990 | 0.7175 | - | - |
573
+ | 0.0446 | 1000 | 0.7389 | 0.6420 | 0.6403 |
574
+ | 0.0450 | 1010 | 0.7142 | - | - |
575
+ | 0.0454 | 1020 | 0.7301 | - | - |
576
+ | 0.0459 | 1030 | 0.7299 | - | - |
577
+ | 0.0463 | 1040 | 0.6759 | - | - |
578
+ | 0.0468 | 1050 | 0.7036 | - | - |
579
+ | 0.0472 | 1060 | 0.6286 | - | - |
580
+ | 0.0477 | 1070 | 0.595 | - | - |
581
+ | 0.0481 | 1080 | 0.6099 | - | - |
582
+ | 0.0486 | 1090 | 0.6377 | - | - |
583
+ | 0.0490 | 1100 | 0.6309 | - | - |
584
+ | 0.0495 | 1110 | 0.6306 | - | - |
585
+ | 0.0499 | 1120 | 0.557 | - | - |
586
+ | 0.0504 | 1130 | 0.5898 | - | - |
587
+ | 0.0508 | 1140 | 0.5896 | - | - |
588
+ | 0.0512 | 1150 | 0.6399 | - | - |
589
+ | 0.0517 | 1160 | 0.5923 | - | - |
590
+ | 0.0521 | 1170 | 0.5787 | - | - |
591
+ | 0.0526 | 1180 | 0.591 | - | - |
592
+ | 0.0530 | 1190 | 0.5714 | - | - |
593
+ | 0.0535 | 1200 | 0.6047 | - | - |
594
+ | 0.0539 | 1210 | 0.5904 | - | - |
595
+ | 0.0544 | 1220 | 0.543 | - | - |
596
+ | 0.0548 | 1230 | 0.6033 | - | - |
597
+ | 0.0553 | 1240 | 0.5445 | - | - |
598
+ | 0.0557 | 1250 | 0.5217 | - | - |
599
+ | 0.0561 | 1260 | 0.5835 | - | - |
600
+ | 0.0566 | 1270 | 0.5353 | - | - |
601
+ | 0.0570 | 1280 | 0.5887 | - | - |
602
+ | 0.0575 | 1290 | 0.5967 | - | - |
603
+ | 0.0579 | 1300 | 0.5036 | - | - |
604
+ | 0.0584 | 1310 | 0.5915 | - | - |
605
+ | 0.0588 | 1320 | 0.5719 | - | - |
606
+ | 0.0593 | 1330 | 0.5238 | - | - |
607
+ | 0.0597 | 1340 | 0.5647 | - | - |
608
+ | 0.0602 | 1350 | 0.538 | - | - |
609
+ | 0.0606 | 1360 | 0.5457 | - | - |
610
+ | 0.0610 | 1370 | 0.5169 | - | - |
611
+ | 0.0615 | 1380 | 0.4967 | - | - |
612
+ | 0.0619 | 1390 | 0.4864 | - | - |
613
+ | 0.0624 | 1400 | 0.5133 | - | - |
614
+ | 0.0628 | 1410 | 0.5587 | - | - |
615
+ | 0.0633 | 1420 | 0.4691 | - | - |
616
+ | 0.0637 | 1430 | 0.5186 | - | - |
617
+ | 0.0642 | 1440 | 0.4907 | - | - |
618
+ | 0.0646 | 1450 | 0.5281 | - | - |
619
+ | 0.0651 | 1460 | 0.4741 | - | - |
620
+ | 0.0655 | 1470 | 0.4452 | - | - |
621
+ | 0.0659 | 1480 | 0.4771 | - | - |
622
+ | 0.0664 | 1490 | 0.4289 | - | - |
623
+ | 0.0668 | 1500 | 0.4551 | - | - |
624
+ | 0.0673 | 1510 | 0.4558 | - | - |
625
+ | 0.0677 | 1520 | 0.5159 | - | - |
626
+ | 0.0682 | 1530 | 0.4296 | - | - |
627
+ | 0.0686 | 1540 | 0.4548 | - | - |
628
+ | 0.0691 | 1550 | 0.4439 | - | - |
629
+ | 0.0695 | 1560 | 0.4295 | - | - |
630
+ | 0.0700 | 1570 | 0.4466 | - | - |
631
+ | 0.0704 | 1580 | 0.4717 | - | - |
632
+ | 0.0708 | 1590 | 0.492 | - | - |
633
+ | 0.0713 | 1600 | 0.4566 | - | - |
634
+ | 0.0717 | 1610 | 0.4451 | - | - |
635
+ | 0.0722 | 1620 | 0.4715 | - | - |
636
+ | 0.0726 | 1630 | 0.4573 | - | - |
637
+ | 0.0731 | 1640 | 0.3972 | - | - |
638
+ | 0.0735 | 1650 | 0.5212 | - | - |
639
+ | 0.0740 | 1660 | 0.4381 | - | - |
640
+ | 0.0744 | 1670 | 0.4552 | - | - |
641
+ | 0.0749 | 1680 | 0.4767 | - | - |
642
+ | 0.0753 | 1690 | 0.4398 | - | - |
643
+ | 0.0757 | 1700 | 0.4801 | - | - |
644
+ | 0.0762 | 1710 | 0.3751 | - | - |
645
+ | 0.0766 | 1720 | 0.4407 | - | - |
646
+ | 0.0771 | 1730 | 0.4305 | - | - |
647
+ | 0.0775 | 1740 | 0.3938 | - | - |
648
+ | 0.0780 | 1750 | 0.4748 | - | - |
649
+ | 0.0784 | 1760 | 0.428 | - | - |
650
+ | 0.0789 | 1770 | 0.404 | - | - |
651
+ | 0.0793 | 1780 | 0.4261 | - | - |
652
+ | 0.0798 | 1790 | 0.359 | - | - |
653
+ | 0.0802 | 1800 | 0.4422 | - | - |
654
+ | 0.0807 | 1810 | 0.4748 | - | - |
655
+ | 0.0811 | 1820 | 0.4352 | - | - |
656
+ | 0.0815 | 1830 | 0.4032 | - | - |
657
+ | 0.0820 | 1840 | 0.4124 | - | - |
658
+ | 0.0824 | 1850 | 0.4486 | - | - |
659
+ | 0.0829 | 1860 | 0.429 | - | - |
660
+ | 0.0833 | 1870 | 0.4189 | - | - |
661
+ | 0.0838 | 1880 | 0.3658 | - | - |
662
+ | 0.0842 | 1890 | 0.4297 | - | - |
663
+ | 0.0847 | 1900 | 0.4215 | - | - |
664
+ | 0.0851 | 1910 | 0.3726 | - | - |
665
+ | 0.0856 | 1920 | 0.3736 | - | - |
666
+ | 0.0860 | 1930 | 0.4287 | - | - |
667
+ | 0.0864 | 1940 | 0.4402 | - | - |
668
+ | 0.0869 | 1950 | 0.4353 | - | - |
669
+ | 0.0873 | 1960 | 0.3622 | - | - |
670
+ | 0.0878 | 1970 | 0.3557 | - | - |
671
+ | 0.0882 | 1980 | 0.4107 | - | - |
672
+ | 0.0887 | 1990 | 0.3982 | - | - |
673
+ | 0.0891 | 2000 | 0.453 | 0.7292 | 0.7261 |
674
+ | 0.0896 | 2010 | 0.3971 | - | - |
675
+ | 0.0900 | 2020 | 0.4374 | - | - |
676
+ | 0.0905 | 2030 | 0.4322 | - | - |
677
+ | 0.0909 | 2040 | 0.3945 | - | - |
678
+ | 0.0913 | 2050 | 0.356 | - | - |
679
+ | 0.0918 | 2060 | 0.4182 | - | - |
680
+ | 0.0922 | 2070 | 0.3694 | - | - |
681
+ | 0.0927 | 2080 | 0.3989 | - | - |
682
+ | 0.0931 | 2090 | 0.4237 | - | - |
683
+ | 0.0936 | 2100 | 0.3961 | - | - |
684
+ | 0.0940 | 2110 | 0.4264 | - | - |
685
+ | 0.0945 | 2120 | 0.3609 | - | - |
686
+ | 0.0949 | 2130 | 0.4154 | - | - |
687
+ | 0.0954 | 2140 | 0.3661 | - | - |
688
+ | 0.0958 | 2150 | 0.3328 | - | - |
689
+ | 0.0962 | 2160 | 0.3456 | - | - |
690
+ | 0.0967 | 2170 | 0.3478 | - | - |
691
+ | 0.0971 | 2180 | 0.3339 | - | - |
692
+ | 0.0976 | 2190 | 0.3833 | - | - |
693
+ | 0.0980 | 2200 | 0.3238 | - | - |
694
+ | 0.0985 | 2210 | 0.3871 | - | - |
695
+ | 0.0989 | 2220 | 0.4009 | - | - |
696
+ | 0.0994 | 2230 | 0.4115 | - | - |
697
+ | 0.0998 | 2240 | 0.4024 | - | - |
698
+ | 0.1003 | 2250 | 0.35 | - | - |
699
+ | 0.1007 | 2260 | 0.3649 | - | - |
700
+ | 0.1011 | 2270 | 0.3615 | - | - |
701
+ | 0.1016 | 2280 | 0.3898 | - | - |
702
+ | 0.1020 | 2290 | 0.3866 | - | - |
703
+ | 0.1025 | 2300 | 0.3904 | - | - |
704
+ | 0.1029 | 2310 | 0.3321 | - | - |
705
+ | 0.1034 | 2320 | 0.3803 | - | - |
706
+ | 0.1038 | 2330 | 0.3831 | - | - |
707
+ | 0.1043 | 2340 | 0.403 | - | - |
708
+ | 0.1047 | 2350 | 0.3803 | - | - |
709
+ | 0.1052 | 2360 | 0.3463 | - | - |
710
+ | 0.1056 | 2370 | 0.3987 | - | - |
711
+ | 0.1060 | 2380 | 0.3731 | - | - |
712
+ | 0.1065 | 2390 | 0.353 | - | - |
713
+ | 0.1069 | 2400 | 0.3166 | - | - |
714
+ | 0.1074 | 2410 | 0.3895 | - | - |
715
+ | 0.1078 | 2420 | 0.4025 | - | - |
716
+ | 0.1083 | 2430 | 0.3798 | - | - |
717
+ | 0.1087 | 2440 | 0.2991 | - | - |
718
+ | 0.1092 | 2450 | 0.3094 | - | - |
719
+ | 0.1096 | 2460 | 0.3669 | - | - |
720
+ | 0.1101 | 2470 | 0.3412 | - | - |
721
+ | 0.1105 | 2480 | 0.3697 | - | - |
722
+ | 0.1110 | 2490 | 0.369 | - | - |
723
+ | 0.1114 | 2500 | 0.3393 | - | - |
724
+ | 0.1118 | 2510 | 0.4232 | - | - |
725
+ | 0.1123 | 2520 | 0.3445 | - | - |
726
+ | 0.1127 | 2530 | 0.4165 | - | - |
727
+ | 0.1132 | 2540 | 0.3721 | - | - |
728
+ | 0.1136 | 2550 | 0.3476 | - | - |
729
+ | 0.1141 | 2560 | 0.2847 | - | - |
730
+ | 0.1145 | 2570 | 0.3609 | - | - |
731
+ | 0.1150 | 2580 | 0.3017 | - | - |
732
+ | 0.1154 | 2590 | 0.374 | - | - |
733
+ | 0.1159 | 2600 | 0.3365 | - | - |
734
+ | 0.1163 | 2610 | 0.393 | - | - |
735
+ | 0.1167 | 2620 | 0.3623 | - | - |
736
+ | 0.1172 | 2630 | 0.3538 | - | - |
737
+ | 0.1176 | 2640 | 0.3206 | - | - |
738
+ | 0.1181 | 2650 | 0.3962 | - | - |
739
+ | 0.1185 | 2660 | 0.3087 | - | - |
740
+ | 0.1190 | 2670 | 0.3482 | - | - |
741
+ | 0.1194 | 2680 | 0.3616 | - | - |
742
+ | 0.1199 | 2690 | 0.3955 | - | - |
743
+ | 0.1203 | 2700 | 0.3915 | - | - |
744
+ | 0.1208 | 2710 | 0.3782 | - | - |
745
+ | 0.1212 | 2720 | 0.3576 | - | - |
746
+ | 0.1216 | 2730 | 0.3544 | - | - |
747
+ | 0.1221 | 2740 | 0.3572 | - | - |
748
+ | 0.1225 | 2750 | 0.3107 | - | - |
749
+ | 0.1230 | 2760 | 0.3579 | - | - |
750
+ | 0.1234 | 2770 | 0.3571 | - | - |
751
+ | 0.1239 | 2780 | 0.3694 | - | - |
752
+ | 0.1243 | 2790 | 0.3674 | - | - |
753
+ | 0.1248 | 2800 | 0.3373 | - | - |
754
+ | 0.1252 | 2810 | 0.3362 | - | - |
755
+ | 0.1257 | 2820 | 0.3225 | - | - |
756
+ | 0.1261 | 2830 | 0.3609 | - | - |
757
+ | 0.1265 | 2840 | 0.3681 | - | - |
758
+ | 0.1270 | 2850 | 0.4059 | - | - |
759
+ | 0.1274 | 2860 | 0.3047 | - | - |
760
+ | 0.1279 | 2870 | 0.3446 | - | - |
761
+ | 0.1283 | 2880 | 0.3507 | - | - |
762
+ | 0.1288 | 2890 | 0.3124 | - | - |
763
+ | 0.1292 | 2900 | 0.3712 | - | - |
764
+ | 0.1297 | 2910 | 0.3394 | - | - |
765
+ | 0.1301 | 2920 | 0.3869 | - | - |
766
+ | 0.1306 | 2930 | 0.3449 | - | - |
767
+ | 0.1310 | 2940 | 0.3752 | - | - |
768
+ | 0.1314 | 2950 | 0.3341 | - | - |
769
+ | 0.1319 | 2960 | 0.3329 | - | - |
770
+ | 0.1323 | 2970 | 0.36 | - | - |
771
+ | 0.1328 | 2980 | 0.3788 | - | - |
772
+ | 0.1332 | 2990 | 0.3834 | - | - |
773
+ | 0.1337 | 3000 | 0.3426 | 0.7603 | 0.7590 |
774
+
775
+ </details>
776
+
777
+ ### Framework Versions
778
+ - Python: 3.10.12
779
+ - Sentence Transformers: 3.3.0
780
+ - Transformers: 4.46.2
781
+ - PyTorch: 2.1.0+cu118
782
+ - Accelerate: 1.1.1
783
+ - Datasets: 3.1.0
784
+ - Tokenizers: 0.20.3
785
+
786
+ ## Citation
787
+
788
+ ### BibTeX
789
+
790
+ #### Sentence Transformers
791
+ ```bibtex
792
+ @inproceedings{reimers-2019-sentence-bert,
793
+ title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
794
+ author = "Reimers, Nils and Gurevych, Iryna",
795
+ booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
796
+ month = "11",
797
+ year = "2019",
798
+ publisher = "Association for Computational Linguistics",
799
+ url = "https://arxiv.org/abs/1908.10084",
800
+ }
801
+ ```
802
+
803
+ #### MultipleNegativesRankingLoss
804
+ ```bibtex
805
+ @misc{henderson2017efficient,
806
+ title={Efficient Natural Language Response Suggestion for Smart Reply},
807
+ author={Matthew Henderson and Rami Al-Rfou and Brian Strope and Yun-hsuan Sung and Laszlo Lukacs and Ruiqi Guo and Sanjiv Kumar and Balint Miklos and Ray Kurzweil},
808
+ year={2017},
809
+ eprint={1705.00652},
810
+ archivePrefix={arXiv},
811
+ primaryClass={cs.CL}
812
+ }
813
+ ```
814
+
815
+ <!--
816
+ ## Glossary
817
+
818
+ *Clearly define terms in order to be accessible across audiences.*
819
+ -->
820
+
821
+ <!--
822
+ ## Model Card Authors
823
+
824
+ *Lists the people who create the model card, providing recognition and accountability for the detailed work that goes into its construction.*
825
+ -->
826
+
827
+ <!--
828
+ ## Model Card Contact
829
+
830
+ *Provides a way for people who have updates to the Model Card, suggestions, or questions, to contact the Model Card authors.*
831
+ -->
added_tokens.json ADDED
@@ -0,0 +1,24 @@
1
+ {
2
+ "</tool_call>": 151658,
3
+ "<tool_call>": 151657,
4
+ "<|box_end|>": 151649,
5
+ "<|box_start|>": 151648,
6
+ "<|endoftext|>": 151643,
7
+ "<|file_sep|>": 151664,
8
+ "<|fim_middle|>": 151660,
9
+ "<|fim_pad|>": 151662,
10
+ "<|fim_prefix|>": 151659,
11
+ "<|fim_suffix|>": 151661,
12
+ "<|im_end|>": 151645,
13
+ "<|im_start|>": 151644,
14
+ "<|image_pad|>": 151655,
15
+ "<|object_ref_end|>": 151647,
16
+ "<|object_ref_start|>": 151646,
17
+ "<|quad_end|>": 151651,
18
+ "<|quad_start|>": 151650,
19
+ "<|repo_name|>": 151663,
20
+ "<|video_pad|>": 151656,
21
+ "<|vision_end|>": 151653,
22
+ "<|vision_pad|>": 151654,
23
+ "<|vision_start|>": 151652
24
+ }
config.json ADDED
@@ -0,0 +1,29 @@
1
+ {
2
+ "_name_or_path": "output/matryoshka_nli_Qwen-Qwen2.5-0.5B-Instruct-2024-11-15_16-44-10/checkpoint-3000",
3
+ "architectures": [
4
+ "Qwen2Model"
5
+ ],
6
+ "attention_dropout": 0.0,
7
+ "bos_token_id": 151643,
8
+ "eos_token_id": 151645,
9
+ "hidden_act": "silu",
10
+ "hidden_size": 896,
11
+ "initializer_range": 0.02,
12
+ "intermediate_size": 4864,
13
+ "max_position_embeddings": 32768,
14
+ "max_window_layers": 21,
15
+ "model_type": "qwen2",
16
+ "num_attention_heads": 14,
17
+ "num_hidden_layers": 24,
18
+ "num_key_value_heads": 2,
19
+ "rms_norm_eps": 1e-06,
20
+ "rope_scaling": null,
21
+ "rope_theta": 1000000.0,
22
+ "sliding_window": null,
23
+ "tie_word_embeddings": true,
24
+ "torch_dtype": "float32",
25
+ "transformers_version": "4.46.2",
26
+ "use_cache": true,
27
+ "use_sliding_window": false,
28
+ "vocab_size": 151936
29
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,10 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "3.3.0",
4
+ "transformers": "4.46.2",
5
+ "pytorch": "2.1.0+cu118"
6
+ },
7
+ "prompts": {},
8
+ "default_prompt_name": null,
9
+ "similarity_fn_name": "cosine"
10
+ }
merges.txt ADDED
The diff for this file is too large to render. See raw diff
 
model.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:8c0682f53f2c2392fa306c79c1558d1f217a4941c4ca3dfa57ef9defab03f4fd
3
+ size 1976161736
modules.json ADDED
@@ -0,0 +1,14 @@
1
+ [
2
+ {
3
+ "idx": 0,
4
+ "name": "0",
5
+ "path": "",
6
+ "type": "sentence_transformers.models.Transformer"
7
+ },
8
+ {
9
+ "idx": 1,
10
+ "name": "1",
11
+ "path": "1_Pooling",
12
+ "type": "sentence_transformers.models.Pooling"
13
+ }
14
+ ]
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
1
+ {
2
+ "max_seq_length": 1024,
3
+ "do_lower_case": false
4
+ }
special_tokens_map.json ADDED
@@ -0,0 +1,31 @@
1
+ {
2
+ "additional_special_tokens": [
3
+ "<|im_start|>",
4
+ "<|im_end|>",
5
+ "<|object_ref_start|>",
6
+ "<|object_ref_end|>",
7
+ "<|box_start|>",
8
+ "<|box_end|>",
9
+ "<|quad_start|>",
10
+ "<|quad_end|>",
11
+ "<|vision_start|>",
12
+ "<|vision_end|>",
13
+ "<|vision_pad|>",
14
+ "<|image_pad|>",
15
+ "<|video_pad|>"
16
+ ],
17
+ "eos_token": {
18
+ "content": "<|im_end|>",
19
+ "lstrip": false,
20
+ "normalized": false,
21
+ "rstrip": false,
22
+ "single_word": false
23
+ },
24
+ "pad_token": {
25
+ "content": "<|endoftext|>",
26
+ "lstrip": false,
27
+ "normalized": false,
28
+ "rstrip": false,
29
+ "single_word": false
30
+ }
31
+ }
tokenizer.json ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:08e9a56ecf7013f6cbd7c201447cfb8e0f78e53b28ba234afa66a8904c817250
3
+ size 11422163
tokenizer_config.json ADDED
@@ -0,0 +1,214 @@
1
+ {
2
+ "add_bos_token": false,
3
+ "add_prefix_space": false,
4
+ "added_tokens_decoder": {
5
+ "151643": {
6
+ "content": "<|endoftext|>",
7
+ "lstrip": false,
8
+ "normalized": false,
9
+ "rstrip": false,
10
+ "single_word": false,
11
+ "special": true
12
+ },
13
+ "151644": {
14
+ "content": "<|im_start|>",
15
+ "lstrip": false,
16
+ "normalized": false,
17
+ "rstrip": false,
18
+ "single_word": false,
19
+ "special": true
20
+ },
21
+ "151645": {
22
+ "content": "<|im_end|>",
23
+ "lstrip": false,
24
+ "normalized": false,
25
+ "rstrip": false,
26
+ "single_word": false,
27
+ "special": true
28
+ },
29
+ "151646": {
30
+ "content": "<|object_ref_start|>",
31
+ "lstrip": false,
32
+ "normalized": false,
33
+ "rstrip": false,
34
+ "single_word": false,
35
+ "special": true
36
+ },
37
+ "151647": {
38
+ "content": "<|object_ref_end|>",
39
+ "lstrip": false,
40
+ "normalized": false,
41
+ "rstrip": false,
42
+ "single_word": false,
43
+ "special": true
44
+ },
45
+ "151648": {
46
+ "content": "<|box_start|>",
47
+ "lstrip": false,
48
+ "normalized": false,
49
+ "rstrip": false,
50
+ "single_word": false,
51
+ "special": true
52
+ },
53
+ "151649": {
54
+ "content": "<|box_end|>",
55
+ "lstrip": false,
56
+ "normalized": false,
57
+ "rstrip": false,
58
+ "single_word": false,
59
+ "special": true
60
+ },
61
+ "151650": {
62
+ "content": "<|quad_start|>",
63
+ "lstrip": false,
64
+ "normalized": false,
65
+ "rstrip": false,
66
+ "single_word": false,
67
+ "special": true
68
+ },
69
+ "151651": {
70
+ "content": "<|quad_end|>",
71
+ "lstrip": false,
72
+ "normalized": false,
73
+ "rstrip": false,
74
+ "single_word": false,
75
+ "special": true
76
+ },
77
+ "151652": {
78
+ "content": "<|vision_start|>",
79
+ "lstrip": false,
80
+ "normalized": false,
81
+ "rstrip": false,
82
+ "single_word": false,
83
+ "special": true
84
+ },
85
+ "151653": {
86
+ "content": "<|vision_end|>",
87
+ "lstrip": false,
88
+ "normalized": false,
89
+ "rstrip": false,
90
+ "single_word": false,
91
+ "special": true
92
+ },
93
+ "151654": {
94
+ "content": "<|vision_pad|>",
95
+ "lstrip": false,
96
+ "normalized": false,
97
+ "rstrip": false,
98
+ "single_word": false,
99
+ "special": true
100
+ },
101
+ "151655": {
102
+ "content": "<|image_pad|>",
103
+ "lstrip": false,
104
+ "normalized": false,
105
+ "rstrip": false,
106
+ "single_word": false,
107
+ "special": true
108
+ },
109
+ "151656": {
110
+ "content": "<|video_pad|>",
111
+ "lstrip": false,
112
+ "normalized": false,
113
+ "rstrip": false,
114
+ "single_word": false,
115
+ "special": true
116
+ },
117
+ "151657": {
118
+ "content": "<tool_call>",
119
+ "lstrip": false,
120
+ "normalized": false,
121
+ "rstrip": false,
122
+ "single_word": false,
123
+ "special": false
124
+ },
125
+ "151658": {
126
+ "content": "</tool_call>",
127
+ "lstrip": false,
128
+ "normalized": false,
129
+ "rstrip": false,
130
+ "single_word": false,
131
+ "special": false
132
+ },
133
+ "151659": {
134
+ "content": "<|fim_prefix|>",
135
+ "lstrip": false,
136
+ "normalized": false,
137
+ "rstrip": false,
138
+ "single_word": false,
139
+ "special": false
140
+ },
141
+ "151660": {
142
+ "content": "<|fim_middle|>",
143
+ "lstrip": false,
144
+ "normalized": false,
145
+ "rstrip": false,
146
+ "single_word": false,
147
+ "special": false
148
+ },
149
+ "151661": {
150
+ "content": "<|fim_suffix|>",
151
+ "lstrip": false,
152
+ "normalized": false,
153
+ "rstrip": false,
154
+ "single_word": false,
155
+ "special": false
156
+ },
157
+ "151662": {
158
+ "content": "<|fim_pad|>",
159
+ "lstrip": false,
160
+ "normalized": false,
161
+ "rstrip": false,
162
+ "single_word": false,
163
+ "special": false
164
+ },
165
+ "151663": {
166
+ "content": "<|repo_name|>",
167
+ "lstrip": false,
168
+ "normalized": false,
169
+ "rstrip": false,
170
+ "single_word": false,
171
+ "special": false
172
+ },
173
+ "151664": {
174
+ "content": "<|file_sep|>",
175
+ "lstrip": false,
176
+ "normalized": false,
177
+ "rstrip": false,
178
+ "single_word": false,
179
+ "special": false
180
+ }
181
+ },
182
+ "additional_special_tokens": [
183
+ "<|im_start|>",
184
+ "<|im_end|>",
185
+ "<|object_ref_start|>",
186
+ "<|object_ref_end|>",
187
+ "<|box_start|>",
188
+ "<|box_end|>",
189
+ "<|quad_start|>",
190
+ "<|quad_end|>",
191
+ "<|vision_start|>",
192
+ "<|vision_end|>",
193
+ "<|vision_pad|>",
194
+ "<|image_pad|>",
195
+ "<|video_pad|>"
196
+ ],
197
+ "bos_token": null,
198
+ "chat_template": "{%- if tools %}\n {{- '<|im_start|>system\\n' }}\n {%- if messages[0]['role'] == 'system' %}\n {{- messages[0]['content'] }}\n {%- else %}\n {{- 'You are Qwen, created by Alibaba Cloud. You are a helpful assistant.' }}\n {%- endif %}\n {{- \"\\n\\n# Tools\\n\\nYou may call one or more functions to assist with the user query.\\n\\nYou are provided with function signatures within <tools></tools> XML tags:\\n<tools>\" }}\n {%- for tool in tools %}\n {{- \"\\n\" }}\n {{- tool | tojson }}\n {%- endfor %}\n {{- \"\\n</tools>\\n\\nFor each function call, return a json object with function name and arguments within <tool_call></tool_call> XML tags:\\n<tool_call>\\n{\\\"name\\\": <function-name>, \\\"arguments\\\": <args-json-object>}\\n</tool_call><|im_end|>\\n\" }}\n{%- else %}\n {%- if messages[0]['role'] == 'system' %}\n {{- '<|im_start|>system\\n' + messages[0]['content'] + '<|im_end|>\\n' }}\n {%- else %}\n {{- '<|im_start|>system\\nYou are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>\\n' }}\n {%- endif %}\n{%- endif %}\n{%- for message in messages %}\n {%- if (message.role == \"user\") or (message.role == \"system\" and not loop.first) or (message.role == \"assistant\" and not message.tool_calls) %}\n {{- '<|im_start|>' + message.role + '\\n' + message.content + '<|im_end|>' + '\\n' }}\n {%- elif message.role == \"assistant\" %}\n {{- '<|im_start|>' + message.role }}\n {%- if message.content %}\n {{- '\\n' + message.content }}\n {%- endif %}\n {%- for tool_call in message.tool_calls %}\n {%- if tool_call.function is defined %}\n {%- set tool_call = tool_call.function %}\n {%- endif %}\n {{- '\\n<tool_call>\\n{\"name\": \"' }}\n {{- tool_call.name }}\n {{- '\", \"arguments\": ' }}\n {{- tool_call.arguments | tojson }}\n {{- '}\\n</tool_call>' }}\n {%- endfor %}\n {{- '<|im_end|>\\n' }}\n {%- elif message.role == \"tool\" %}\n {%- if (loop.index0 == 0) or (messages[loop.index0 - 1].role != \"tool\") %}\n {{- '<|im_start|>user' }}\n {%- endif %}\n {{- '\\n<tool_response>\\n' }}\n {{- message.content }}\n {{- '\\n</tool_response>' }}\n {%- if loop.last or (messages[loop.index0 + 1].role != \"tool\") %}\n {{- '<|im_end|>\\n' }}\n {%- endif %}\n {%- endif %}\n{%- endfor %}\n{%- if add_generation_prompt %}\n {{- '<|im_start|>assistant\\n' }}\n{%- endif %}\n",
199
+ "clean_up_tokenization_spaces": false,
200
+ "eos_token": "<|im_end|>",
201
+ "errors": "replace",
202
+ "max_length": 1024,
203
+ "model_max_length": 1024,
204
+ "pad_to_multiple_of": null,
205
+ "pad_token": "<|endoftext|>",
206
+ "pad_token_type_id": 0,
207
+ "padding_side": "right",
208
+ "split_special_tokens": false,
209
+ "stride": 0,
210
+ "tokenizer_class": "Qwen2Tokenizer",
211
+ "truncation_side": "right",
212
+ "truncation_strategy": "longest_first",
213
+ "unk_token": null
214
+ }
vocab.json ADDED
The diff for this file is too large to render. See raw diff