Upload folder using huggingface_hub
- README.md +151 -90
- config.json +1 -1
- model.safetensors +1 -1
README.md
CHANGED
@@ -4,72 +4,138 @@ tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
-- dataset_size:
 - loss:MultipleNegativesRankingLoss
-base_model:
 widget:
-- source_sentence:
-
-
-we attack. 00| After exhibiting negotiations between the Morenoite government
-and organized crime, drug trafficking expert Anabel Hernández denounces being
-harassed and receiving death threats. We demand to activate the protection protocol
-for journalists
 sentences:
--
--
-
-
-
-
-
-
-
 sentences:
--
-
--
-
-
-
 sentences:
--
-
--
-
-
-
-
-
 sentences:
--
--
-
-
-
-
-
-
-
 sentences:
--
-
--
-- “If you want to know who controls you, just look at whom you cannot criticize,”
-Voltaire said.
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 ---
 
-# SentenceTransformer based on
 
-This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [
 
 ## Model Details
 
 ### Model Description
 - **Model Type:** Sentence Transformer
-- **Base model:** [
 - **Maximum Sequence Length:** 512 tokens
 - **Output Dimensionality:** 1024 dimensions
 - **Similarity Function:** Cosine Similarity
@@ -111,9 +177,9 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer("sentence_transformers_model_id")
 # Run inference
 sentences = [
-
-'
-'
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)
@@ -167,19 +233,19 @@ You can finetune this model on your own dataset.
 
 #### Unnamed Dataset
 
-* Size:
-* Columns: <code>sentence_0</code
 * Approximate statistics based on the first 1000 samples:
-| | sentence_0
-
-| type | string
-| details | <ul><li>min: 2 tokens</li><li>mean:
 * Samples:
-| sentence_0
-
-| <code>
-| <code>
-| <code>
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
 ```json
 {
@@ -194,7 +260,6 @@ You can finetune this model on your own dataset.
 - `per_device_train_batch_size`: 2
 - `per_device_eval_batch_size`: 2
 - `num_train_epochs`: 1
-- `fp16`: True
 - `multi_dataset_batch_sampler`: round_robin
 
 #### All Hyperparameters
@@ -239,7 +304,7 @@ You can finetune this model on your own dataset.
 - `jit_mode_eval`: False
 - `use_ipex`: False
 - `bf16`: False
-- `fp16`:
 - `fp16_opt_level`: O1
 - `half_precision_backend`: auto
 - `bf16_full_eval`: False
@@ -321,31 +386,27 @@ You can finetune this model on your own dataset.
 ### Training Logs
 | Epoch | Step | Training Loss |
 |:------:|:-----:|:-------------:|
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.
-| 0.8546 | 11000 | 0.0084 |
-| 0.8934 | 11500 | 0.0208 |
-| 0.9323 | 12000 | 0.0052 |
-| 0.9711 | 12500 | 0.0081 |
 
 
 ### Framework Versions

@@ -4,72 +4,138 @@ tags:
 - sentence-similarity
 - feature-extraction
 - generated_from_trainer
+- dataset_size:21988
 - loss:MultipleNegativesRankingLoss
+base_model: Lajavaness/bilingual-embedding-large
 widget:
+- source_sentence: BEAS INTESTINES 2901 718935 wwwIsrael under heavy attack from Gaza
+There were more than 600 rockets launched against Israel. There are some civilians
+wounded and dead
 sentences:
+- Photo shows cloud of smoke after attack in Israel
+- Claudia López with a book thanking the FARC
+- Wife of Chinese official shot in US
+- source_sentence: 'People''s Network people.cn People''s Daily: Scientifically grasp
+the law of population development Balanced Population Development in the New Era
+- January 2022 From the 1st, the one-child policy will be completely abolished.
+Newlyweds must have at least two children Wang Peian April 1, 2021 06:18 Source:
+People''s Daily Online, People''s Daily Executive summary: ■After the founding
+of New China, the implementation of family planning was based on the basic national
+conditions of my country''s large population and relatively insufficient resources
+A major strategic decision, which makes the population''s pressure on resources
+and the environment get a preliminary understanding: it creates a longer demographic
+dividend period, It has effectively promoted economic development, social progress
+and the improvement of people''s living standards, and the country''s capacity
+for sustainable development has been greatly enhanced. ■Since the beginning of
+the new century, my country''s population situation has undergone major changes.
+Strive to achieve the level of active fertility, vigorously improve the quality
+and skills of workers, and implement the comprehensive two-child policy, which
+is the key to population development. Three issues that must be addressed in the
+field. ■ Attention should be paid to the research on population development strategies,
+comprehensively and profoundly understand and grasp the laws of population, and
+promote the coordination between population and economy and society. development,
+and promote the long-term balanced development of the population. choice of history
+my country has been a country with the largest population in the world since ancient
+times. In traditional society, if there is an entrance, there will be a license
+and tax, and the country will be strengthened. If there is a population, there
+will be soldiers. The rulers of successive dynasties have vigorously encouraged
+population reproduction. Once the society is stable and production develops, the
+total population will decrease. The threshold will increase greatly; when the
+dynasty is changed, the army will be in chaos, famine and flag epidemics will
+be intertwined, and the population will be sharp or small. Look, before the 17th
+century, my country''s population grew slowly in a cyclical ups and downs. The
+introduction of high-yielding food crops such as corn, sweet potato and potato
+in the late Ming Dynasty, especially the century-long Kanggan in the early Qing
+Dynasty. The prosperous age made my country''s population grow rapidly, breaking
+through the 200 million, 300 million mark successively, and the 400 million mark
+in the Daoguang years, which led to Legal Migrant Workers People''s Network people.cn
+People''s Daily: Scientifically grasp the law of population development Balanced
+Population Development in the New Era - January 2022 From the 1st, the one-child
+policy will be completely abolished. Newlyweds must have at least two children
+Wang Peian April 1, 2021 06:18 Source: People''s Daily Online, People''s Daily
+Executive summary: ■After the founding of New China, the implementation of family
+planning was based on the basic national conditions of my country''s large population
+and relatively insufficient resources A major strategic decision, which makes
+the population''s pressure on resources and the environment get a preliminary
+understanding: it creates a longer demographic dividend period, It has effectively
+promoted economic development, social progress and the improvement of people''s
+living standards, and the country''s capacity for sustainable development has
+been greatly enhanced. ■Since the beginning of the new century, my country''s
+population situation has undergone major changes. Strive to achieve the level
+of active fertility, vigorously improve the quality and skills of workers, and
+implement the comprehensive two-child policy, which is the key to population development.
+Three issues that must be addressed in the field. ■ Attention should be paid to
+the research on population development strategies, comprehensively and profoundly
+understand and grasp the laws of population, and promote the coordination between
+population and economy and society. development, and promote the long-term balanced
+development of the population. choice of history my country has been a country
+with the largest population in the world since ancient times. In traditional society,
+if there is an entrance, there will be a license and tax, and the country will
+be strengthened. If there is a population, there will be soldiers. The rulers
+of successive dynasties have vigorously encouraged population reproduction. Once
+the society is stable and production develops, the total population will decrease.
+The threshold will increase greatly; when the dynasty is changed, the army will
+be in chaos, famine and flag epidemics will be intertwined, and the population
+will be sharp or small. Look, before the 17th century, my country''s population
+grew slowly in a cyclical ups and downs. The introduction of high-yielding food
+crops such as corn, sweet potato and potato in the late Ming Dynasty, especially
+the century-long Kanggan in the early Qing Dynasty. The prosperous age made my
+country''s population grow rapidly, breaking through the 200 million, 300 million
+mark successively, and the 400 million mark in the Daoguang years, which led to Legal
+Migrant WorkersA warning to those prosperous forces who often talk about human
+rights: China has human rights, and we have approved that Chinese people must
+get married, and they must have two children after they get married!'
 sentences:
+- Hamad bin Jassim told the BBC In a new interview, we paid the defected Syrian
+officer $30,000 and the regular soldier $15,000.
+- State-run newspaper announces Chinese couples ‘must have two children’ starting
+January 2022
+- This is the draw for judges for the case of former Ecuadorian President Rafael
+Correa
+- source_sentence: Part 1 Resignation sir jokowi JOKOWI REGISTERED COMPASS DKI DPRD
+HOLDS Plenary MEETING CARIS JAKARTA KOMPASTV Tik TokIs it true that the President
+of Indonesia, Joko Widodo, has resigned from his position?
 sentences:
+- BBC reports on release of 'Unabomber' Ted Kaczynski
+- Thai children flash three fingered salute to Thai PM Prayut
+- President Joko Widodo, alias Jokowi, resigns from his post
+- source_sentence: The organization 'Vegan Society' calls for a ban on animal-shaped
+children's cookies. They consider that these cookies "incite children to see animals
+as something inferior and at our disposal." This is the , which is dangerous even
+for anti-bullfighting. It's not that they don't want bullfighting. It is that
+they want to impose even the shape of the cookies that your children eat. And
+it's not the first time. Barnum cookies have already "freed" the animals in their
+boxes to have a better brand image. They may seem like funny news. But they are
+not. They hide a prohibitionist ideology full of censorship. 𝗘𝗹 𝗮𝗻𝗶𝗺𝗮𝗹𝗶𝘀𝗺𝗼 𝗲𝘀
+𝗽𝗲𝗹𝗶𝗴𝗿𝗼 𝗽𝗮𝗿𝗮 𝗻𝘂𝗲𝘀𝘁𝗿𝗮 𝘀𝗼𝗰𝗶𝗲𝗱𝗮𝗱
 sentences:
+- Vegan NGO Vegan Society wants to ban the sale of animal-shaped cookies in France
+- Cans of food containing pork with a "halal" stamp
+- Pfizer announces Covid-19 vaccine update with Microsoft chip for symptom reduction
+- source_sentence: a . . . . . (177. FO Accident st THE LEADER IN ACCIDENT REPORTING
+Reckless driving by a minor Kuliapitiya Kanadulla after a defender collided with
+a motorcycle An accident occurred in front of Maha Vidyalaya today (01) afternoon
+A young man on a motorcycle and about 4 years old A young child (father and son)
+unfortunately Lost his life. Behaved provocatively with the accident Villagers
+set fire to the defender car that caused the accident had May that innocent father
+and little son rest in peace! 94 site
 sentences:
+- The image of a Syrian child who sleeps next to the graves of his parents
+- Accident kills four-year-old in northwestern Sri Lanka
+- Masks are ineffective because some packaging says they don't protect
 pipeline_tag: sentence-similarity
 library_name: sentence-transformers
 ---
 
+# SentenceTransformer based on Lajavaness/bilingual-embedding-large
 
+This is a [sentence-transformers](https://www.SBERT.net) model finetuned from [Lajavaness/bilingual-embedding-large](https://huggingface.co/Lajavaness/bilingual-embedding-large). It maps sentences & paragraphs to a 1024-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
 
 ## Model Details
 
 ### Model Description
 - **Model Type:** Sentence Transformer
+- **Base model:** [Lajavaness/bilingual-embedding-large](https://huggingface.co/Lajavaness/bilingual-embedding-large) <!-- at revision e83179d7a66e8aed1b3015e98bb5ae234ed89598 -->
 - **Maximum Sequence Length:** 512 tokens
 - **Output Dimensionality:** 1024 dimensions
 - **Similarity Function:** Cosine Similarity

@@ -111,9 +177,9 @@ from sentence_transformers import SentenceTransformer
 model = SentenceTransformer("sentence_transformers_model_id")
 # Run inference
 sentences = [
+'a . . . . . (177. FO Accident st THE LEADER IN ACCIDENT REPORTING Reckless driving by a minor Kuliapitiya Kanadulla after a defender collided with a motorcycle An accident occurred in front of Maha Vidyalaya today (01) afternoon A young man on a motorcycle and about 4 years old A young child (father and son) unfortunately Lost his life. Behaved provocatively with the accident Villagers set fire to the defender car that caused the accident had May that innocent father and little son rest in peace! 94 site',
+'Accident kills four-year-old in northwestern Sri Lanka',
+'The image of a Syrian child who sleeps next to the graves of his parents',
 ]
 embeddings = model.encode(sentences)
 print(embeddings.shape)

@@ -167,19 +233,19 @@ You can finetune this model on your own dataset.
 
 #### Unnamed Dataset
 
+* Size: 21,988 training samples
+* Columns: <code>sentence_0</code> and <code>sentence_1</code>
 * Approximate statistics based on the first 1000 samples:
+| | sentence_0 | sentence_1 |
+|:--------|:-----------------------------------------------------------------------------------|:-----------------------------------------------------------------------------------|
+| type | string | string |
+| details | <ul><li>min: 2 tokens</li><li>mean: 119.9 tokens</li><li>max: 512 tokens</li></ul> | <ul><li>min: 7 tokens</li><li>mean: 19.25 tokens</li><li>max: 128 tokens</li></ul> |
 * Samples:
+| sentence_0 | sentence_1 |
+|:---|:---|
+| <code>ANK DBS DBS IT department at ChangiThis is actually happening as confirmed by my brother who does contract work with DBS at Changi Business Park. Wonder if PAP knows this or turning a blind eye and pretending not to know.</code> | <code>Photo shows foreign staff of the IT department at DBS Bank in Singapore</code> |
+| <code>29th 30th 31st 32nd 33rd 34th 35th 36th 37th 38th 39th 40th 41st 42nd 43rd 44th 45th 46th 47th 48th 49th 50th 51st 52nd 53rd 54th 55th Urban Planning Foreign Languages Animal Science Law Economics Political Science Education Advertising Journalism Finance Hospitality Criminology Accounting Anthropology Psychology History Geography Information Technology Sociology Sports Science Social Sciences Real Estate Liberal Arts Communications and Mass Media Business Marketing Public Relations 1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th 11th 12th 13th 14th 15th 16th 17th 18th 19th 20th 21st 22nd 23rd 24th 25th 26th 27th 28th Architecture Chemical Engineering Chemistry Electrical Engineering Physics Mechanical Engineering Civil Engineering Biochemistry Medicine Pharmacy Engineering Nursing Math Biology Philosophy Mathematics Statistics Music Microbiology Psychology Accounting Finance Environmental Science Creative Writing Hospitality International Relations Art History Ecology55 most difficult course...</code> | <code>Harvard list of its 50 most difficult courses</code> |
+| <code>The 30,000 sheep donated by Mongolia to China entered through the Erenhot port, which is very spectacular. [Qiang] Yesterday there were people who were worried about how to transport so many sheep. It turned out that they came by themselves, and they didn't even need transport tools.</code> | <code>These videos show 30,000 sheep donated to China by Mongolia during the novel coronavirus epidemic</code> |
 * Loss: [<code>MultipleNegativesRankingLoss</code>](https://sbert.net/docs/package_reference/sentence_transformer/losses.html#multiplenegativesrankingloss) with these parameters:
 ```json
 {

@@ -194,7 +260,6 @@ You can finetune this model on your own dataset.
 - `per_device_train_batch_size`: 2
 - `per_device_eval_batch_size`: 2
 - `num_train_epochs`: 1
 - `multi_dataset_batch_sampler`: round_robin
 
 #### All Hyperparameters

@@ -239,7 +304,7 @@ You can finetune this model on your own dataset.
 - `jit_mode_eval`: False
 - `use_ipex`: False
 - `bf16`: False
+- `fp16`: False
 - `fp16_opt_level`: O1
 - `half_precision_backend`: auto
 - `bf16_full_eval`: False

@@ -321,31 +386,27 @@ You can finetune this model on your own dataset.
 ### Training Logs
 | Epoch | Step | Training Loss |
 |:------:|:-----:|:-------------:|
+| 0.0455 | 500 | 0.0505 |
+| 0.0910 | 1000 | 0.0637 |
+| 0.1364 | 1500 | 0.039 |
+| 0.1819 | 2000 | 0.0269 |
+| 0.2274 | 2500 | 0.0527 |
+| 0.2729 | 3000 | 0.0576 |
+| 0.3184 | 3500 | 0.0278 |
+| 0.3638 | 4000 | 0.0471 |
+| 0.4093 | 4500 | 0.0486 |
+| 0.4548 | 5000 | 0.025 |
+| 0.5003 | 5500 | 0.0324 |
+| 0.5458 | 6000 | 0.0169 |
+| 0.5912 | 6500 | 0.0218 |
+| 0.6367 | 7000 | 0.0476 |
+| 0.6822 | 7500 | 0.0124 |
+| 0.7277 | 8000 | 0.0247 |
+| 0.7731 | 8500 | 0.0231 |
+| 0.8186 | 9000 | 0.01 |
+| 0.8641 | 9500 | 0.0145 |
+| 0.9096 | 10000 | 0.0267 |
+| 0.9551 | 10500 | 0.0111 |
 
 
 ### Framework Versions
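The updated card says the model maps text to 1024-dimensional vectors that are compared with cosine similarity. Its usage snippet stops at `print(embeddings.shape)`. The NumPy sketch below shows the next step: ranking candidate claims against a post by cosine similarity. It assumes the embeddings already exist, for example from `model.encode`. Random vectors stand in for them so the example runs without downloading the roughly 2.2 GB weights. `rank_by_cosine` is a hypothetical helper, not part of sentence-transformers.

```python
import numpy as np


def rank_by_cosine(query_emb: np.ndarray, candidate_embs: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return candidate indices sorted by descending cosine similarity, plus their scores."""
    # L2-normalise both sides so a plain dot product equals cosine similarity
    q = query_emb / np.linalg.norm(query_emb)
    c = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    scores = c @ q  # shape: (num_candidates,)
    order = np.argsort(-scores)  # highest similarity first
    return order, scores[order]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 1024  # output dimensionality listed in the model card
    # Stand-ins for model.encode(...) output: one post and three candidate claims
    post = rng.normal(size=dim)
    claims = np.stack([
        rng.normal(size=dim),               # unrelated claim
        post + 0.1 * rng.normal(size=dim),  # near-paraphrase of the post
        rng.normal(size=dim),               # unrelated claim
    ])
    order, scores = rank_by_cosine(post, claims)
    print(order, np.round(scores, 3))
```

Recent sentence-transformers releases (3.x) also provide `model.similarity(embeddings1, embeddings2)`, which applies the model's configured similarity function. The manual version above just makes the normalise-then-dot-product step explicit.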
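The dataset section pairs each `sentence_0` with a `sentence_1` and trains with `MultipleNegativesRankingLoss`. The loss's parameter block is cut off in this diff, so the sketch below assumes the sentence-transformers defaults: a scale of 20.0 and cosine similarity. It shows the in-batch-negatives idea in NumPy. For a given `sentence_0`, every other `sentence_1` in the batch acts as a negative. The loss is cross-entropy over the scaled similarity matrix, with the diagonal as the target. `mnrl_loss` is a hypothetical name, not the library's implementation.

```python
import numpy as np


def mnrl_loss(anchors: np.ndarray, positives: np.ndarray, scale: float = 20.0) -> float:
    """In-batch-negatives ranking loss: anchor i should score highest against positive i.

    scale=20.0 mirrors the sentence-transformers default; this is an assumption,
    since the card's parameter block is truncated in this diff.
    """
    # Cosine similarity via L2-normalised dot products
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = scale * (a @ p.T)  # (batch, batch); off-diagonal entries are the negatives
    # Numerically stable log-softmax over each row
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the correct "class" for row i is column i
    return float(-np.mean(np.diag(log_probs)))
```

The card sets `per_device_train_batch_size` to 2, so each row of that matrix has only two entries. Each anchor is therefore contrasted with a single in-batch negative. Larger batches supply more negatives per step.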
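The Epoch column in the updated Training Logs follows from the card's other numbers. Assume training ran on a single device without gradient accumulation; neither is stated in this excerpt. Then 21,988 samples at a batch size of 2 give ceil(21988 / 2) = 10,994 steps per epoch, so epoch = step / 10,994. The sketch below reproduces a few logged rows. Both helper names are hypothetical.

```python
import math


def steps_per_epoch(num_samples: int, batch_size: int) -> int:
    # A final, partially filled batch still counts as one optimizer step
    return math.ceil(num_samples / batch_size)


def epoch_at(step: int, num_samples: int = 21_988, batch_size: int = 2) -> float:
    # Fractional epoch as shown in the Training Logs, rounded to 4 decimals
    return round(step / steps_per_epoch(num_samples, batch_size), 4)


if __name__ == "__main__":
    # (step, epoch) pairs copied from the updated Training Logs table
    logged = [(500, 0.0455), (1000, 0.0910), (5000, 0.4548), (10500, 0.9551)]
    for step, epoch in logged:
        print(step, epoch_at(step), epoch)
```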

config.json
CHANGED
@@ -1,5 +1,5 @@
 {
-"_name_or_path": "
 "architectures": [
 "BilingualModel"
 ],

@@ -1,5 +1,5 @@
 {
+"_name_or_path": "Lajavaness/bilingual-embedding-large",
 "architectures": [
 "BilingualModel"
 ],

model.safetensors
CHANGED
@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
-oid sha256:
 size 2239607176

@@ -1,3 +1,3 @@
 version https://git-lfs.github.com/spec/v1
+oid sha256:544b5dea808c43e84c38838160a1d0090df6b1e8d839cebfb873915bcd19a15e
 size 2239607176
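`model.safetensors` is tracked as a Git LFS pointer. This diff changes only the `oid sha256:` line; `size` stays at 2239607176 bytes. The hash is therefore the only thing that identifies the new weights. A downloaded copy can be checked against the pointer as sketched below. `parse_lfs_pointer` and `matches_pointer` are hypothetical helpers. The sketch assumes the `key value` line layout shown in the diff.

```python
import hashlib
import os


def parse_lfs_pointer(text: str) -> dict[str, str]:
    """Parse the 'key value' lines of a Git LFS pointer file into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    return fields


def matches_pointer(path: str, pointer_text: str, chunk_size: int = 1 << 20) -> bool:
    """True if the file at `path` has the size and SHA-256 recorded in the pointer."""
    fields = parse_lfs_pointer(pointer_text)
    algo, _, expected_hex = fields["oid"].partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algo}")
    # Cheap size check first, so a truncated download is rejected without hashing
    if os.path.getsize(path) != int(fields["size"]):
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Stream in chunks so a multi-gigabyte weight file is never held in memory
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex
```

For example, `matches_pointer("model.safetensors", pointer_text)` should return True for an intact download of this revision. Here `pointer_text` is the three lines of the updated pointer.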