Update README.md
README.md
CHANGED
@@ -5,14 +5,24 @@ tags:
|
|
5 |
- feature-extraction
|
6 |
- sentence-similarity
|
7 |
- transformers
|
8 |
-
|
|
|
9 |
---
|
10 |
|
11 |
-
#
|
12 |
|
13 |
This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.
|
14 |
|
15 |
-
|
16 |
|
17 |
## Usage (Sentence-Transformers)
|
18 |
|
@@ -82,9 +92,9 @@ The model was trained with the parameters:
|
|
82 |
|
83 |
**DataLoader**:
|
84 |
|
85 |
-
`torch.utils.data.dataloader.DataLoader` of length
|
86 |
```
|
87 |
-
{'batch_size':
|
88 |
```
|
89 |
|
90 |
**Loss**:
|
@@ -120,4 +130,5 @@ SentenceTransformer(
|
|
120 |
|
121 |
## Citing & Authors
|
122 |
|
123 |
-
<!--- Describe where people can find more information -->
|
|
|
|
5 |
- feature-extraction
|
6 |
- sentence-similarity
|
7 |
- transformers
|
8 |
+
language:
|
9 |
+
- is
|
10 |
---
|
11 |
|
12 |
+
# Icelandic SBERT for Sentence Embedding
|
13 |
|
14 |
This is a [sentence-transformers](https://www.SBERT.net) model: it maps sentences and paragraphs to a 768-dimensional dense vector space and can be used for tasks like clustering or semantic search.
|
15 |
|
16 |
+
## Data
|
17 |
+
|
18 |
+
The model was trained on 600 000 sentences selected at random from a CLARIN-IS corpus: [unannotated News2 from IGC (RMH)](https://repository.clarin.is/repository/xmlui/handle/20.500.12537/238)
|
19 |
+
|
20 |
+
|
21 |
+
To download the data, run the following command:
|
22 |
+
|
23 |
+
```bash
|
24 |
+
curl --remote-name-all https://repository.clarin.is/repository/xmlui/bitstream/handle/20.500.12537/238{/IGC-News2-22.10.TEI.zip}
|
25 |
+
```
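The braces in the command above are curl URL globbing: the single-element set `{/IGC-News2-22.10.TEI.zip}` expands to one URL ending in that file name, and `--remote-name-all` saves it under the remote name. Where curl is not available, the same download could be sketched in Python roughly as follows. The helper names `archive_url` and `download_archive` are hypothetical and not part of the README; the URL is only the expanded form of the one in the command.

```python
import urllib.request
from pathlib import Path

# Handle URL and file name copied from the curl command above.
HANDLE_URL = "https://repository.clarin.is/repository/xmlui/bitstream/handle/20.500.12537/238"
FILE_NAME = "IGC-News2-22.10.TEI.zip"


def archive_url(handle_url: str = HANDLE_URL, file_name: str = FILE_NAME) -> str:
    """Expand the single-element curl glob `{/file}` into a plain URL."""
    return f"{handle_url}/{file_name}"


def download_archive(target_dir: str = ".") -> Path:
    """Download the archive into target_dir, keeping the remote file name.

    Hypothetical helper: performs a network request when called.
    """
    target = Path(target_dir) / FILE_NAME
    urllib.request.urlretrieve(archive_url(), target)
    return target
```

Calling `download_archive()` fetches the zip into the current directory; nothing is downloaded on import.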
|
26 |
|
27 |
## Usage (Sentence-Transformers)
|
28 |
|
|
|
92 |
|
93 |
**DataLoader**:
|
94 |
|
95 |
+
`torch.utils.data.dataloader.DataLoader` of length 150000 with parameters:
|
96 |
```
|
97 |
+
{'batch_size': 2, 'sampler': 'torch.utils.data.sampler.RandomSampler', 'batch_sampler': 'torch.utils.data.sampler.BatchSampler'}
|
98 |
```
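A PyTorch `DataLoader` reports its length in batches, not in samples, which matters when reading the figures above. The arithmetic can be sketched in plain Python, assuming the default `drop_last=False` (the parameters shown do not include that setting). `num_batches` is a hypothetical helper, not a PyTorch function.

```python
def num_batches(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches len() would report for a map-style DataLoader."""
    if drop_last:
        # Incomplete final batch is discarded.
        return num_samples // batch_size
    # Incomplete final batch is kept, so round up.
    return (num_samples + batch_size - 1) // batch_size


# Both sample counts yield a length of 150000 at batch_size=2.
print(num_batches(300_000, 2))
print(num_batches(299_999, 2))
```

Under that assumption, a length of 150000 at batch size 2 corresponds to 299,999 or 300,000 training examples. How those examples relate to the 600 000 sentences mentioned in the Data section is not stated in the README.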
|
99 |
|
100 |
**Loss**:
|
|
|
130 |
|
131 |
## Citing & Authors
|
132 |
|
133 |
+
<!--- Describe where people can find more information -->
|
134 |
+
Sigurdur Haukur Birgisson
|