PhilipMay committed
Commit a3db480 · 1 Parent(s): c81d9d0

add train parameters

Files changed (1): README.md (+21 -2)
README.md CHANGED
@@ -18,12 +18,31 @@ This is a [sentence-transformers](https://www.SBERT.net) model:
  It maps sentences & paragraphs (text) into a 1024 dimensional dense vector space.
  The model is intended to be used together with [SetFit](https://github.com/huggingface/setfit)
  to improve German few-shot text classification.
+ It has a sibling model called
+ [deutsche-telekom/gbert-large-paraphrase-euclidean](https://huggingface.co/deutsche-telekom/gbert-large-paraphrase-euclidean).
+
 
  This model is based on [deepset/gbert-large](https://huggingface.co/deepset/gbert-large).
  Many thanks to [deepset](https://www.deepset.ai/)!
 
- ## Training
- TODO
+ **Loss Function**\
+ We have used [MultipleNegativesRankingLoss](https://www.sbert.net/docs/package_reference/losses.html#multiplenegativesrankingloss) with cosine similarity as the loss function.
+
+ **Training Data**\
+ The model is trained on a carefully filtered dataset of
+ [deutsche-telekom/ger-backtrans-paraphrase](https://huggingface.co/datasets/deutsche-telekom/ger-backtrans-paraphrase).
+ We deleted pairs of sentences matching any of the following criteria:
+ - `min_char_len` less than 15
+ - `jaccard_similarity` greater than 0.3
+ - `de_token_count` greater than 30
+ - `en_de_token_count` greater than 30
+ - `cos_sim` less than 0.85
+
+ **Hyperparameters**
+ - learning_rate: 8.345726930229726e-06
+ - num_epochs: 7
+ - train_batch_size: 57
+ - num_gpu: ???
 
  ## Evaluation Results
  We use the [NLU Few-shot Benchmark - English and German](https://huggingface.co/datasets/deutsche-telekom/NLU-few-shot-benchmark-en-de)
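The training-data filters added in this commit can be expressed as a single keep/drop predicate. A minimal sketch, assuming each dataset row exposes the listed columns (`min_char_len`, `jaccard_similarity`, `de_token_count`, `en_de_token_count`, `cos_sim`) as numeric fields; the function name and example values are hypothetical:

```python
def keep_pair(row: dict) -> bool:
    """Return True if a paraphrase pair survives the filters in the diff above.

    A pair is deleted when ANY of the following holds:
      min_char_len < 15, jaccard_similarity > 0.3,
      de_token_count > 30, en_de_token_count > 30, cos_sim < 0.85.
    """
    return not (
        row["min_char_len"] < 15
        or row["jaccard_similarity"] > 0.3
        or row["de_token_count"] > 30
        or row["en_de_token_count"] > 30
        or row["cos_sim"] < 0.85
    )


# Hypothetical example rows (not taken from the real dataset):
good_pair = {
    "min_char_len": 40,
    "jaccard_similarity": 0.1,
    "de_token_count": 20,
    "en_de_token_count": 22,
    "cos_sim": 0.92,
}
too_much_overlap = dict(good_pair, jaccard_similarity=0.5)  # dropped: > 0.3
```

With the Hugging Face `datasets` library, a predicate like this could be applied via `dataset.filter(keep_pair)`; whether the published dataset stores these columns under exactly these names is an assumption here.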