KennethTM committed on
Commit
2647bbf
1 Parent(s): 6a09d98

Create README.md

  1. README.md +46 -0
README.md ADDED
@@ -0,0 +1,46 @@
+ ---
+ pipeline_tag: text-classification
+ license: mit
+ datasets:
+ - squad
+ - eli5
+ - sentence-transformers/embedding-training-data
+ language:
+ - da
+ library_name: sentence-transformers
+ ---
+
+ # MiniLM-L6-danish-reranker
+
+ This is a lightweight (~22 M parameters) [sentence-transformers](https://www.SBERT.net) model for Danish NLP. It takes two sentences as input and outputs a relevance score, so it can be used for information retrieval: given a query and a set of candidate matches, rank the candidates by their relevance to the query.
+
+ The maximum sequence length is 512 tokens, counted over the two passages together.
+
+ The model was not pre-trained from scratch but adapted from the English [cross-encoder/ms-marco-MiniLM-L-6-v2](https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2) model with a [Danish tokenizer](https://huggingface.co/KennethTM/bert-base-uncased-danish).
+
+ The model was trained on ELI5 and SQuAD data machine-translated from English to Danish.
+
+ ## Usage with Transformers
+
+ ```python
+ from transformers import AutoTokenizer, AutoModelForSequenceClassification
+ import torch
+
+ model = AutoModelForSequenceClassification.from_pretrained('KennethTM/MiniLM-L6-danish-reranker')
+ tokenizer = AutoTokenizer.from_pretrained('KennethTM/MiniLM-L6-danish-reranker')
+ # Tokenize each (query, passage) pair jointly: the first list holds the queries, the second the passages
+ features = tokenizer(['Kører der cykler på vejen?', 'Kører der cykler på vejen?'], ['En panda løber på vejen.', 'En mand kører hurtigt forbi på cykel.'], padding=True, truncation=True, return_tensors="pt")
+
+ model.eval()
+ with torch.no_grad():
+     scores = model(**features).logits
+     print(scores)
+ ```
+
+ ## Usage with SentenceTransformers
+
+ Usage is simpler with [SentenceTransformers](https://www.sbert.net/) installed. The model can then be loaded as a `CrossEncoder` like this:
+ ```python
+ from sentence_transformers import CrossEncoder
+ model = CrossEncoder('KennethTM/MiniLM-L6-danish-reranker', max_length=512)
+ # Each tuple is a (query, passage) pair; one score is returned per pair
+ scores = model.predict([('Kører der cykler på vejen?', 'En panda løber på vejen.'), ('Kører der cykler på vejen?', 'En mand kører hurtigt forbi på cykel.')])
+ ```
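
The README describes the intended use as ranking candidates by relevance to a query. That step (score every query-candidate pair, then sort by score) could be sketched roughly as below. This is not part of the committed README: `rank_candidates` and `toy_overlap_score` are hypothetical names, and the toy scorer is only a stand-in, assumed here so the sketch runs without downloading the model.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]


def rank_candidates(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[List[Pair]], Sequence[float]],
) -> List[Tuple[str, float]]:
    """Score each (query, candidate) pair and return candidates sorted best-first."""
    pairs = [(query, candidate) for candidate in candidates]
    scores = score_fn(pairs)
    # Sort by score, highest first; sorted() is stable, so ties keep input order.
    ranked = sorted(zip(candidates, scores), key=lambda item: float(item[1]), reverse=True)
    return [(candidate, float(score)) for candidate, score in ranked]


def toy_overlap_score(pairs: List[Pair]) -> List[float]:
    """Hypothetical stand-in scorer (count of shared lowercase words), NOT the real model."""
    return [
        float(len(set(q.lower().split()) & set(p.lower().split())))
        for q, p in pairs
    ]


if __name__ == "__main__":
    query = "Kører der cykler på vejen?"
    candidates = [
        "En panda løber på vejen.",
        "En mand kører hurtigt forbi på cykel.",
    ]
    for candidate, score in rank_candidates(query, candidates, toy_overlap_score):
        print(f"{score:.1f}\t{candidate}")
```

With the actual model, the `predict` method of `CrossEncoder('KennethTM/MiniLM-L6-danish-reranker', max_length=512)` could presumably be passed as `score_fn` in place of the toy scorer, since it accepts a list of pairs and returns one score per pair.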