Fill-Mask · Transformers · PyTorch · Japanese · bert · Inference Endpoints

Commit e58d212 by aken12: "Update README.md" (1 parent: b3ec1e2)

Files changed (1):
  1. README.md (+21 -1)
README.md CHANGED
@@ -7,6 +7,24 @@ language:
- ja
---

+
+
+ | Model (JQaRa)       | NDCG@10 | MRR@10 | NDCG@100 | MRR@100 |
+ | ------------------- | ------- | ------ | -------- | ------- |
+ | splade-japanese-v3  | 0.505   | 0.772  | 0.700    | 0.775   |
+ | JaColBERTv2         | 0.585   | 0.836  | 0.753    | 0.838   |
+ | JaColBERT           | 0.549   | 0.811  | 0.730    | 0.814   |
+ | bge-m3+all          | 0.576   | 0.818  | 0.745    | 0.820   |
+ | bge-m3+dense        | 0.539   | 0.785  | 0.721    | 0.788   |
+ | m-e5-large          | 0.554   | 0.799  | 0.731    | 0.801   |
+ | m-e5-base           | 0.471   | 0.727  | 0.673    | 0.731   |
+ | m-e5-small          | 0.492   | 0.729  | 0.689    | 0.733   |
+ | GLuCoSE             | 0.308   | 0.518  | 0.564    | 0.527   |
+ | sup-simcse-ja-base  | 0.324   | 0.541  | 0.572    | 0.550   |
+ | sup-simcse-ja-large | 0.356   | 0.575  | 0.596    | 0.583   |
+ | fio-base-v0.1       | 0.372   | 0.616  | 0.608    | 0.622   |
+
## Evaluation on [MIRACL japanese](https://huggingface.co/datasets/miracl/miracl)
These models were not trained on the MIRACL training data.

@@ -22,6 +40,8 @@ These models were not trained on the MIRACL training data.

*The 'splade-japanese-v2-doc' model does not require a query encoder during inference.

+
+
Running the code below lets you check the term expansion and weighting.

If you'd like to try it out, you can see the expansion of queries or documents by running the code below.
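The hunk above preserves the note that the 'splade-japanese-v2-doc' variant needs no query encoder at inference time. As a rough, non-authoritative sketch of what that usually means for SPLADE-doc style models (the helper names, the binary term weights, and the reuse of the v3 tokenizer here are my assumptions, not something stated in this repository), the query side can be built directly from the tokenizer and scored against a document vector with a dot product:

```python
import torch
from transformers import AutoTokenizer

# Assumption: without a query encoder, the query is a binary bag of its token IDs
# (the common SPLADE-doc convention); only the document side carries learned weights.
tokenizer = AutoTokenizer.from_pretrained("aken12/splade-japanese-v3")

def encode_query_no_encoder(query: str) -> torch.Tensor:
    ids = tokenizer(query, add_special_tokens=False)["input_ids"]
    q_vec = torch.zeros(tokenizer.vocab_size)
    q_vec[ids] = 1.0  # each query term gets weight 1; no expansion on the query side
    return q_vec

def score(q_vec: torch.Tensor, doc_vec: torch.Tensor) -> float:
    # Relevance is the dot product between the sparse query and document vectors.
    return float(torch.dot(q_vec, doc_vec))
```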
@@ -41,7 +61,7 @@ model = AutoModelForMaskedLM.from_pretrained("aken12/splade-japanese-v3")
tokenizer = AutoTokenizer.from_pretrained("aken12/splade-japanese-v3")
vocab_dict = {v: k for k, v in tokenizer.get_vocab().items()}

- def encode_query(query):
+ def encode_query(query):  ## query, passage max length: 32, 180
    query = tokenizer(query, return_tensors="pt")
    output = model(**query, return_dict=True).logits
    output, _ = torch.max(torch.log(1 + torch.relu(output)) * query['attention_mask'].unsqueeze(-1), dim=1)
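The snippet in the last hunk stops mid-function. Below is a minimal completion sketch showing how the sparse vector can be returned and decoded through vocab_dict to inspect the term expansion and weights; the return statement, the example query, and the printing loop are illustrative additions, not part of the commit:

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained("aken12/splade-japanese-v3")
tokenizer = AutoTokenizer.from_pretrained("aken12/splade-japanese-v3")
vocab_dict = {v: k for k, v in tokenizer.get_vocab().items()}

def encode_query(query):
    query = tokenizer(query, return_tensors="pt")
    output = model(**query, return_dict=True).logits
    # SPLADE weighting: log(1 + ReLU(logits)), masked by the attention mask,
    # then max-pooled over tokens to get one weight per vocabulary entry.
    output, _ = torch.max(
        torch.log(1 + torch.relu(output)) * query["attention_mask"].unsqueeze(-1), dim=1
    )
    return output.squeeze(0)

# Print every non-zero vocabulary entry (the expanded terms) with its weight.
weights = encode_query("日本で一番高い山は？")  # example query, chosen for illustration
for idx in torch.nonzero(weights, as_tuple=False).squeeze(-1).tolist():
    print(vocab_dict[idx], round(weights[idx].item(), 3))
```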
 