Update README.md
README.md (CHANGED)
@@ -7,6 +7,24 @@ language:
 - ja
 ---
 
+
+
+| JQaRa               | NDCG@10 | MRR@10 | NDCG@100 | MRR@100 |
+| ------------------- | ------- | ------ | -------- | ------- |
+| splade-japanese-v3  | 0.505   | 0.772  | 0.700    | 0.775   |
+| JaColBERTv2         | 0.585   | 0.836  | 0.753    | 0.838   |
+| JaColBERT           | 0.549   | 0.811  | 0.730    | 0.814   |
+| bge-m3+all          | 0.576   | 0.818  | 0.745    | 0.820   |
+| bge-m3+dense        | 0.539   | 0.785  | 0.721    | 0.788   |
+| m-e5-large          | 0.554   | 0.799  | 0.731    | 0.801   |
+| m-e5-base           | 0.471   | 0.727  | 0.673    | 0.731   |
+| m-e5-small          | 0.492   | 0.729  | 0.689    | 0.733   |
+| GLuCoSE             | 0.308   | 0.518  | 0.564    | 0.527   |
+| sup-simcse-ja-base  | 0.324   | 0.541  | 0.572    | 0.550   |
+| sup-simcse-ja-large | 0.356   | 0.575  | 0.596    | 0.583   |
+| fio-base-v0.1       | 0.372   | 0.616  | 0.608    | 0.622   |
+
 ## Evaluation on [MIRACL Japanese](https://huggingface.co/datasets/miracl/miracl)
 These models were not trained on the MIRACL training data.
 
@@ -22,6 +40,8 @@ These models were not trained on the MIRACL training data.
 
 *The 'splade-japanese-v2-doc' model does not require a query encoder during inference (a query-free scoring sketch is given after the diff).
 
+
+
 If you'd like to try it out, you can see the expansion of queries or documents, along with their term weights, by running the code below.
@@ -41,7 +61,7 @@ model = AutoModelForMaskedLM.from_pretrained("aken12/splade-japanese-v3")
 tokenizer = AutoTokenizer.from_pretrained("aken12/splade-japanese-v3")
 vocab_dict = {v: k for k, v in tokenizer.get_vocab().items()}
 
-def encode_query(query):
+def encode_query(query):  # query/passage max len: 32, 180
     query = tokenizer(query, return_tensors="pt")
     output = model(**query, return_dict=True).logits
     output, _ = torch.max(torch.log(1 + torch.relu(output)) * query['attention_mask'].unsqueeze(-1), dim=1)
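The hunk above ends before the function returns, so on its own it computes the pooled vector but never shows the expansion the README promises. Below is a minimal sketch of how the sparse vector could be mapped back to readable terms via `vocab_dict`. The `return` statement, the `expansion_terms` helper, and the `top_k` cutoff are illustrative additions, not part of the original README code.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

model = AutoModelForMaskedLM.from_pretrained("aken12/splade-japanese-v3")
tokenizer = AutoTokenizer.from_pretrained("aken12/splade-japanese-v3")
vocab_dict = {v: k for k, v in tokenizer.get_vocab().items()}

def encode_query(query):
    # Same pooling as the README snippet: log-saturated ReLU over the MLM
    # logits, masked by attention, then max-pooled over the sequence.
    tokens = tokenizer(query, return_tensors="pt")
    with torch.no_grad():
        logits = model(**tokens, return_dict=True).logits
    vec, _ = torch.max(
        torch.log(1 + torch.relu(logits)) * tokens["attention_mask"].unsqueeze(-1),
        dim=1,
    )
    return vec.squeeze(0)  # one non-negative weight per vocabulary entry

def expansion_terms(query, top_k=20):
    # Illustrative helper: map the highest-weighted dimensions of the sparse
    # vector back to vocabulary tokens; top_k is an arbitrary display limit.
    vec = encode_query(query)
    weights, indices = torch.sort(vec, descending=True)
    return [
        (vocab_dict[i.item()], round(w.item(), 3))
        for w, i in zip(weights[:top_k], indices[:top_k])
        if w > 0
    ]

print(expansion_terms("日本の首都はどこですか?"))
```

Tokens that never appear in the query can still receive non-zero weights; that is the expansion and weighting the README invites you to inspect.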
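On the note that 'splade-japanese-v2-doc' needs no query encoder: in the SPLADE-doc family, only documents are passed through the model, and a query is scored by summing the document's weights at the query's token ids. The sketch below assumes splade-japanese-v2-doc follows that convention and that the checkpoint id is `aken12/splade-japanese-v2-doc`; both are unverified assumptions, and `score_query_free` is a hypothetical helper, not an API of this repo.

```python
import torch
from transformers import AutoModelForMaskedLM, AutoTokenizer

# Assumed checkpoint id; the README only names 'splade-japanese-v2-doc'.
model = AutoModelForMaskedLM.from_pretrained("aken12/splade-japanese-v2-doc")
tokenizer = AutoTokenizer.from_pretrained("aken12/splade-japanese-v2-doc")

def encode_document(text):
    # Documents are encoded offline with the usual SPLADE pooling.
    tokens = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**tokens, return_dict=True).logits
    vec, _ = torch.max(
        torch.log(1 + torch.relu(logits)) * tokens["attention_mask"].unsqueeze(-1),
        dim=1,
    )
    return vec.squeeze(0)  # one weight per vocabulary entry

def score_query_free(query, doc_vec):
    # The query is never encoded: it is just a bag of token ids, so the
    # score is the sum of the document's weights at those vocabulary slots.
    ids = tokenizer(query, add_special_tokens=False)["input_ids"]
    return doc_vec[ids].sum().item()

doc_vec = encode_document("東京は日本の首都です。")
print(score_query_free("日本 首都", doc_vec))
```

Because the query side reduces to token lookup, retrieval can be served from a standard inverted index with no neural inference at query time, which is the point of the README's note.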