---
pipeline_tag: feature-extraction
tags:
- feature-extraction
- transformers
license: apache-2.0
language:
- id
metrics:
- accuracy
- f1
- precision
- recall
datasets:
- squad_v2
- natural_questions
---
### indo-dpr-question_encoder-multiset-base
<p style="font-size:16px">Indonesian Dense Passage Retrieval (DPR) question encoder, trained on Indonesian translations of the SQuAD v2.0 and Natural Questions datasets in DPR format.</p>

### Evaluation

| Class | Precision | Recall | F1-Score | Support |
|-------|-----------|--------|----------|---------|
| hard_negative | 0.9961 | 0.9961 | 0.9961 | 384778 |
| positive | 0.8783 | 0.8783 | 0.8783 | 12414 |

| Metric | Value |
|--------|-------|
| Loss | 0.0220 |
| Accuracy | 0.9924 |
| Macro Average (Precision / Recall / F1) | 0.9372 |
| Weighted Average (Precision / Recall / F1) | 0.9924 |
| Accuracy and F1 | 0.9353 |
| Average Rank | 0.2194 |

<p style="font-size:16px">Note: these results are from evaluation on the dev set after 27,288 training batches.</p>
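
The summary rows can be cross-checked against the per-class rows. The sketch below is an illustration, not the script that produced this report. It assumes the Haystack/FARM text-similarity conventions, where "Accuracy and F1" is the mean of accuracy and the positive-class F1, and "Average Rank" is the mean 0-based position of the gold passage among a question's candidate passages. All function names are hypothetical.

```python
# Cross-check the summary metrics from the per-class rows of the table.
# Assumption: "Accuracy and F1" = (accuracy + positive-class F1) / 2, and
# "Average Rank" = mean 0-based rank of the gold passage (Haystack/FARM style).

per_class = {
    # class: (f1, support)
    "hard_negative": (0.9961, 384778),
    "positive": (0.8783, 12414),
}
accuracy = 0.9924


def macro_average(rows):
    """Unweighted mean of the per-class scores."""
    return sum(f1 for f1, _ in rows.values()) / len(rows)


def weighted_average(rows):
    """Per-class scores weighted by their support (number of examples)."""
    total = sum(n for _, n in rows.values())
    return sum(f1 * n for f1, n in rows.values()) / total


def accuracy_and_f1(acc, positive_f1):
    """Mean of accuracy and the F1 score of the positive class."""
    return (acc + positive_f1) / 2


def average_rank(scores, positive_idx):
    """Mean 0-based rank of each question's gold passage.

    scores[i][j] is the similarity of question i to candidate passage j;
    positive_idx[i] is the index of the gold passage for question i.
    """
    total = 0
    for row, gold in zip(scores, positive_idx):
        # Candidate indices sorted from highest to lowest score.
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        total += order.index(gold)
    return total / len(scores)


macro = macro_average(per_class)
weighted = weighted_average(per_class)
acc_f1 = accuracy_and_f1(accuracy, per_class["positive"][0])
print(f"macro={macro:.4f} weighted={weighted:.4f} acc_and_f1={acc_f1:.4f}")
```

Because precision, recall, and F1 coincide for each class here, the macro and weighted averages are the same for all three scores. Under the assumed definition, an average rank of about 0.22 means the gold passage is ranked first for most dev questions.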

### Usage

```python
import torch
from transformers import DPRQuestionEncoder, DPRQuestionEncoderTokenizer

tokenizer = DPRQuestionEncoderTokenizer.from_pretrained('firqaaa/indo-dpr-question_encoder-multiset-base')
model = DPRQuestionEncoder.from_pretrained('firqaaa/indo-dpr-question_encoder-multiset-base')

# "What is the name of the author of the Yu-Gi-Oh manga?"
input_ids = tokenizer("Siapa nama pengarang manga Yu-Gi-Oh?", return_tensors='pt')["input_ids"]
# pooler_output holds one question embedding per input: shape (batch_size, embedding_dim)
with torch.no_grad():
    embeddings = model(input_ids).pooler_output
```
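
DPR scores a question against a passage with the inner product of their embeddings and returns the highest-scoring passages. The sketch below shows only that ranking step, with tiny made-up 3-dimensional vectors standing in for real `pooler_output` embeddings. In practice the passage vectors would come from a DPR context (passage) encoder, which this README does not name; `rank_passages` is a hypothetical helper.

```python
from typing import List, Sequence, Tuple


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product, the similarity function DPR uses."""
    return sum(a * b for a, b in zip(u, v))


def rank_passages(question: Sequence[float],
                  passages: List[Sequence[float]],
                  top_k: int = 3) -> List[Tuple[int, float]]:
    """Return (passage index, score) pairs, best match first."""
    scored = [(i, dot(question, p)) for i, p in enumerate(passages)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings standing in for encoder outputs.
question_emb = [0.2, 0.9, 0.1]
passage_embs = [
    [0.1, 0.2, 0.9],
    [0.3, 0.8, 0.0],
    [0.9, 0.1, 0.1],
]

for idx, score in rank_passages(question_emb, passage_embs):
    print(idx, round(score, 2))
```

With real tensors the same scoring step is a single matrix product, for example `torch.matmul(question_embeddings, passage_embeddings.T)`.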

You can also use it with `haystack` (1.x API) as follows:

```python
from haystack.nodes import DensePassageRetriever
from haystack.document_stores import InMemoryDocumentStore

retriever = DensePassageRetriever(
    document_store=InMemoryDocumentStore(),
    query_embedding_model="firqaaa/indo-dpr-question_encoder-multiset-base",
    # DPR normally pairs the question encoder with a separate context (passage)
    # encoder; replace this with the matching context-encoder checkpoint if available.
    passage_embedding_model="firqaaa/indo-dpr-question_encoder-multiset-base",
    max_seq_len_query=64,
    max_seq_len_passage=256,
    batch_size=16,
    use_gpu=True,
    embed_title=True,
    use_fast_tokenizers=True,
)
```