michal-stefanik committed 15ace39 (1 parent: 640a529): "Readme examples"
README.md CHANGED
@@ -5,19 +5,28 @@ language:
 - multilingual
 - cs
 - en
+widget:
+- text: "Otázka: Jaký je důvod dotazu zákazníka?\nKontext: Dobrý den, Žádáme zaslání nové smlouvy kvůli řešení pojistné události. Zašlete na tento mail nebo přímo do systému. S pozdravem Petra Hladká | disponentka servisu.\nOdpověď: řešení pojistné události\nOtázka: Jaký je důvod dotazu zákazníka?\nKontext: Dobrý den, chtěla bych Vás požádat o zaslání kopie technického průkazu z důvodu jeho ztráty. S pozdravem Milan Tvrdý.\nOdpověď:"
+  example_title: "Few-shot: Customer request (cs)"
+- text: "Otázka: Jaké schopnosti daly magické předměty Jurovi Jánošíkovi? \nKontext: Podle slovenského lidového podání byl Juro Jánošík obdařen magickými předměty (kouzelná valaška, čarovný opasek), které mu dodávaly nadpřirozené schopnosti. Okrádal především šlechtice, trestal panské dráby a ze svého lupu vyděloval část pro chudé, tedy bohatým bral a chudým dával. \nOdpověď:"
+  example_title: "Zero-shot: Question Answering (cs)"
+- text: "Question: What is the score of this review? \n Context: I did not like the plot at all. Not recommended. \n Answer: 1 \n Question: What is the score of this review? \n Context: I loved the performance. Can’t believe they did not use CGI for the finale. I think it’s my new favourite movie. \nAnswer: 5 \nQuestion: Is the score of this review 1, 2, 3, 4 or 5? \nContext: The beginning was awesome, but at the end it felt a little rushed. I enjoyed the movie, but probably won’t rewatch soon. \nAnswer:"
+  example_title: "Few-shot: Movie reviews (en)"
+- text: "Question: What is the score of this review? \n Context: I did not like the plot at all. Not recommended. \n Answer: 1 \n Question: What is the score of this review? \n Context: I loved the performance. Can’t believe they did not use CGI for the finale. I think it’s my new favourite movie. \nAnswer: 5 \nQuestion: Is the score of this review 1, 2, 3, 4 or 5? \nContext: The beginning was awesome, but at the end it felt a little rushed. I enjoyed the movie, but probably won’t rewatch soon. \nAnswer:"
+  example_title: "Few-shot: Customer request (en)"
 ---
 
-# Mt5-large for
+# mT5-large for Few-shot Czech+English Generative Question Answering
 
-This is the [mt5-
+This is the [mt5-large](https://huggingface.co/google/mt5-large) model with an LM head for generating extractive answers,
 given a small set of 2-5 demonstrations (i.e. primes).
 
-##
+## Few-shot (i.e. priming)
 
-Note that **this is a
+Note that **this is primarily a few-shot model** that expects a **set of demonstrations** of your task of interest,
 similarly to GPT-3.
 Rather than performing well on conventional question answering, it aims to learn to extrapolate the pattern of the given demonstrations
-to novel tasks, such as Named Entity Recognition or Keywords Extraction from a given pattern.
+to novel tasks, such as Named Entity Recognition or Keyword Extraction, based on the given pattern. However, it can also be used as a conventional QA model (see the examples).
 
 ## Data & Training
 
@@ -29,10 +38,10 @@ To train the model to use the demonstrations, we've **clustered** the samples by
 in English AdversarialQA and by the category in the Czech SQAD and used the examples of the same cluster as the demonstrations
 of the task in training.
 
-We find that the specific algorithm of selection of these demonstrations
-to new tasks
+We find that the specific algorithm for selecting these demonstrations is crucial for the model's ability to extrapolate
+to new tasks. We'll share more details in a follow-up article; stay tuned!
 
-For the Czech SQAD 3.0, original contexts (=whole Wikipedia websites) were limited to a maximum of
+For the Czech SQAD 3.0, the original contexts (i.e. whole Wikipedia pages) were limited to a maximum of 4000 characters
 per sequence of prime demonstrations.
 The pre-processing script for Czech SQAD is available [here](https://huggingface.co/gaussalgo/xlm-roberta-large_extractive-QA_en-cs/blob/main/parse_czech_squad.py).
 
@@ -88,11 +97,6 @@ input_text = """
 Context: Customer id: Barack Obama, if not deliverable, return to Bill Clinton.
 Answer:"""
 ```
-
-Note that despite its size, English AdversarialQA has a variety of reported biases,
-conditioned by the relative position or type of the answer in the context that can affect the model's performance on new data
-(see, e.g. [L. Mikula (2022)](https://is.muni.cz/th/adh58/?lang=en), Chap. 4.1).
-
 ## Usage
 
 Here is how to use this model to answer the question on a given context using 🤗 Transformers in PyTorch:
@@ -100,8 +104,8 @@ Here is how to use this model to answer the question on a given context using
 ```python
 from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
 
-tokenizer = AutoTokenizer.from_pretrained("gaussalgo/mt5-
-model = AutoModelForSeq2SeqLM.from_pretrained("gaussalgo/mt5-
+tokenizer = AutoTokenizer.from_pretrained("gaussalgo/mt5-large-priming-QA_en-cs")
+model = AutoModelForSeq2SeqLM.from_pretrained("gaussalgo/mt5-large-priming-QA_en-cs")
 
 # For the expected format of input_text, see Intended use above
 inputs = tokenizer(input_text, return_tensors="pt")
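For a concrete picture of what `input_text` looks like, a primed prompt can be assembled by concatenating a few solved Question/Context/Answer demonstrations and ending with the unanswered question, as in the widget examples above. The sketch below reuses the movie-review demonstrations from the widget; the helper variables are illustrative and not part of the model card:

```python
# A minimal sketch of building a primed input_text in the
# Question/Context/Answer format used by the widget examples above.
demonstrations = [
    ("What is the score of this review?",
     "I did not like the plot at all. Not recommended.", "1"),
    ("What is the score of this review?",
     "I loved the performance. Can't believe they did not use CGI for the finale.", "5"),
]
# The final, unanswered query the model should complete.
query = ("Is the score of this review 1, 2, 3, 4 or 5?",
         "The beginning was awesome, but at the end it felt a little rushed.")

primes = "\n".join(f"Question: {q} \nContext: {c} \nAnswer: {a}" for q, c, a in demonstrations)
input_text = primes + f"\nQuestion: {query[0]} \nContext: {query[1]} \nAnswer:"
```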
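The usage snippet in the last hunk stops at tokenization; a natural continuation with standard 🤗 Transformers calls (the `max_new_tokens` value is an illustrative choice, not taken from this card) generates and decodes the answer:

```python
# Generate the answer continuation and decode it back to text.
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```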