---
tags:
- generation
language:
- multilingual
- cs
- en
widget:
- text: "Otázka: Jaký je důvod dotazu zákazníka?\nKontext: Dobrý den, Žádáme zaslání nové smlouvy kvůli řešení pojistné události. Zašlete na tento mail nebo přímo do systému. S pozdravem Petra Hladká | disponentka servisu.\nOdpověď: řešení pojistné události\nOtázka: Jaký je důvod dotazu zákazníka?\nKontext: Dobrý den, chtěla bych Vás požádat o zaslání kopie technického průkazu z důvodu jeho ztráty. S pozdravem Milan Tvrdý.\nOdpověď:"
  example_title: "k-shot: Requests (cs)"
- text: "Otázka: Jaké schopnosti daly magické předměty Jurovi Jánošíkovi? \nKontext: Podle slovenského lidového podání byl Juro Jánošík obdařen magickými předměty (kouzelná valaška, čarovný opasek), které mu dodávaly nadpřirozené schopnosti. Okrádal především šlechtice, trestal panské dráby a ze svého lupu vyděloval část pro chudé, tedy bohatým bral a chudým dával. \nOdpověď:"
  example_title: "0-shot: Answering (cs)"
- text: "Question: What is the score of this review? \n Context: I did not like the plot at all. Not recommended. \n Answer: 1 \n Question: What is the score of this review? \n Context: I loved the performance. Can’t believe they did not use CGI for the finale. I think it’s my new favourite movie. \nAnswer: 5 \nQuestion: Is the score of this review 1, 2, 3, 4 or 5? \nContext: The beginning was awesome, but at the end it felt a little rushed. I enjoyed the movie, but probably won’t rewatch soon. \nAnswer:"
  example_title: "k-shot: Reviews (en)"
- text: "Question: What is the customer's name? \nContext: Origin: Barrack Obama, Customer id: Bill Moe. \nAnswer: Bill Moe, \nQuestion: What is the customer's name? \nContext: Customer id: Barrack Obama, if not deliverable, return to Bill Clinton. \nAnswer:"
  example_title: "k-shot: Request (en)"
---

# mT5-large for Few-shot Czech+English Generative Question Answering

This is the [mt5-large](https://huggingface.co/google/mt5-large) model with an LM head for generating extractive answers,
given a small set of 2-5 demonstrations (i.e. primes).

## Few-shot (i.e. priming)

Note that **this is primarily a few-shot model** that expects a **set of demonstrations** of your task of interest,
similarly to GPT-3.
Rather than performing well on conventional question answering, it aims to learn to extrapolate the pattern of the given demonstrations
to novel tasks, such as Named Entity Recognition or Keyword Extraction. However, it can also be used as a conventional QA model (see the examples).

## Data & Training

This model was trained on a combination of the [AdversarialQA](https://adversarialqa.github.io)
and [Czech SQAD 3.0](https://lindat.cz/repository/xmlui/handle/11234/1-3069)
Question Answering datasets.

To train the model to use the demonstrations, we **clustered** the samples by the question word(s)
in the English AdversarialQA and by the category in the Czech SQAD, and used examples from the same cluster as the demonstrations
of the task during training.
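
The grouping step for the English data can be illustrated with a minimal sketch. It assumes each sample is a dict with a `question` field and that the cluster key is simply the lower-cased first word of the question. The helper names (`question_word`, `cluster_samples`, `pick_demonstrations`) are hypothetical, and the uniform random choice of demonstrations is only a placeholder, not the selection algorithm used for training this model.

```python
import random
from collections import defaultdict


def question_word(question: str) -> str:
    """Cluster key: the lower-cased first word of the question (an assumption)."""
    words = question.strip().split()
    return words[0].lower() if words else ""


def cluster_samples(samples):
    """Group samples (dicts with a 'question' field) by their question word."""
    clusters = defaultdict(list)
    for sample in samples:
        clusters[question_word(sample["question"])].append(sample)
    return clusters


def pick_demonstrations(sample, clusters, k, rng):
    """Pick up to k other samples from the same cluster as `sample`.

    Uniform random sampling is a placeholder for the real selection strategy.
    """
    pool = [s for s in clusters[question_word(sample["question"])] if s is not sample]
    return rng.sample(pool, min(k, len(pool)))
```

With such clusters, a training input for a sample would be assembled from `pick_demonstrations(sample, clusters, k, random.Random(seed))` followed by the sample itself.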

We find that the specific algorithm for selecting these demonstrations is crucial for the model's ability to extrapolate
to new tasks. We'll share more details in a following article; stay tuned!

For the Czech SQAD 3.0, the original contexts (whole Wikipedia pages) were limited to a maximum of 4000 characters
per sequence of prime demonstrations.
The pre-processing script for Czech SQAD is available [here](https://huggingface.co/gaussalgo/xlm-roberta-large_extractive-QA_en-cs/blob/main/parse_czech_squad.py).
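
The linked script is the authoritative reference for how the contexts were shortened. As a rough illustration only, the sketch below shows one way to cut a long context down to a character budget while keeping the extractive answer inside it. The helper `crop_context` and the idea of centring the window on the answer are assumptions made for this example, not a description of the actual pre-processing.

```python
def crop_context(context: str, answer: str, max_chars: int) -> str:
    """Return at most max_chars characters of `context` that still contain `answer`.

    Hypothetical helper: centres a window on the first occurrence of the answer.
    """
    if len(context) <= max_chars:
        return context
    start = context.find(answer)
    if start == -1 or len(answer) > max_chars:
        # Answer not found verbatim (or too long): fall back to the leading characters.
        return context[:max_chars]
    # Centre the window on the answer, then clamp it to the context boundaries.
    slack = max_chars - len(answer)
    left = max(0, start - slack // 2)
    right = left + max_chars
    if right > len(context):
        right = len(context)
        left = right - max_chars
    return context[left:right]
```

When several demonstrations share one budget, `max_chars` would have to be derived from it, for example by splitting the 4000 characters evenly across the demonstrations; that split is likewise an assumption.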

For training (and therefore also recommended for inference), we used the following patterns with 2-7 demonstrations:

For English samples:

*input*:
```
Question: {Q1} Context: {C1} Answer: {A1},
Question: {Q2} Context: {C2} Answer: {A2},
[...possibly more demonstrations...]

Question: {Q} Context: {C} Answer:
```
=> *target*:
```
{A}
```

For Czech samples:

*input*:
```
Otázka: {Q1} Kontext: {C1} Odpověď: {A1},
Otázka: {Q2} Kontext: {C2} Odpověď: {A2},
[...possibly more demonstrations...]

Otázka: {Q} Kontext: {C} Odpověď:
```
=> *target*:
```
{A}
```
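
The two patterns above can be assembled programmatically. The following is a minimal sketch with a hypothetical helper, `build_prompt`, that takes demonstrations as `(question, context, answer)` tuples. The exact whitespace (single spaces between fields, one demonstration per line) is read off the templates above and is an assumption; the widget examples in this card use newlines between fields instead.

```python
# Field labels for the two supported prompt languages, taken from the patterns above.
LABELS = {
    "en": ("Question", "Context", "Answer"),
    "cs": ("Otázka", "Kontext", "Odpověď"),
}


def build_prompt(demonstrations, question, context, lang="en"):
    """Format (question, context, answer) demonstrations plus the final query.

    Each demonstration ends with a trailing comma after the answer; the final
    query ends with the bare answer label so the model completes it.
    """
    q_label, c_label, a_label = LABELS[lang]
    lines = [
        f"{q_label}: {q} {c_label}: {c} {a_label}: {a},"
        for q, c, a in demonstrations
    ]
    lines.append(f"{q_label}: {question} {c_label}: {context} {a_label}:")
    return "\n".join(lines)
```

The returned string can be passed to the tokenizer as the model input; switching `lang` to `"cs"` produces the Czech labels.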

The best checkpoint was selected to maximize the model's zero-shot performance on an unseen Named Entity Recognition task
with out-of-distribution texts and labels.

## Intended uses & limitations

This model is intended for few-shot application on any text extraction task in English and Czech where the prompt can be stated
as a natural question. For example, to use this model to extract customer names from text,
prompt it with demonstrations in the following format:

```python
input_text = """
Question: What is the customer's name?
Context: Origin: Barrack Obama, Customer id: Bill Moe.
Answer: Bill Moe,
Question: What is the customer's name?
Context: Customer id: Barrack Obama, if not deliverable, return to Bill Clinton.
Answer:"""
```

## Usage

Here is how to use this model to answer a question on a given context using 🤗 Transformers in PyTorch:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("gaussalgo/mt5-large-priming-QA_en-cs")
model = AutoModelForSeq2SeqLM.from_pretrained("gaussalgo/mt5-large-priming-QA_en-cs")

# For the expected format of input_text, see "Intended uses & limitations" above
inputs = tokenizer(input_text, return_tensors="pt")

# generate() returns a batch of token-id sequences, one per input
outputs = model.generate(**inputs)

print("Answer:")
# Decode the first (and only) sequence, dropping special tokens such as <pad> and </s>
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```