Polish Question Answering

piotr-rybak's Collections


Updated Feb 15

Collection of models and datasets for Polish Question Answering.


ipipan/silver-retriever-base-v1.1

Sentence Similarity • Updated May 24 • 830 downloads • 8 likes

Note SilverRetriever is a state-of-the-art neural passage retriever trained on the PolQA and MAUPQA datasets.
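
A neural passage retriever of this kind typically encodes the question and every passage as vectors, then returns the passages whose vectors are most similar to the question's. The sketch below shows only that ranking step and assumes the embeddings were already produced by an encoder. The toy 3-dimensional vectors and the names `cosine` and `rank_passages` are hypothetical, and cosine similarity is an assumed choice; the model card is the place to check which similarity function and input format the model actually expects.

```python
import math
from typing import List, Tuple

def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_passages(query_vec: List[float], passage_vecs: List[List[float]],
                  k: int = 2) -> List[Tuple[int, float]]:
    """Return (passage_index, score) pairs for the k passages most similar to the query."""
    # Assumption: cosine similarity; some retrievers use a plain dot product instead.
    scored = [(i, cosine(query_vec, p)) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical toy "embeddings"; a real encoder produces much larger vectors.
query = [0.9, 0.1, 0.0]
passages = [
    [0.0, 1.0, 0.0],  # passage 0: unrelated to the query
    [1.0, 0.0, 0.0],  # passage 1: closest to the query
    [0.7, 0.7, 0.0],  # passage 2: partially related
]
print([i for i, _ in rank_passages(query, passages, k=2)])  # [1, 2]
```

In a real system the passage vectors are computed once and indexed, so only the question has to be encoded at query time.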
ipipan/silver-retriever-base-v1

Sentence Similarity • Updated May 24 • 2.41k downloads • 10 likes

Note SilverRetriever is a state-of-the-art neural passage retriever trained on the PolQA and MAUPQA datasets.
ipipan/polqa

Updated May 24 • 3.77k downloads • 7 likes

Note PolQA is the first Polish dataset for open-domain question answering. It consists of 7,000 questions, 87,525 manually labeled evidence passages, and a corpus of over 7 million candidate passages. The dataset can be used to train both a passage retriever and an abstractive reader.
ipipan/maupqa

Updated May 24 • 98 downloads • 4 likes

Note MAUPQA is a collection of 14 datasets for Polish document retrieval. Most of the datasets are either machine-generated or machine-translated from English. Across all datasets, it contains over 1M questions, 1M positive question-passage pairs, and 7M hard-negative question-passage pairs.
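
Hard negatives are passages that look relevant to a question but do not answer it. A common way to use them when training a retriever is a contrastive loss that pushes the positive passage's score above the scores of the negatives. The sketch below computes an InfoNCE-style softmax cross-entropy for a single question with made-up scores; the function name and scores are hypothetical, and this illustrates the general idea rather than the exact objective used to train the models in this collection.

```python
import math
from typing import List

def contrastive_loss(pos_score: float, neg_scores: List[float],
                     temperature: float = 1.0) -> float:
    """Negative log-softmax probability of the positive passage.

    The positive is treated as the correct "class" among all candidates
    (positive + negatives); a lower loss means the positive is ranked higher.
    """
    logits = [pos_score / temperature] + [s / temperature for s in neg_scores]
    m = max(logits)  # subtract the max before exponentiating, for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[0]

# Hypothetical similarity scores for one question.
easy = contrastive_loss(pos_score=5.0, neg_scores=[1.0, 0.5])  # negatives far below the positive
hard = contrastive_loss(pos_score=5.0, neg_scores=[4.8, 4.5])  # negatives close to the positive
print(easy < hard)  # True
```

Because hard negatives score close to the positive, they produce a larger loss and therefore a stronger training signal than randomly sampled negatives.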
clarin-pl/poquad

Viewer • Updated Jul 4, 2023 • 52k rows • 653 downloads • 4 likes

Note PoQuAD is a Polish equivalent of SQuAD. It consists of more than 70,000 question-passage pairs with both extractive and abstractive answers.
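
An extractive answer is a span copied from the passage, so it can be stored as the answer text plus its character offset in the passage. The sketch below assumes a SQuAD-style record layout (`context`, `answers["text"]`, `answers["answer_start"]`); those field names, the toy record, and the helper `answer_spans` are assumptions, so the dataset card should be checked for PoQuAD's actual schema.

```python
from typing import Dict, List

# Hypothetical record in a SQuAD-style layout (field names assumed, not taken
# from the PoQuAD dataset card).
record = {
    "context": "Warszawa jest stolicą Polski.",
    "question": "Jakie miasto jest stolicą Polski?",
    "answers": {"text": ["Warszawa"], "answer_start": [0]},
}

def answer_spans(rec: Dict) -> List[str]:
    """Recover each extractive answer as a substring of the context."""
    spans = []
    for text, start in zip(rec["answers"]["text"], rec["answers"]["answer_start"]):
        span = rec["context"][start:start + len(text)]
        # An extractive answer must match the context exactly at its start offset.
        if span != text:
            raise ValueError(f"offset mismatch: {span!r} != {text!r}")
        spans.append(span)
    return spans

print(answer_spans(record))  # ['Warszawa']
```

An abstractive answer, by contrast, is free text that need not appear verbatim in the passage, so it has no offset to check.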
allegro/polish-question-passage-pairs

Viewer • Updated Sep 23, 2021 • 10.4k rows • 12 downloads • 4 likes

Note Over 10,000 manually annotated question-passage pairs. The questions are taken from the PolQA dataset, but many of the passages are unique to this dataset. Most of the pairs (8k) are hard negatives.
allegro/klej-dyk

Viewer • Updated Oct 26, 2022 • 5.18k rows • 2.63k downloads • 1 like

Note The "Czy wiesz?" ("Did you know?") dataset consists of almost 5k question-passage pairs obtained from the "Czy wiesz..." section of Polish Wikipedia. Each question is written by a Wikipedia contributor and is answered with a link to a relevant Wikipedia article.
piotr-rybak/allegro-faq

Viewer • Updated Aug 19, 2023 • 1.88k rows

Note Allegro FAQ is one of the PolEval 2022 test sets. It consists of 900 frequently asked questions and 921 help articles about Allegro.com, a large Polish e-commerce platform. Each question-passage pair was manually checked and edited where necessary.
piotr-rybak/legal-questions

Updated Dec 14, 2023 • 5

Note Legal Questions is one of the PolEval 2022 test sets. It consists of 718 questions and approximately 26,000 passages extracted from over 1,000 acts of law.

Polish Information Retrieval Benchmark (PIRB)

Note The benchmark for Polish Information Retrieval, consisting of 41 datasets.
sdadas/mmlw-retrieval-roberta-base

Sentence Similarity • Updated Feb 23 • 239 downloads • 1 like

Note A neural text encoder for Polish. More models are available here: https://huggingface.co/sdadas?search_models=mmlw
sdadas/gpt-exams

Viewer • Updated Sep 9, 2023 • 8.13k rows • 6 downloads • 2 likes

Note The dataset contains 8,131 multi-domain question-answer pairs. It was created semi-automatically using the gpt-3.5-turbo-0613 model available in the OpenAI API.
apohllo/plt5-base-poquad

Text2Text Generation • Updated Nov 28, 2023 • 8 downloads • 1 like

Note This is a plT5-base model trained on the PoQuAD dataset. It is the result of a single experimental run, so state-of-the-art results should not be expected.
sdadas/polish-reranker-large-ranknet

Text Classification • Updated Apr 23 • 111 downloads

Note A cross-encoder (reranker) for Polish. More models are available here: https://huggingface.co/sdadas?search_models=reranker
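
A cross-encoder scores a question and a passage together in a single forward pass, which is typically too costly to run over a whole corpus, so rerankers are commonly applied only to a shortlist produced by a retriever. The sketch below shows that two-stage pipeline in plain Python. The function names are hypothetical, and the two scorers are toy word-overlap stand-ins for a real retriever and a real reranker model; the sketch does not call any model in this collection.

```python
from typing import Callable, List

Scorer = Callable[[str, str], float]

def retrieve_then_rerank(question: str, corpus: List[str], retrieve_score: Scorer,
                         rerank_score: Scorer, k_retrieve: int = 3,
                         k_final: int = 1) -> List[str]:
    # Stage 1: score every passage with the cheap retriever and keep a shortlist.
    shortlist = sorted(corpus, key=lambda p: retrieve_score(question, p),
                       reverse=True)[:k_retrieve]
    # Stage 2: score only the shortlist with the (expensive) reranker.
    reranked = sorted(shortlist, key=lambda p: rerank_score(question, p), reverse=True)
    return reranked[:k_final]

def word_overlap(q: str, p: str) -> float:
    # Hypothetical stand-in for a retriever: number of shared words.
    return float(len(set(q.split()) & set(p.split())))

def jaccard(q: str, p: str) -> float:
    # Hypothetical stand-in for a cross-encoder: word-set Jaccard similarity.
    a, b = set(q.split()), set(p.split())
    return len(a & b) / len(a | b) if a | b else 0.0

question = "what is the capital of poland"
corpus = [
    "poland is a country in central europe",
    "warsaw is the capital of poland",
    "the capital of france is paris",
    "krakow is a city in southern poland",
]
result = retrieve_then_rerank(question, corpus, word_overlap, jaccard)
print(result)  # ['warsaw is the capital of poland']
```

The design choice is a cost trade-off: `k_retrieve` bounds how many expensive reranker calls are made per question, while the retriever only needs to be good enough to get the right passage into the shortlist.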
