crodri commited on
Commit
d0d781f
1 Parent(s): facae26

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +7 -7
README.md CHANGED
@@ -127,22 +127,22 @@ It contains the following tasks and their related datasets:
127
 
128
  3. Text Classification (TC)
129
 
130
- **[TeCla](https://doi.org/10.5281/zenodo.4627197)**: consisting of 137k news pieces from the Catalan News Agency ([ACN](https://www.acn.cat/)) corpus
131
 
132
  4. Semantic Textual Similarity (STS)
133
 
134
- **[Catalan semantic textual similarity](https://doi.org/10.5281/zenodo.4529183)**: consisting of more than 3000 sentence pairs, annotated with the semantic similarity between them,
135
- scraped from the [Catalan Textual Corpus](https://doi.org/10.5281/zenodo.4519349)
136
 
137
  5. Question Answering (QA):
138
 
139
- **[ViquiQuAD](https://doi.org/10.5281/zenodo.4562344)**: consisting of more than 15,000 questions outsourced from Catalan Wikipedia randomly chosen from a set of 596 articles that were originally written in Catalan.
140
 
141
- **[VilaQuAD](https://doi.org/10.5281/zenodo.4562337)**: contains 6,282 pairs of questions and answers, outsourced from 2095 Catalan language articles from VilaWeb newswire text.
142
 
143
- **[CatalanQA](projecte-aina/catalanqa)**: an aggregation of 2 previous datasets (VilaQuAD and ViquiQuAD), 21,427 pairs of Q/A balanced by type of question, containing one question and one answer per context, although the contexts can repeat multiple times.
144
 
145
- **[XQuAD](https://doi.org/10.5281/zenodo.4526223)**: the Catalan translation of XQuAD, a multilingual collection of manual translations of 1,190 question-answer pairs from English Wikipedia used only as a _test set_
146
 
147
  Here are the train/dev/test splits of the datasets:
148
 
 
127
 
128
  3. Text Classification (TC)
129
 
130
+ **[TeCla](https://huggingface.co/datasets/projecte-aina/tecla)**: consisting of 137k news pieces from the Catalan News Agency ([ACN](https://www.acn.cat/)) corpus, with 30 labels
131
 
132
  4. Semantic Textual Similarity (STS)
133
 
134
+ **[Catalan semantic textual similarity](https://huggingface.co/datasets/projecte-aina/sts-ca)**: consisting of more than 3000 sentence pairs, annotated with the semantic similarity between them,
135
+ scraped from the [Catalan Textual Corpus](https://huggingface.co/datasets/projecte-aina/catalan_textual_corpus)
136
 
137
  5. Question Answering (QA):
138
 
139
+ **[ViquiQuAD](https://huggingface.co/datasets/projecte-aina/viquiquad)**: consisting of more than 15,000 questions outsourced from Catalan Wikipedia randomly chosen from a set of 596 articles that were originally written in Catalan.
140
 
141
+ **[VilaQuAD](https://huggingface.co/datasets/projecte-aina/vilaquad)**: contains 6,282 pairs of questions and answers, outsourced from 2095 Catalan language articles from VilaWeb newswire text.
142
 
143
+ **[CatalanQA](https://huggingface.co/datasets/projecte-aina/catalanqa)**: an aggregation of 2 previous datasets (VilaQuAD and ViquiQuAD), 21,427 pairs of Q/A balanced by type of question, containing one question and one answer per context, although the contexts can repeat multiple times.
144
 
145
+ **[XQuAD](https://huggingface.co/datasets/projecte-aina/xquad-ca)**: the Catalan translation of XQuAD, a multilingual collection of manual translations of 1,190 question-answer pairs from English Wikipedia used only as a _test set_
146
 
147
  Here are the train/dev/test splits of the datasets:
148