Update README.md
README.md
CHANGED
@@ -127,22 +127,22 @@ It contains the following tasks and their related datasets:

  3. Text Classification (TC)

- **[TeCla](https://
+ **[TeCla](https://huggingface.co/datasets/projecte-aina/tecla)**: consisting of 137k news pieces from the Catalan News Agency ([ACN](https://www.acn.cat/)) corpus, with 30 labels

  4. Semantic Textual Similarity (STS)

- **[Catalan semantic textual similarity](https://
- scraped from the [Catalan Textual Corpus](https://
+ **[Catalan semantic textual similarity](https://huggingface.co/datasets/projecte-aina/sts-ca)**: consisting of more than 3000 sentence pairs, annotated with the semantic similarity between them,
+ scraped from the [Catalan Textual Corpus](https://huggingface.co/datasets/projecte-aina/catalan_textual_corpus)

  5. Question Answering (QA):

- **[ViquiQuAD](https://
+ **[ViquiQuAD](https://huggingface.co/datasets/projecte-aina/viquiquad)**: consisting of more than 15,000 questions outsourced from Catalan Wikipedia randomly chosen from a set of 596 articles that were originally written in Catalan.

- **[VilaQuAD](https://
+ **[VilaQuAD](https://huggingface.co/datasets/projecte-aina/vilaquad)**: contains 6,282 pairs of questions and answers, outsourced from 2095 Catalan language articles from VilaWeb newswire text.

- **[CatalanQA](projecte-aina/catalanqa)**: an aggregation of 2 previous datasets (VilaQuAD and ViquiQuAD), 21,427 pairs of Q/A balanced by type of question, containing one question and one answer per context, although the contexts can repeat multiple times.
+ **[CatalanQA](https://huggingface.co/datasets/projecte-aina/catalanqa)**: an aggregation of 2 previous datasets (VilaQuAD and ViquiQuAD), 21,427 pairs of Q/A balanced by type of question, containing one question and one answer per context, although the contexts can repeat multiple times.

- **[XQuAD](https://
+ **[XQuAD](https://huggingface.co/datasets/projecte-aina/xquad-ca)**: the Catalan translation of XQuAD, a multilingual collection of manual translations of 1,190 question-answer pairs from English Wikipedia used only as a _test set_

  Here are the train/dev/test splits of the datasets:
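The removed CatalanQA line used a bare `projecte-aina/catalanqa` link target, which the added line replaces with a full `https://huggingface.co/datasets/...` URL. As a rough illustration of how such targets could be spotted in a README, here is a minimal sketch. The helper name `relative_link_targets` and the regular expression are assumptions made for this example and are not part of the repository. The check is deliberately crude: it flags any inline link target without an `http(s)` scheme, so in-page anchors and `mailto:` links would be reported too.

```python
import re

# Matches inline markdown links of the form [text](target).
# Simplification: no nested brackets, no link titles, no spaces in the target.
LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)\s]+)\)")


def relative_link_targets(markdown: str) -> list[tuple[str, str]]:
    """Return (text, target) pairs whose target lacks an http(s) scheme."""
    return [
        (text, target)
        for text, target in LINK_RE.findall(markdown)
        if not target.startswith(("http://", "https://"))
    ]


# The two CatalanQA lines from the diff, shortened after the colon.
old_line = "**[CatalanQA](projecte-aina/catalanqa)**: an aggregation of 2 previous datasets"
new_line = "**[CatalanQA](https://huggingface.co/datasets/projecte-aina/catalanqa)**: an aggregation of 2 previous datasets"

print(relative_link_targets(old_line))  # the bare target is reported
print(relative_link_targets(new_line))  # nothing is reported
```

Running a check like this over the whole README before committing would surface any remaining links that resolve relative to the model page rather than to the dataset hub.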