gonzalez-agirre committed fdd8ca4
Parent(s): c57b9f4
Update README.md
README.md (changed)
```diff
@@ -120,14 +120,14 @@ that has been created along with the model.
 
 It contains the following tasks and their related datasets:
 
-1.
+1. Named Entity Recognition (NER)
 
-Catalan
+**[AnCora Catalan 2.0.0](https://zenodo.org/record/4762031#.YKaFjqGxWUk)**: extracted named entities from the original [Ancora](https://doi.org/10.5281/zenodo.4762030) version, filtering out some unconventional ones, like book titles, and transcribed them into a standard CONLL-IOB format.
 
-
+
+2. Part-of-Speech Tagging (POS)
 
-
-filtering out some unconventional ones, like book titles, and transcribed them into a standard CONLL-IOB format
+Catalan-Ancora: from the [Universal Dependencies treebank](https://github.com/UniversalDependencies/UD_Catalan-AnCora) of the well-known Ancora corpus.
 
 3. Text Classification (TC)
 
@@ -135,7 +135,7 @@ It contains the following tasks and their related datasets:
 
 4. Textual Entailment (TE)
 
-**[
+**[TECa](https://huggingface.co/datasets/projecte-aina/teca)**: consisting of 21,163 pairs of premises and hypotheses, annotated according to the inference relation they have (implication, contradiction, or neutral), extracted from the [Catalan Textual Corpus](https://huggingface.co/datasets/projecte-aina/catalan_textual_corpus).
 
 5. Semantic Textual Similarity (STS)
 
@@ -159,7 +159,7 @@ Here are the train/dev/test splits of the datasets:
 | POS (Ancora)| 16,678 | 13,123 | 1,709 | 1,846 |
 | STS | 3,073 | 2,073 | 500 | 500 |
 | TC (TeCla) | 137,775 | 110,203 | 13,786 | 13,786|
-| TE (
+| TE (TECa) | 21,163 | 16,930 | 2,116 | 2,117
 | QA (VilaQuAD) | 6,282 | 3,882 | 1,200 | 1,200 |
 | QA (ViquiQuAD) | 14,239 | 11,255 | 1,492 | 1,429 |
 | QA (CatalanQA) | 21,427 | 17,135 | 2,157 | 2,135 |
```
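The NER entry in this commit says the AnCora Catalan 2.0.0 entities were transcribed into a standard CONLL-IOB format. Below is a minimal Python sketch of reading such a file back into entity spans. It assumes the common CoNLL layout: one token per line, the token in the first column, the IOB tag in the last column, and blank lines between sentences. The diff does not spell out the exact columns of this release. `read_conll_iob` and `entity_spans` are hypothetical helpers, and the sample sentence and its PER/LOC labels are invented for illustration.

```python
from typing import List, Optional, Tuple

# A sentence is a list of (token, tag) pairs.
Sentence = List[Tuple[str, str]]


def read_conll_iob(text: str) -> List[Sentence]:
    """Split CoNLL-style text into sentences of (token, tag) pairs.

    Assumes one token per line, the token in the first column, the IOB tag
    in the last column, and blank lines between sentences.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            # A blank line ends the current sentence, if one is open.
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split()
        current.append((columns[0], columns[-1]))
    if current:  # the text may not end with a blank line
        sentences.append(current)
    return sentences


def entity_spans(sentence: Sentence) -> List[Tuple[str, str]]:
    """Group B-/I- tagged tokens into (entity_type, text) spans.

    An I- tag that does not continue an open entity of the same type starts
    a new one, so both IOB1- and IOB2-style tagging are accepted.
    """
    spans: List[Tuple[str, str]] = []
    tokens: List[str] = []
    label: Optional[str] = None
    for token, tag in sentence:
        starts_new = tag.startswith("B-") or (
            tag.startswith("I-") and tag[2:] != label
        )
        if starts_new:
            # Close the previous entity before opening a new one.
            if label is not None:
                spans.append((label, " ".join(tokens)))
            label, tokens = tag[2:], [token]
        elif tag.startswith("I-"):
            # Continuation of the currently open entity.
            tokens.append(token)
        else:
            # "O" (or any other tag) closes the open entity, if any.
            if label is not None:
                spans.append((label, " ".join(tokens)))
            label, tokens = None, []
    if label is not None:
        spans.append((label, " ".join(tokens)))
    return spans


if __name__ == "__main__":
    # Invented sample, not taken from AnCora.
    sample = "Joan B-PER\nMaragall I-PER\nva O\nnéixer O\na O\nBarcelona B-LOC\n. O\n"
    sentences = read_conll_iob(sample)
    print(entity_spans(sentences[0]))
    # [('PER', 'Joan Maragall'), ('LOC', 'Barcelona')]
```

Taking the tag from the last column keeps the reader working if the file carries extra columns, such as a POS tag, between the token and the NER tag.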
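The third hunk fills in the TE (TECa) row of the splits table. Below is a minimal Python sketch of a consistency check for rows like it. It assumes that the four numeric columns are ordered total, train, dev, test. The table header is outside this hunk, so that order is inferred from the hunk heading ("train/dev/test splits") and from the first number matching the sum of the other three. `parse_row`, `splits_add_up` and `ROWS` are hypothetical names.

```python
from typing import List, Tuple


def parse_row(row: str) -> Tuple[str, List[int]]:
    """Split a markdown table row into its label and integer cells.

    Tolerates a missing trailing pipe, as in the TE (TECa) row.
    """
    cells = [cell.strip() for cell in row.strip().strip("|").split("|")]
    # Drop thousands separators before converting, e.g. "21,163" -> 21163.
    numbers = [int(cell.replace(",", "")) for cell in cells[1:] if cell]
    return cells[0], numbers


def splits_add_up(row: str) -> bool:
    """Return True if train + dev + test equals the total.

    Assumes the numeric columns are ordered total, train, dev, test.
    """
    _, (total, train, dev, test) = parse_row(row)
    return train + dev + test == total


# Rows copied verbatim from the third hunk of this commit.
ROWS = [
    "| POS (Ancora)| 16,678 | 13,123 | 1,709 | 1,846 |",
    "| STS | 3,073 | 2,073 | 500 | 500 |",
    "| TC (TeCla) | 137,775 | 110,203 | 13,786 | 13,786|",
    "| TE (TECa) | 21,163 | 16,930 | 2,116 | 2,117",
    "| QA (VilaQuAD) | 6,282 | 3,882 | 1,200 | 1,200 |",
    "| QA (ViquiQuAD) | 14,239 | 11,255 | 1,492 | 1,429 |",
    "| QA (CatalanQA) | 21,427 | 17,135 | 2,157 | 2,135 |",
]

if __name__ == "__main__":
    for row in ROWS:
        name, _ = parse_row(row)
        print(f"{name}: {'ok' if splits_add_up(row) else 'mismatch'}")
```

For the new TE (TECa) row, 16,930 + 2,116 + 2,117 = 21,163, which matches the 21,163 pairs in the TECa description. Across the rows in this hunk, only QA (ViquiQuAD) fails the check: its listed splits sum to 14,176, while its listed total is 14,239.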