gonzalez-agirre committed on
Commit fdd8ca4
1 parent: c57b9f4

Update README.md

Files changed (1)
  1. README.md +7 -7
README.md CHANGED
@@ -120,14 +120,14 @@ that has been created along with the model.
 
 It contains the following tasks and their related datasets:
 
-1. Part-of-Speech Tagging (POS)
+1. Named Entity Recognition (NER)
 
-Catalan-Ancora: from the [Universal Dependencies treebank](https://github.com/UniversalDependencies/UD_Catalan-AnCora) of the well-known Ancora corpus
+**[AnCora Catalan 2.0.0](https://zenodo.org/record/4762031#.YKaFjqGxWUk)**: extracted named entities from the original [Ancora](https://doi.org/10.5281/zenodo.4762030) version, filtering out some unconventional ones, like book titles, and transcribed them into a standard CONLL-IOB format.
 
-2. Named Entity Recognition (NER)
+
+2. Part-of-Speech Tagging (POS)
 
-**[AnCora Catalan 2.0.0](https://zenodo.org/record/4762031#.YKaFjqGxWUk)**: extracted named entities from the original [Ancora](https://doi.org/10.5281/zenodo.4762030) version,
-filtering out some unconventional ones, like book titles, and transcribed them into a standard CONLL-IOB format
+Catalan-Ancora: from the [Universal Dependencies treebank](https://github.com/UniversalDependencies/UD_Catalan-AnCora) of the well-known Ancora corpus.
 
 3. Text Classification (TC)
 
@@ -135,7 +135,7 @@ It contains the following tasks and their related datasets:
 
 4. Textual Entailment (TE)
 
-**[TeCa](https://huggingface.co/datasets/projecte-aina/teca)**: consisting of 21,163 pairs of premises and hypotheses, annotated according to the inference relation they have (implication, contradiction, or neutral), extracted from the [Catalan Textual Corpus](https://huggingface.co/datasets/projecte-aina/catalan_textual_corpus).
+**[TECa](https://huggingface.co/datasets/projecte-aina/teca)**: consisting of 21,163 pairs of premises and hypotheses, annotated according to the inference relation they have (implication, contradiction, or neutral), extracted from the [Catalan Textual Corpus](https://huggingface.co/datasets/projecte-aina/catalan_textual_corpus).
 
 5. Semantic Textual Similarity (STS)
 
@@ -159,7 +159,7 @@ Here are the train/dev/test splits of the datasets:
 | POS (Ancora)| 16,678 | 13,123 | 1,709 | 1,846 |
 | STS | 3,073 | 2,073 | 500 | 500 |
 | TC (TeCla) | 137,775 | 110,203 | 13,786 | 13,786|
-| TE (TeCa) | 21,163 | 16,930 | 2,116 | 2,117
+| TE (TECa) | 21,163 | 16,930 | 2,116 | 2,117
 | QA (VilaQuAD) | 6,282 | 3,882 | 1,200 | 1,200 |
 | QA (ViquiQuAD) | 14,239 | 11,255 | 1,492 | 1,429 |
 | QA (CatalanQA) | 21,427 | 17,135 | 2,157 | 2,135 |
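As a quick consistency check on the splits table touched by the last hunk, the sketch below sums train, dev and test for each row and reports any row whose sum differs from the first numeric column. It assumes the column order is total, train, dev, test: the table's header row falls outside the hunk, so that order is inferred from the hunk context ("train/dev/test splits") and from the fact that most rows add up under it. The `SPLITS` list and the `mismatched_rows` function are hypothetical illustration names, not part of the repository.

```python
# Split sizes copied from the README table shown in this diff.
# ASSUMPTION: column order is (total, train, dev, test); the header row
# is outside the hunk, so this order is inferred, not confirmed.
SPLITS = [
    ("POS (Ancora)", 16_678, 13_123, 1_709, 1_846),
    ("STS", 3_073, 2_073, 500, 500),
    ("TC (TeCla)", 137_775, 110_203, 13_786, 13_786),
    ("TE (TECa)", 21_163, 16_930, 2_116, 2_117),
    ("QA (VilaQuAD)", 6_282, 3_882, 1_200, 1_200),
    ("QA (ViquiQuAD)", 14_239, 11_255, 1_492, 1_429),
    ("QA (CatalanQA)", 21_427, 17_135, 2_157, 2_135),
]


def mismatched_rows(rows):
    """Return (name, stated_total, computed_total) for every row whose
    train + dev + test does not equal the stated total."""
    bad = []
    for name, total, train, dev, test in rows:
        computed = train + dev + test
        if computed != total:
            bad.append((name, total, computed))
    return bad


if __name__ == "__main__":
    # Print each inconsistent row with thousands separators.
    for name, stated, computed in mismatched_rows(SPLITS):
        print(f"{name}: stated {stated:,}, train+dev+test = {computed:,}")
```

With the figures as listed, every row balances except QA (ViquiQuAD), whose splits sum to 14,176 against a stated total of 14,239; the diff alone does not say whether that gap is a typo or reflects examples held outside the three splits.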