---
license: cc-by-4.0
---

## Readability benchmark (ES): bertin-es-sentences-2class

This project is part of a series of models from the paper "A Benchmark for Neural Readability Assessment of Texts in Spanish".
You can find more details about the project in our [GitHub](https://github.com/lmvasque/readability-es-benchmark).

## Models

Our models were fine-tuned in multiple settings, including 2-class (simple/complex) and 3-class (basic/intermediate/advanced) readability assessment on sentence-level and paragraph-level datasets.
You can find more details in our [paper](https://drive.google.com/file/d/1KdwvqrjX8MWYRDGBKeHmiR1NCzDcVizo/view?usp=share_link).
These are the available models you can use (current model page in bold):

| Model | Granularity | # classes |
|-----------------------------------------------------------------------------------------------------------|----------------|:---------:|
| [BERTIN (ES)](https://huggingface.co/lmvasque/readability-es-benchmark-bertin-es-paragraphs-2class) | paragraphs | 2 |
| [BERTIN (ES)](https://huggingface.co/lmvasque/readability-es-benchmark-bertin-es-paragraphs-3class) | paragraphs | 3 |
| [mBERT (ES)](https://huggingface.co/lmvasque/readability-es-benchmark-mbert-es-paragraphs-2class) | paragraphs | 2 |
| [mBERT (ES)](https://huggingface.co/lmvasque/readability-es-benchmark-mbert-es-paragraphs-3class) | paragraphs | 3 |
| [mBERT (EN+ES)](https://huggingface.co/lmvasque/readability-es-benchmark-mbert-en-es-paragraphs-3class) | paragraphs | 3 |
| **[BERTIN (ES)](https://huggingface.co/lmvasque/readability-es-benchmark-bertin-es-sentences-2class)** | **sentences** | **2** |
| [BERTIN (ES)](https://huggingface.co/lmvasque/readability-es-benchmark-bertin-es-sentences-3class) | sentences | 3 |
| [mBERT (ES)](https://huggingface.co/lmvasque/readability-es-benchmark-mbert-es-sentences-2class) | sentences | 2 |
| [mBERT (ES)](https://huggingface.co/lmvasque/readability-es-benchmark-mbert-es-sentences-3class) | sentences | 3 |
| [mBERT (EN+ES)](https://huggingface.co/lmvasque/readability-es-benchmark-mbert-en-es-sentences-3class) | sentences | 3 |

For the zero-shot setting, we used the original models [BERTIN](https://huggingface.co/bertin-project/bertin-roberta-base-spanish) and [mBERT](https://huggingface.co/bert-base-multilingual-uncased) with no further training.

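As a minimal sketch of how one of the fine-tuned checkpoints listed above might be loaded, the snippet below uses the `text-classification` pipeline from the `transformers` library. The helper names `model_id` and `classify` are hypothetical and only illustrate the naming pattern of the links in the table. The label strings returned depend on the checkpoint's configuration and are not documented on this page, so treat them as something to inspect rather than assume.

```python
HUB_USER = "lmvasque"  # namespace taken from the model links in the table


def model_id(backbone: str, granularity: str, n_classes: int) -> str:
    """Build a Hub repository ID following the pattern of the links in the table.

    backbone: "bertin-es", "mbert-es" or "mbert-en-es"
    granularity: "sentences" or "paragraphs"
    n_classes: 2 or 3 (not every combination is published, check the table)
    """
    return f"{HUB_USER}/readability-es-benchmark-{backbone}-{granularity}-{n_classes}class"


def classify(texts, repo_id=None):
    """Classify `texts` with a fine-tuned checkpoint (downloaded on first use)."""
    # Imported lazily so that model_id() works without transformers installed.
    from transformers import pipeline

    repo_id = repo_id or model_id("bertin-es", "sentences", 2)
    classifier = pipeline("text-classification", model=repo_id)
    # Each result is a dict with a "label" and a "score" entry.
    return classifier(list(texts))


# Example (requires `pip install transformers torch` and network access):
# for result in classify(["El gato duerme en el sofá."]):
#     print(result["label"], result["score"])
```

Passing a different `repo_id`, for example `model_id("mbert-es", "paragraphs", 3)`, would select another row of the table.
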
## Results

The table below reports results for all readability models in each setting, with the best score per column and granularity in bold. Choose a model according to the performance you need:

| Granularity | Model | F1 Score (2-class) | Precision (2-class) | Recall (2-class) | F1 Score (3-class) | Precision (3-class) | Recall (3-class) |
|-------------|---------------|:-------------------:|:---------------------:|:------------------:|:--------------------:|:---------------------:|:------------------:|
| Paragraph | Baseline (TF-IDF+LR) | 0.829 | 0.832 | 0.827 | 0.556 | 0.563 | 0.550 |
| Paragraph | BERTIN (Zero) | 0.308 | 0.222 | 0.500 | 0.227 | 0.284 | 0.338 |
| Paragraph | BERTIN (ES) | 0.924 | 0.923 | 0.925 | 0.772 | 0.776 | 0.768 |
| Paragraph | mBERT (Zero) | 0.308 | 0.222 | 0.500 | 0.253 | 0.312 | 0.368 |
| Paragraph | mBERT (EN) | - | - | - | 0.505 | 0.560 | 0.552 |
| Paragraph | mBERT (ES) | **0.933** | **0.932** | **0.936** | 0.776 | 0.777 | 0.778 |
| Paragraph | mBERT (EN+ES) | - | - | - | **0.779** | **0.783** | **0.779** |
| Sentence | Baseline (TF-IDF+LR) | 0.811 | 0.814 | 0.808 | 0.525 | 0.531 | 0.521 |
| Sentence | BERTIN (Zero) | 0.367 | 0.290 | 0.500 | 0.188 | 0.232 | 0.335 |
| Sentence | BERTIN (ES) | **0.900** | **0.900** | **0.900** | **0.699** | **0.701** | **0.698** |
| Sentence | mBERT (Zero) | 0.367 | 0.290 | 0.500 | 0.278 | 0.329 | 0.351 |
| Sentence | mBERT (EN) | - | - | - | 0.521 | 0.565 | 0.539 |
| Sentence | mBERT (ES) | 0.893 | 0.891 | 0.896 | 0.688 | 0.686 | 0.691 |
| Sentence | mBERT (EN+ES) | - | - | - | 0.679 | 0.676 | 0.682 |

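The 2-class zero-shot rows are identical for BERTIN and mBERT and share a recall of exactly 0.500. One possible reading, assuming the scores are macro-averaged over the two classes (this page does not state the averaging), is that a classifier with no fine-tuning ends up predicting a single class for every input. The sketch below works through that arithmetic. The function name is hypothetical, and the class prevalence is inferred from the reported precision rather than taken from the datasets.

```python
def constant_predictor_macro_scores(prevalence: float):
    """Macro-averaged (precision, recall, F1) for a binary classifier
    that always predicts the same class.

    prevalence: fraction of test examples that belong to the predicted class.
    """
    # Always-predicted class: all of its examples are recovered (recall 1),
    # and the share of correct predictions equals its prevalence.
    p_pred, r_pred = prevalence, 1.0
    f1_pred = 2 * p_pred * r_pred / (p_pred + r_pred)
    # Never-predicted class: precision, recall and F1 are taken as 0.
    precision = (p_pred + 0.0) / 2
    recall = (r_pred + 0.0) / 2
    f1 = (f1_pred + 0.0) / 2
    return precision, recall, f1


# Prevalence inferred as 2 x the reported macro precision (an assumption).
for name, reported_precision, reported_f1 in [
    ("paragraph", 0.222, 0.308),
    ("sentence", 0.290, 0.367),
]:
    precision, recall, f1 = constant_predictor_macro_scores(2 * reported_precision)
    print(f"{name}: precision={precision:.3f} recall={recall:.3f} "
          f"f1={f1:.3f} (reported F1: {reported_f1})")
```

Under these assumptions the computed F1 values agree with the reported ones to within about 0.001, which is the precision lost by rounding the inputs to three decimals. This is consistent with, but does not prove, the single-class explanation.
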
## Citation

If you use our results and scripts in your research, please cite our work: "[A Benchmark for Neural Readability Assessment of Texts in Spanish](https://drive.google.com/file/d/1KdwvqrjX8MWYRDGBKeHmiR1NCzDcVizo/view?usp=share_link)" (to be published)

```
@inproceedings{vasquez-rodriguez-etal-2022-benchmarking,
    title = "A Benchmark for Neural Readability Assessment of Texts in Spanish",
    author = "V{\'a}squez-Rodr{\'\i}guez, Laura and
      Cuenca-Jim{\'e}nez, Pedro-Manuel and
      Morales-Esquivel, Sergio Esteban and
      Alva-Manchego, Fernando",
    booktitle = "Workshop on Text Simplification, Accessibility, and Readability (TSAR-2022), EMNLP 2022",
    month = dec,
    year = "2022",
}
```