🤗 bert-restore-punctuation-ptbr
- 🪄 W&B Dashboard
- ⛭ GitHub
This is a bert-base-portuguese-cased model finetuned for punctuation restoration on WikiLingua.
This model is intended for direct use as a punctuation restoration model for the general Portuguese language. Alternatively, you can use this for further fine-tuning on domain-specific texts for punctuation restoration tasks.
Model restores the following punctuations -- [! ? . , - : ; ' ]
The model also restores the upper-casing of words.
🤷 Usage
🇧🇷 easy-to-use package to restore punctuation of portuguese texts.
Below is a quick way to use the template.
- First, install the package.
pip install respunct
- Sample python code.
from respunct import RestorePuncts
model = RestorePuncts()
model.restore_puncts("""
henrique foi no lago pescar com o pedro mais tarde foram para a casa do pedro fritar os peixes""")
# output:
# Henrique foi no lago pescar com o Pedro. Mais tarde, foram para a casa do Pedro fritar os peixes.
🎯 Accuracy
label | precision | recall | f1-score | support |
---|---|---|---|---|
Upper - OU | 0.89 | 0.91 | 0.90 | 69376 |
None - OO | 0.99 | 0.98 | 0.98 | 857659 |
Full stop/period - .O | 0.86 | 0.93 | 0.89 | 60410 |
Comma - ,O | 0.85 | 0.83 | 0.84 | 48608 |
Upper + Comma - ,U | 0.73 | 0.76 | 0.75 | 3521 |
Question - ?O | 0.68 | 0.78 | 0.73 | 1168 |
Upper + period - .U | 0.66 | 0.72 | 0.69 | 1884 |
Upper + colon - :U | 0.59 | 0.63 | 0.61 | 352 |
Colon - :O | 0.70 | 0.53 | 0.60 | 2420 |
Question Mark - ?U | 0.50 | 0.56 | 0.53 | 36 |
Upper + Exclam. - !U | 0.38 | 0.32 | 0.34 | 38 |
Exclamation Mark - !O | 0.30 | 0.05 | 0.08 | 783 |
Semicolon - ;O | 0.35 | 0.04 | 0.08 | 1557 |
Apostrophe - 'O | 0.00 | 0.00 | 0.00 | 3 |
Hyphen - -O | 0.00 | 0.00 | 0.00 | 3 |
accuracy | 0.96 | 1047818 | ||
macro avg | 0.57 | 0.54 | 0.54 | 1047818 |
weighted avg | 0.96 | 0.96 | 0.96 | 1047818 |
🤙 Contact
Maicon Domingues for questions, feedback and/or requests for similar models.
- Downloads last month
- 34
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social
visibility and check back later, or deploy to Inference Endpoints (dedicated)
instead.
Dataset used to train dominguesm/bert-restore-punctuation-ptbr
Evaluation results
- F1 Score on wiki_linguaself-reported55.700
- Precision on wiki_linguaself-reported57.720
- Recall on wiki_linguaself-reported53.830