---
inference: false
language:
- bg
license: mit
datasets:
- oscar
- chitanka
- wikipedia
tags:
- torch
---
|
|
|
# BERT BASE (cased) finetuned on Bulgarian part-of-speech data
|
|
|
Pretrained model on the Bulgarian language using a masked language modeling (MLM) objective. It was introduced in [this paper](https://arxiv.org/abs/1810.04805) and first released in [this repository](https://github.com/google-research/bert). This model is cased: it makes a difference between bulgarian and Bulgarian.
|
|
|
It was finetuned on public part-of-speech Bulgarian data.
|
|
|
### How to use
|
|
|
Here is how to use this model in PyTorch:
|
|
|
```python
>>> from transformers import pipeline
>>>
>>> model = pipeline(
...     'token-classification',
...     model='rmihaylov/bert-base-pos-theseus-bg',
...     tokenizer='rmihaylov/bert-base-pos-theseus-bg',
...     device=0,
...     revision=None)
>>> # 'Здравей, аз се казвам Иван.' means 'Hello, my name is Ivan.'
>>> output = model('Здравей, аз се казвам Иван.')
>>> print(output)

[{'end': 7,
  'entity': 'INTJ',
  'index': 1,
  'score': 0.9640711,
  'start': 0,
  'word': '▁Здравей'},
 {'end': 8,
  'entity': 'PUNCT',
  'index': 2,
  'score': 0.9998927,
  'start': 7,
  'word': ','},
 {'end': 11,
  'entity': 'PRON',
  'index': 3,
  'score': 0.9998872,
  'start': 8,
  'word': '▁аз'},
 {'end': 14,
  'entity': 'PRON',
  'index': 4,
  'score': 0.99990034,
  'start': 11,
  'word': '▁се'},
 {'end': 21,
  'entity': 'VERB',
  'index': 5,
  'score': 0.99989736,
  'start': 14,
  'word': '▁казвам'},
 {'end': 26,
  'entity': 'PROPN',
  'index': 6,
  'score': 0.99990785,
  'start': 21,
  'word': '▁Иван'},
 {'end': 27,
  'entity': 'PUNCT',
  'index': 7,
  'score': 0.9999685,
  'start': 26,
  'word': '.'}]
```
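If you want the raw logits instead of the pipeline output, the same checkpoint can also be loaded with the generic Auto classes. The snippet below is a minimal sketch (not part of the original card): it assumes the checkpoint ships a token-classification head with an `id2label` mapping in its config, and it prints one tag per sub-word piece without merging pieces back into words.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Sketch of loading the checkpoint without the pipeline wrapper (assumption: the
# checkpoint provides a token-classification head and id2label in its config).
model_id = 'rmihaylov/bert-base-pos-theseus-bg'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForTokenClassification.from_pretrained(model_id)

# 'Здравей, аз се казвам Иван.' means 'Hello, my name is Ivan.'
inputs = tokenizer('Здравей, аз се казвам Иван.', return_tensors='pt')
with torch.no_grad():
    logits = model(**inputs).logits

# One predicted label id per sub-word token (special tokens included);
# map each id to its tag name via the model config.
predicted_ids = logits.argmax(dim=-1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0].tolist())
for token, label_id in zip(tokens, predicted_ids):
    print(token, model.config.id2label[label_id.item()])
```

Note that `device=0` in the pipeline call above assumes a GPU is available; pass `device=-1` (or omit the argument) to run on CPU.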
|