---
language: pl
tags:
  - T5
  - lemmatization
license: apache-2.0
---


# PoLemma Base

PoLemma models are intended for lemmatization of named entities and multi-word expressions in the Polish language.

They were fine-tuned from the allegro/plT5 family of models, for example [allegro/plt5-base](https://huggingface.co/allegro/plt5-base).

## Usage

Sample usage:

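```python
from transformers import pipeline

# Load the fine-tuned model and its tokenizer from the Hugging Face Hub.
pipe = pipeline(task="text2text-generation", model="amu-cai/polemma-base", tokenizer="amu-cai/polemma-base")
# Lemmatize one phrase with beam search (5 beams) and keep the generated text.
results = pipe(["wytrenowanego modelu"], clean_up_tokenization_spaces=True, num_beams=5)
hyp = results[0]["generated_text"]
```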
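
For more than one phrase, the same call can be wrapped in a small helper. The sketch below is illustrative only: `lemmatize` is a hypothetical name that is not part of this model or of `transformers`, and it assumes that `pipe` returns one dictionary with a `generated_text` key per input phrase, as it does in the sample above.

```python
from typing import Any, Callable, List, Sequence


def lemmatize(pipe: Callable[..., Any], phrases: Sequence[str], num_beams: int = 5) -> List[str]:
    """Return one generated lemma per input phrase (hypothetical helper)."""
    if not phrases:
        # Nothing to do, so avoid calling the pipeline with an empty batch.
        return []
    # Same generation arguments as in the sample usage above.
    outputs = pipe(list(phrases), clean_up_tokenization_spaces=True, num_beams=num_beams)
    # Assumption: one dict per input phrase, each with a "generated_text" key.
    return [out["generated_text"] for out in outputs]
```

With the `pipe` object from the sample above, `lemmatize(pipe, ["wytrenowanego modelu"])` returns a list with one string per input phrase.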


## Evaluation results

Lemmatization Exact Match was computed on the SlavNER 2021 test set.

| Model | Exact Match (%) |
| :------ | ------: |
| polemma-large | 92.61 |
| polemma-base | 91.34 |
| polemma-small | 88.46 |
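
As a rough illustration of the metric, Exact Match can be computed as the percentage of predicted lemmas that are identical to the reference lemmas. The sketch below is an assumption about the general technique, not the evaluation script used for the numbers above: `exact_match` is a hypothetical helper, and stripping surrounding whitespace while keeping case is a choice made here, not something the model card specifies.

```python
from typing import Sequence


def exact_match(predictions: Sequence[str], references: Sequence[str]) -> float:
    """Percentage of predictions identical to their reference (hypothetical helper)."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        raise ValueError("cannot compute Exact Match on an empty set")
    # Assumption: compare case-sensitively after stripping surrounding whitespace.
    hits = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return 100.0 * hits / len(references)
```

Whether the comparison should ignore case or normalize punctuation depends on the benchmark's own rules, so the normalization step is the part most likely to need adjusting.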


## Citation

If you use the model, please cite the following paper:

TBD

## Framework versions

- Transformers 4.26.0
- PyTorch 1.13.1.post200
- Datasets 2.9.0
- Tokenizers 0.13.2