SlavicNLP/slavicner-lemma-single-out-large


	---
	language:
	- multilingual
	- pl
	- ru
	- uk
	- bg
	- cs
	- sl
	datasets:
	- SlavicNER
	license: apache-2.0
	library_name: transformers
	pipeline_tag: text2text-generation
	tags:
	- lemmatization
	widget:
	- text: "pl:Polsce"
	- text: "cs:Velké Británii"
	- text: "bg:българите"
	- text: "ru:Великобританию"
	- text: "sl:evropske komisije"
	- text: "uk:Європейського агентства лікарських засобів"
	---

	# Model description

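	This is a baseline model for lemmatizing named entities in Slavic languages, trained on the single-out topic split of the
	[SlavicNER corpus](https://github.com/SlavicNLP/SlavicNER). Each input is a language code and an inflected entity mention
	joined by a colon (for example, `pl:Polsce`); the output is the lemma of the mention (`Polska`).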


	# Resources and Technical Documentation

	- Paper: [Cross-lingual Named Entity Corpus for Slavic Languages](https://arxiv.org/pdf/2404.00482), to appear in LREC-COLING 2024.
	- Annotation guidelines: https://arxiv.org/pdf/2404.00482
	- SlavicNER Corpus: https://github.com/SlavicNLP/SlavicNER


	# Evaluation

	Evaluation results will appear soon.


	# Usage

	You can use this model directly with a `text2text-generation` pipeline. Prefix each entity mention with its language code and a colon:

	```python
	from transformers import pipeline

	model_name = "SlavicNLP/slavicner-lemma-single-out-large"
	pipe = pipeline("text2text-generation", model=model_name)

	# Each input is "<language code>:<inflected entity mention>".
	texts = [
	    "pl:Polsce",
	    "cs:Velké Británii",
	    "bg:българите",
	    "ru:Великобританию",
	    "sl:evropske komisije",
	    "uk:Європейського агентства лікарських засобів",
	]

	# The pipeline returns one dict per input; the lemma is under 'generated_text'.
	outputs = pipe(texts)

	lemmas = [o["generated_text"] for o in outputs]
	print(lemmas)
	# ['Polska', 'Velká Británie', 'българи', 'Великобритания', 'evropska komisija', 'Європейське агентство лікарських засобів']
	```
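
If your mentions are stored separately from their language codes, a small wrapper can build the prefixed inputs and pair each mention with its lemma. The sketch below is not part of the model's API: `SUPPORTED_LANGS`, `build_input`, and `lemmatize` are hypothetical names introduced here, and the set of language codes is taken from the metadata of this model card. It assumes only that `pipe` behaves like the pipeline above, taking a list of strings and returning a list of dicts with a `generated_text` key.

```python
# Language codes listed in this model card's metadata (assumed to be the
# only prefixes the model was trained on).
SUPPORTED_LANGS = {"pl", "ru", "uk", "bg", "cs", "sl"}


def build_input(lang: str, mention: str) -> str:
    """Format one mention as '<lang>:<mention>' (hypothetical helper)."""
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language code: {lang!r}")
    return f"{lang}:{mention.strip()}"


def lemmatize(pipe, mentions):
    """Run `pipe` on (lang, mention) pairs and return (mention, lemma) pairs.

    `pipe` is any callable that takes a list of strings and returns a list
    of dicts with a 'generated_text' key, like the pipeline shown above.
    """
    texts = [build_input(lang, mention) for lang, mention in mentions]
    outputs = pipe(texts)
    return [
        (mention, out["generated_text"])
        for (_, mention), out in zip(mentions, outputs)
    ]
```

With the `pipe` object from the snippet above, `lemmatize(pipe, [("pl", "Polsce"), ("cs", "Velké Británii")])` would return a list of `(mention, lemma)` tuples in input order. Rejecting unknown language codes early is a design choice of this sketch; how the model behaves on other prefixes is not documented here.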

	# Citation

	Citation details will appear soon.