Bulgarian language poetry generation

A GPT-2-based model pretrained with a causal language modeling (CLM) objective to generate poetry in Bulgarian.
Developed by Radostin Cholakov as part of the AzBuki.ML initiatives.

How to use?

>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>>
>>> model_id = "radi-cho/poetry-bg"
>>> tokenizer = AutoTokenizer.from_pretrained(model_id)
>>> # Load the model with its language-modeling head, which generate() requires.
>>> model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
>>>
>>> # Prompt format: [HED]<title>[NEL][BDY]; the model continues with the poem body.
>>> input_ids = tokenizer.encode(
...     "[HED]Суетата на живота[NEL][BDY]",
...     add_special_tokens=False,
...     return_tensors='pt')
>>>
>>> # Nucleus sampling (top_p=0.98) with top-k filtering disabled (top_k=0).
>>> output_ids = model.generate(
...     input_ids,
...     do_sample=True,
...     max_length=250,
...     top_p=0.98,
...     top_k=0,
...     pad_token_id=2,
...     eos_token_id=50258)
>>>
>>> output = tokenizer.decode(output_ids[0])
>>>
>>> # Turn the custom tokens back into plain text.
>>> output = output.replace('[NEL]', '\n')
>>> output = output.replace('[BDY]', '\n')
>>> output = output.replace('[HED]', '')
>>> output = output.replace('[SEP]', '')
>>>
>>> print(output)
Суетата на живота

Да страдам ли?
Да страдам ли за това?
Не, не за това, че умирам...
Но само за това,
че миговете ми са рани.

Аз съм сам и търся утеха.

Custom Tokens

We introduced three custom tokens in the tokenizer: [NEL], [BDY], and [HED].

  • [HED] denotes where the title of the poem begins;
  • [BDY] denotes where the body of the poem begins;
  • [NEL] marks the end of a verse and should be decoded as a new line.

[SEP] (with id 50258) is the end-of-sequence token.
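
The token conventions above can be wrapped in two small string helpers, sketched here as an illustration. The names `build_prompt` and `decode_poem` are hypothetical and not part of the model or of `transformers`. The sketch assumes the decoded output follows the `[HED]title[NEL][BDY]body` layout used in the usage example, and that anything after the first `[SEP]` can be discarded.

```python
from typing import Tuple

# Custom tokens described in this section.
HED, BDY, NEL, SEP = "[HED]", "[BDY]", "[NEL]", "[SEP]"


def build_prompt(title: str) -> str:
    """Return a prompt in the layout used above: [HED]title[NEL][BDY]."""
    return f"{HED}{title}{NEL}{BDY}"


def decode_poem(raw: str) -> Tuple[str, str]:
    """Split decoded model output into plain-text (title, body).

    Assumption: `raw` follows the [HED]title[NEL][BDY]body layout.
    """
    # Keep only the text before the first end-of-sequence token, if any.
    raw = raw.split(SEP, 1)[0]
    # Everything before [BDY] is the title part, everything after is the body.
    head, _, body = raw.partition(BDY)
    title = head.replace(HED, "").replace(NEL, "").strip()
    # Each [NEL] ends a verse; strip stray spaces around every verse.
    body = "\n".join(line.strip() for line in body.split(NEL)).strip()
    return title, body


if __name__ == "__main__":
    print(build_prompt("Суетата на живота"))
    title, body = decode_poem(
        "[HED]Суетата на живота[NEL][BDY]Да страдам ли?[NEL]Да страдам ли за това?[NEL][SEP]"
    )
    print(title)
    print(body)
```

The result of `build_prompt` can be passed to `tokenizer.encode(..., add_special_tokens=False)`, and `decode_poem` can be applied to the string returned by `tokenizer.decode`. Returning the title and body separately, rather than one joined string, leaves the final layout to the caller.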

Credits
