---
language:
- en
license: apache-2.0
---
# mFLAG
mFLAG is a sequence-to-sequence model for multi-figurative language generation. It was introduced in the paper [Multi-Figurative Language Generation](https://arxiv.org/abs/2209.01835) by [Huiyuan Lai](https://laihuiyuan.github.io/) and [Malvina Nissim](https://scholar.google.nl/citations?user=hnTpEOAAAAAJ&hl=en).
# Model description
mFLAG is trained by applying a multi-figurative language pre-training scheme on top of BART and adding a mechanism that injects the target figurative information into the encoder. This enables generating text in a target figurative form from text in another figurative form without parallel figurative-figurative sentence pairs.
# How to use
```bash
git clone git@github.com:laihuiyuan/mFLAG.git
cd mFLAG
```
```python
# These classes live in the cloned mFLAG repository, so run this from its root
from model import MultiFigurativeGeneration
from tokenization_mflag import MFlagTokenizerFast

tokenizer = MFlagTokenizerFast.from_pretrained('laihuiyuan/mFLAG')
model = MultiFigurativeGeneration.from_pretrained('laihuiyuan/mFLAG')

# Hyperbole to sarcasm: the source sentence is prefixed with its figurative tag,
# and the target form is passed separately as fig_ids
inp_ids = tokenizer.encode("<hyperbole> I am not happy that he urged me to finish all the hardest tasks in the world", return_tensors="pt")
fig_ids = tokenizer.encode("<sarcasm>", add_special_tokens=False, return_tensors="pt")

# Drop the BOS token from the input and force the target figurative tag as the first generated token
outs = model.generate(input_ids=inp_ids[:, 1:], fig_ids=fig_ids, forced_bos_token_id=fig_ids.item(), num_beams=5, max_length=60)

# Skip the BOS and figurative tag tokens when decoding
text = tokenizer.decode(outs[0, 2:].tolist(), skip_special_tokens=True, clean_up_tokenization_spaces=False)
```
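The same pattern works for other source/target pairs. Below is a minimal helper sketch built on the calls above; tag names other than `<hyperbole>` and `<sarcasm>` (e.g. `<literal>`, `<metaphor>`, `<simile>`) are assumptions based on the figurative forms covered in the paper, not verified against the released tokenizer.
```python
def transfer(model, tokenizer, text, src_form, tgt_form, max_length=60):
    """Rewrite `text` from `src_form` into `tgt_form` (e.g. 'hyperbole' -> 'sarcasm').

    Tags beyond <hyperbole> and <sarcasm> are assumed here, not verified
    against the released tokenizer.
    """
    inp_ids = tokenizer.encode(f"<{src_form}> {text}", return_tensors="pt")
    fig_ids = tokenizer.encode(f"<{tgt_form}>", add_special_tokens=False, return_tensors="pt")
    outs = model.generate(
        input_ids=inp_ids[:, 1:],           # drop BOS; the source tag stays in the input
        fig_ids=fig_ids,                    # target figurative form injected into the encoder
        forced_bos_token_id=fig_ids.item(),
        num_beams=5,
        max_length=max_length,
    )
    # Skip BOS and the forced figurative tag before decoding
    return tokenizer.decode(outs[0, 2:].tolist(), skip_special_tokens=True,
                            clean_up_tokenization_spaces=False)

print(transfer(model, tokenizer,
               "I am not happy that he urged me to finish all the hardest tasks in the world",
               "hyperbole", "sarcasm"))
```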
# Citation Info
```BibTeX
@inproceedings{lai-etal-2022-multi,
title = "Multi-Figurative Language Generation",
author = "Lai, Huiyuan and Nissim, Malvina",
booktitle = "Proceedings of the 29th International Conference on Computational Linguistics",
month = oct,
year = "2022",
address = "Gyeongju, Republic of Korea",
}
```