---
license: mit
language: es
tags:
- generated_from_trainer
model-index:
- name: poem-gen-spanish-t5-small
  results: []
---

# poem-gen-spanish-t5-small

This model is a fine-tuned version of [flax-community/spanish-t5-small](https://huggingface.co/flax-community/spanish-t5-small) on the [Spanish Poetry Dataset](https://www.kaggle.com/andreamorgar/spanish-poetry-dataset/version/1).

The model was created during the [First Spanish Hackathon](https://somosnlp.org/hackathon) organized by [Somos NLP](https://somosnlp.org/).

The participating team was composed of:

- 🇨🇺 [Alberto Carmona Barthelemy](https://huggingface.co/milyiyo)
- 🇨🇴 [Jorge Henao](https://huggingface.co/jorge-henao)
- 🇪🇸 [Andrea Morales Garzón](https://huggingface.co/andreamorgar)
- 🇮🇳 [Drishti Sharma](https://huggingface.co/DrishtiSharma)

It achieves the following results on the evaluation set:
- Loss: 2.8707
- Perplexity: 17.65

## Model description

The model was trained to generate Spanish poems conditioned on parameters such as style, sentiment, words to include, and a starting phrase.

Example:

```
poema:
estilo: Pablo Neruda &&
sentimiento: positivo &&
palabras: cielo, luna, mar &&
texto: Todos fueron a verle pasar
```

The fields are joined with `&&` separators into a single input string when fed to the model, as the code below shows.

### How to use

You can use this model directly with the `transformers` library for text-to-text generation:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = 'hackathon-pln-es/poem-gen-spanish-t5-small'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Build the conditioning prompt: style, sentiment, words to include and starting phrase.
author, sentiment, word, start_text = 'Pablo Neruda', 'positivo', 'cielo', 'Todos fueron a la plaza'
input_text = f"""poema: estilo: {author} && sentimiento: {sentiment} && palabras: {word} && texto: {start_text} """
inputs = tokenizer(input_text, return_tensors="pt")

# Sample a continuation of the starting phrase.
outputs = model.generate(inputs["input_ids"],
                         do_sample=True,
                         max_length=30,
                         repetition_penalty=20.0,
                         top_k=50,
                         top_p=0.92)
detok_outputs = [tokenizer.decode(x, skip_special_tokens=True) for x in outputs]
res = detok_outputs[0]
```
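The same steps can also be wrapped with the `pipeline` API; a minimal sketch, reusing the prompt format and sampling parameters from the example above (outputs will vary because sampling is enabled):

```python
from transformers import pipeline

# text2text-generation handles tokenization, generation and decoding in one call.
generator = pipeline('text2text-generation', model='hackathon-pln-es/poem-gen-spanish-t5-small')

prompt = "poema: estilo: Pablo Neruda && sentimiento: positivo && palabras: cielo && texto: Todos fueron a la plaza "
res = generator(prompt, do_sample=True, max_length=30,
                repetition_penalty=20.0, top_k=50, top_p=0.92)[0]['generated_text']
print(res)
```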

## Training and evaluation data

The original dataset has the columns `author`, `content` and `title`.
For each poem we generate new examples, pairing a growing context with the next line:

- content: *line_i*, generated: *line_i+1*
- content: *concatenate(line_i, line_i+1)*, generated: *line_i+2*
- content: *concatenate(line_i, line_i+1, line_i+2)*, generated: *line_i+3*

The resulting dataset has the columns `author`, `content`, `title` and `generated`.
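A minimal sketch of this expansion step (the function and field names are illustrative; the exact training script is not shown in this card):

```python
def expand_poem(example, max_context=3):
    """Turn one poem into (content, generated) pairs with 1..max_context context lines."""
    lines = [l.strip() for l in example['content'].split('\n') if l.strip()]
    pairs = []
    for i in range(len(lines)):
        for n in range(1, max_context + 1):
            if i + n >= len(lines):
                break
            pairs.append({
                'author': example['author'],
                'title': example['title'],
                'content': '\n'.join(lines[i:i + n]),  # line_i .. line_{i+n-1}
                'generated': lines[i + n],             # the next line becomes the target
            })
    return pairs
```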

For each example we compute the sentiment of the `generated` column and its nouns. For sentiment we used the model `mrm8488/electricidad-small-finetuned-restaurant-sentiment-analysis`, and for noun extraction we used spaCy.
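A minimal sketch of that annotation pass (the spaCy model name is an assumption; the card only says spaCy was used):

```python
import spacy
from transformers import pipeline

# Sentiment model named in the card.
sentiment = pipeline('text-classification',
                     model='mrm8488/electricidad-small-finetuned-restaurant-sentiment-analysis')
# Spanish spaCy pipeline (assumed: es_core_news_sm; the card does not name the model).
nlp = spacy.load('es_core_news_sm')

def annotate(generated_line):
    """Return the sentiment label and the nouns of one generated line."""
    label = sentiment(generated_line)[0]['label']
    nouns = [tok.text for tok in nlp(generated_line) if tok.pos_ == 'NOUN']
    return label, nouns
```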

## Training procedure