somosnlp-hackathon-2023/baizemocracy-lora-7B-cfqa-conv

jorge-henao committed on Apr 10, 2023

Commit be83a0d • 1 Parent(s): c7e7a9d

Update README.md

Files changed (1)

README.md +26 -0

@@ -9,6 +9,12 @@ license: apache-2.0
@@ -33,4 +39,24 @@ This model is an open-source chat model fine-tuned with [LoRA](https://github.co

 ## What's the baizemocracy-lora-7B-cfqa model?
 This model is an open-source chat model fine-tuned with [LoRA](https://github.com/microsoft/LoRA), inspired by the [Baize project](https://github.com/project-baize/baize-chatbot/tree/main/). It was trained on the Baize datasets and the ask2democracy-cfqa-salud-pension dataset, which contains almost 4k instructions for answering questions based on a context relevant to citizen concerns and public debate in Spanish.
+Two main model variants were trained during the Hackathon Somos NLP 2023: a conversational-style model and a context-focused model.
+This model focuses on a more conversational way of asking questions. See the "About pre-processing" section.
+The other variant, [Baizemocracy-contextfocused](https://github.com/project-baize/baize-chatbot/tree/main/), focuses on retrieval-augmented generation from a given context.
+Testing is still in progress. We decided to share both model variants with the community so that more people can experiment with what works better and find other possible use cases.
 - **Developed by:**
 - 🇨🇴 [Jorge Henao](https://huggingface.co/jorge-henao)
 - [Alpaca chat Dialogs](https://github.com/project-baize/baize)
 - [Medical chat Dialogs](https://github.com/project-baize/baize)
+### About pre-processing
+<code>
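+def format_instruction_without_context(example):
+  # Build a Baize-style dialogue from one record; the context is deliberately left out.
+  example["topic"] = example["input"]
+  # System line (Spanish): "The conversation between a human and an AI assistant."
+  text = "La conversación entre un humano y un asistente de IA."
+  text += "\n[|Human|] " + example["input"]
+  text += "\n[|AI|] " + example["output"]
+  if len(example["topics"]) > 0:
+    topics = ", ".join(example["topics"])
+    # Extra turn asking which topics the answer falls under.
+    text += "\n[|Human|] " + "¿En cuáles tópicos clasificarías su respuesta?"
+    text += "\n[|AI|] " + f"Aquí una lista de tópicos: {topics}."
+    example["topic"] += f" ({topics})"
+  example["input"] = text
+  return example
+
+# data_reforma_salud_cfqa is assumed to be a dataset loaded earlier (not shown in this snippet).
+data_reforma_salud_cfqa_without_context = data_reforma_salud_cfqa.map(
+  format_instruction_without_context,
+  remove_columns=["output", "topics", "instruction"],
+)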
+data_reforma_salud_cqa_withou
+</code>
 More details can be found in the Ask2Democracy [GitHub](https://github.com/jorge-henao/ask2democracy)
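
To make the effect of the pre-processing step concrete, here is a minimal sketch that applies the same formatting logic to a single record. The sample record is hypothetical and invented purely for illustration; the field names (`instruction`, `input`, `output`, `topics`) are inferred from the snippet in the diff, not from the actual dataset schema.

```python
def format_instruction_without_context(example):
    """Turn one record into a Baize-style dialogue string (context omitted)."""
    example["topic"] = example["input"]
    text = "La conversación entre un humano y un asistente de IA."
    text += "\n[|Human|] " + example["input"]
    text += "\n[|AI|] " + example["output"]
    if example["topics"]:
        topics = ", ".join(example["topics"])
        # Extra turn asking the assistant to classify its answer by topic.
        text += "\n[|Human|] ¿En cuáles tópicos clasificarías su respuesta?"
        text += f"\n[|AI|] Aquí una lista de tópicos: {topics}."
        example["topic"] += f" ({topics})"
    example["input"] = text
    return example


# Hypothetical record, invented for illustration only.
sample = {
    "instruction": "Responde la pregunta.",
    "input": "¿Qué propone la reforma?",
    "output": "Propone fortalecer la atención primaria.",
    "topics": ["salud", "atención primaria"],
}

# Pass a copy so the original record is left untouched.
formatted = format_instruction_without_context(dict(sample))
print(formatted["input"])
print(formatted["topic"])
```

With a plain dict the `output`, `topics`, and `instruction` keys remain in the result; in the README snippet they are dropped by the `remove_columns` argument of `map`, so only the dialogue text in `input` and the derived `topic` field survive.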