Commit be83a0d by jorge-henao
1 Parent(s): c7e7a9d

Update README.md

Files changed (1)
  1. README.md +26 -0
README.md CHANGED
@@ -9,6 +9,12 @@ license: apache-2.0
@@ -33,4 +39,24 @@ This model is an open-source chat model fine-tuned with [LoRA](https://github.co
  ## What is the baizemocracy-lora-7B-cfqa model?

  This model is an open-source chat model fine-tuned with [LoRA](https://github.com/microsoft/LoRA), inspired by the [Baize project](https://github.com/project-baize/baize-chatbot/tree/main/). It was trained on the Baize datasets and the ask2democracy-cfqa-salud-pension dataset, which contains almost 4k instructions for answering questions based on a context relevant to citizen concerns and public debate in Spanish.
+ Two main model experiments were carried out during the Hackathon Somos NLP 2023: a conversational-style model and a context-focused model.
+ This model focuses on a more conversational way of asking questions. See the pre-processing section.
+ The other model variation focuses more on retrieval augmented by context: [Baizemocracy-contextfocused](https://github.com/project-baize/baize-chatbot/tree/main/).
+
+ Testing is a work in progress. We decided to share both model variations with the community so that more people can experiment with what works better and find other possible use cases.
+
 
  - **Developed by:**
  - 🇨🇴 [Jorge Henao](https://huggingface.co/jorge-henao)
 
  - [Alpaca chat Dialogs](https://github.com/project-baize/baize)
  - [Medical chat Dialogs](https://github.com/project-baize/baize)
 
+ ### About pre-processing
+
+ <code>
+ def format_instruction_without_context(example):
+     # Keep the original question as the topic label.
+     example["topic"] = example["input"]
+     # Baize-style header: "The conversation between a human and an AI assistant."
+     prompt = "La conversación entre un humano y un asistente de IA."
+     prompt += "\n[|Human|] " + example["input"]
+     prompt += "\n[|AI|] " + example["output"]
+     if len(example["topics"]) > 0:
+         topics = ", ".join(example["topics"])
+         # Extra turn: "Which topics would you classify your answer under?"
+         prompt += "\n[|Human|] " + "¿En cuáles tópicos clasificarías su respuesta?"
+         # Reply: "Here is a list of topics: ..."
+         prompt += "\n[|AI|] " + f"Aquí una lista de tópicos: {topics}."
+         example["topic"] += f" ({topics})"
+     # Replace the raw question with the full formatted dialogue.
+     example["input"] = prompt
+     return example
+
+ data_reforma_salud_cfqa_without_context = data_reforma_salud_cfqa.map(format_instruction_without_context, remove_columns=['output','topics','instruction'])
+ data_reforma_salud_cqa_withou
+ </code>
+
+
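To make the resulting prompt format concrete, here is a minimal sketch that applies the same formatting logic to a single record. The sample record and its field values are hypothetical and invented for illustration; only the field names (`instruction`, `input`, `output`, `topics`) and the dialogue markers come from the snippet above.

```python
def format_instruction_without_context(example):
    """Turn one record into a Baize-style dialogue without the context passage."""
    example["topic"] = example["input"]
    prompt = "La conversación entre un humano y un asistente de IA."
    prompt += "\n[|Human|] " + example["input"]
    prompt += "\n[|AI|] " + example["output"]
    if len(example["topics"]) > 0:
        topics = ", ".join(example["topics"])
        prompt += "\n[|Human|] " + "¿En cuáles tópicos clasificarías su respuesta?"
        prompt += "\n[|AI|] " + f"Aquí una lista de tópicos: {topics}."
        example["topic"] += f" ({topics})"
    example["input"] = prompt
    return example


# Hypothetical record: the values are made up, only the keys mirror the dataset.
sample = {
    "instruction": "Responde la pregunta.",
    "input": "¿Qué propone la reforma?",
    "output": "Propone cambios al sistema.",
    "topics": ["salud", "pensión"],
}

formatted = format_instruction_without_context(dict(sample))
print(formatted["input"])
print(formatted["topic"])
```

With a Hugging Face `datasets.Dataset`, passing the function to `map` with `remove_columns=['output','topics','instruction']` would then leave only the `input` and `topic` columns in each row.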
  More details can be found in the Ask2Democracy [GitHub](https://github.com/jorge-henao/ask2democracy) repository.