What specific dataset did you use?
Hi, I was curious about the specific dataset used for this. I would like to train a model like this, but with a newer dataset. Thanks!
Hello, Max!
The datasets I used to train the Conversational_Spanish_GPT model come from the Microsoft Bot Framework Tools repository, which hosts numerous high-quality datasets in various languages. Here is one of the ones I used:
https://qnamakerstore.blob.core.windows.net/qnamakerdata/editorial/spanish/qna_chitchat_professional.tsv
However, I can’t point to just one dataset, since I relied on several of them for training. To adapt them to the fine-tuning requirements, I also had to edit, convert, and clean the files, which was a challenging process but definitely worth it.
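To give a rough idea of what that kind of conversion can look like, here is a minimal sketch. It is not the exact pipeline used for this model. It assumes the TSV has `Question` and `Answer` columns (check the header of the file you download), and the function names, the `prompt`/`response` keys, and the inline sample rows are all hypothetical, chosen only for illustration.

```python
import csv
import io
import json


def tsv_to_pairs(tsv_text, question_col="Question", answer_col="Answer"):
    """Parse QnA-style TSV text into cleaned, de-duplicated prompt/response pairs.

    The column names are assumptions; adjust them to match the real header.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    seen = set()
    pairs = []
    for row in reader:
        # Collapse repeated whitespace and trim both fields.
        question = " ".join((row.get(question_col) or "").split())
        answer = " ".join((row.get(answer_col) or "").split())
        # Skip rows where either side is missing.
        if not question or not answer:
            continue
        # Drop duplicates, ignoring case.
        key = (question.lower(), answer.lower())
        if key in seen:
            continue
        seen.add(key)
        pairs.append({"prompt": question, "response": answer})
    return pairs


def to_jsonl(pairs):
    """Serialize the pairs as JSON Lines, keeping Spanish characters readable."""
    return "\n".join(json.dumps(p, ensure_ascii=False) for p in pairs)


# Hypothetical sample rows standing in for a downloaded TSV file.
sample = (
    "Question\tAnswer\tSource\tMetadata\n"
    "¿Cómo estás?\tMuy bien, gracias.\tsample\teditorial:chitchat\n"
    "  ¿Cómo   estás? \tMuy bien, gracias.\tsample\teditorial:chitchat\n"
    "¿Quién eres?\tSoy un asistente virtual.\tsample\teditorial:chitchat\n"
    "¿Qué hora es?\t\tsample\teditorial:chitchat\n"
)

pairs = tsv_to_pairs(sample)
print(to_jsonl(pairs))
```

With a real file, you would read its contents into a string (for example with `open(path, encoding="utf-8").read()`) and pass that to `tsv_to_pairs`. The output format your fine-tuning script expects may differ, so treat the `prompt`/`response` layout as a placeholder.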
I encourage you to train your own model, and I appreciate your interest in mine. Good luck!