What specific dataset did you use?

#4
by Alicorn-Max - opened

Hi, I was curious about the specific dataset used for this. I would like to train a model like this, but with a newer dataset. Thanks!

Hello, Max!

The datasets I used to train the Conversational_Spanish_GPT model are available in the Microsoft Bot Framework Tools repository. In this repository, you’ll find numerous high-quality datasets in various languages, including one that I used:
https://qnamakerstore.blob.core.windows.net/qnamakerdata/editorial/spanish/qna_chitchat_professional.tsv

However, I can’t point to just one dataset, since I relied on several of them for training. To make them suitable for fine-tuning, I also had to edit, convert, and clean the files, which was a challenging process but definitely worth it.
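To give a rough idea of what that kind of preparation can look like, here is a minimal Python sketch. It is not the exact pipeline used for this model. It assumes the TSV has a header row with `Question` and `Answer` columns, and that each training example is a question and an answer joined by a GPT-2 style `<|endoftext|>` token; both are assumptions, so check them against the file and tokenizer you actually use. The function names are hypothetical.

```python
import csv
import io

# Assumed end-of-turn token (GPT-2 style); adjust to match your tokenizer.
EOS = "<|endoftext|>"


def tsv_to_pairs(tsv_text, question_col="Question", answer_col="Answer"):
    """Parse TSV text into cleaned, de-duplicated (question, answer) pairs.

    The column names are assumptions about the file's header row.
    """
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    seen = set()
    pairs = []
    for row in reader:
        # Collapse runs of whitespace and trim both ends.
        question = " ".join((row.get(question_col) or "").split())
        answer = " ".join((row.get(answer_col) or "").split())
        if not question or not answer:
            continue  # skip rows with a missing question or answer
        key = (question.lower(), answer.lower())
        if key in seen:
            continue  # skip case-insensitive duplicates
        seen.add(key)
        pairs.append((question, answer))
    return pairs


def to_training_lines(pairs, eos=EOS):
    """Join each pair into one training string: question, EOS, answer, EOS."""
    return [f"{question}{eos}{answer}{eos}" for question, answer in pairs]
```

You would read the downloaded file as UTF-8 text, pass it to `tsv_to_pairs`, and write the output of `to_training_lines` to a file, one example per line. A newer dataset with different column names can be handled by changing the `question_col` and `answer_col` arguments.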

I encourage you to train your own model, and I appreciate your interest in mine. Good luck!

ostorc changed discussion status to closed
