Training Ultravox

#4
by AniBirage - opened

Hi
I wanted to know how I can train Ultravox on a Hindi dataset that is stored locally on my device.

Hi AniBirage,

We wrote brief instructions for training on your own data here: https://github.com/fixie-ai/ultravox/?tab=readme-ov-file#use-cases-for-training-ultravox

We mainly use datasets uploaded to Hugging Face for model training. You can also use local datasets, as long as they are supported by the Hugging Face datasets library. Let us know if you run into any issues; we are working on improving the documentation.
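
As a rough illustration of loading local files with the datasets library, here is a minimal sketch. It is not taken from the Ultravox repo. The helper names and the expected column names (audio, sentence, continuation) are assumptions for illustration, so adjust them to match your own files.

```python
from pathlib import Path

# Columns assumed for illustration; adjust to match your own files.
EXPECTED_COLUMNS = ("audio", "sentence", "continuation")


def missing_columns(column_names):
    """Return the expected columns that are absent from column_names."""
    return [name for name in EXPECTED_COLUMNS if name not in column_names]


def load_local_parquet(data_dir, split="train"):
    """Load every .parquet file directly inside data_dir as one split."""
    # Imported lazily so the helper above works without the library installed.
    from datasets import load_dataset  # pip install datasets

    files = sorted(str(path) for path in Path(data_dir).glob("*.parquet"))
    if not files:
        raise FileNotFoundError(f"no .parquet files found in {data_dir}")

    # The built-in "parquet" loader reads local files; nothing is uploaded.
    dataset = load_dataset("parquet", data_files={split: files}, split=split)

    absent = missing_columns(dataset.column_names)
    if absent:
        raise ValueError(f"dataset is missing columns: {absent}")
    return dataset
```

Calling load_local_parquet("/path/to/your/data") (a placeholder path) should return a Dataset object you can inspect before training. How the dataset is then referenced in the Ultravox training config is covered in the README linked above.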

I want to train Ultravox using my own dataset, which is in Hindi. I have converted the data to .parquet format with fields for audio, sentence, and continuation. I believe I need a script to accomplish this. Could you explain what type of script I would need (perhaps with an example) and where it should be saved while training Ultravox with my local data?
