Could you let us know how you fine-tuned your model?

by yilmazay - opened

Hi,
Thanks for sharing your fine-tuned model and its performance scores.
As you said in the README, your model may be biased toward French.
Therefore, to get good results in another language, it needs to be fine-tuned on a dataset in that language.
I would appreciate it if you could share your fine-tuning method, or even your script.
Thanks in advance,
Yılmaz A.

I was wondering the exact same thing. I'd like to adapt it to Canadian French, so that would be a great fine-tuning job starting from your model!
Thanks
LP

La Javaness org

Hi,

Thank you for your interest. The model was trained mainly with Hugging Face's framework, so it should be quite straightforward to fine-tune it using the datasets and transformers libraries. Unfortunately, I cannot share my training script directly, and in any case the Hugging Face packages have had several updates over the last months, so my script may already be obsolete.

However, I can suggest going through the HF documentation on audio classification; it covers pretty much everything you need to build and train your own model on your data.
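Roughly, the preprocessing side looks like the snippet below. This is only a minimal sketch in the spirit of that documentation, not my actual script: the checkpoint name, data path, and the `audiofolder` loader are placeholders you would replace with your own setup.

```python
from datasets import Audio, load_dataset
from transformers import AutoFeatureExtractor

# Placeholder checkpoint: swap in a wav2vec2 model pretrained on your target language.
checkpoint = "facebook/wav2vec2-base"
feature_extractor = AutoFeatureExtractor.from_pretrained(checkpoint)

# Any datasets loader that yields an "audio" column works; "audiofolder" is one option.
dataset = load_dataset("audiofolder", data_dir="path/to/your/clips")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))  # wav2vec2 expects 16 kHz

def preprocess(batch):
    # Turn raw waveforms into model inputs, truncating clips to at most 5 seconds.
    arrays = [example["array"] for example in batch["audio"]]
    return feature_extractor(
        arrays,
        sampling_rate=16_000,
        max_length=16_000 * 5,
        truncation=True,
    )

encoded_dataset = dataset.map(preprocess, remove_columns=["audio"], batched=True)
```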

Overall, I used a quite simple training process with standard hyper-parameters for this type of task. The important things to keep in mind are listed below (a rough sketch putting them together follows the list):

  • Audio inputs are MP3 files sampled at 16 kHz (standard for voice processing), with lengths varying roughly from 1 to 5 seconds.
  • The model is trained as a multi-label classifier, so I used a binary cross-entropy loss (specifically BCEWithLogitsLoss from torch).
  • If you are fine-tuning on a language other than French, I suggest not using my model directly; preferably fine-tune a wav2vec2 model trained on the target language.
  • Main training hyper-parameters:
    - learning_rate = 1e-4
    - batch_size = 16
    - train_epoch = 40
    - weight_decay = 0.02 (to limit overfitting)
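
Putting those points together, a training sketch could look like the following. Again, this is not my exact script: the checkpoint name and label count are placeholders, and it reuses the `encoded_dataset` and `feature_extractor` from the snippet above. The custom Trainer is just one way to plug BCEWithLogitsLoss in, since the stock audio-classification head uses cross-entropy.

```python
import torch
from transformers import AutoModelForAudioClassification, Trainer, TrainingArguments

num_labels = 4  # placeholder: set to the size of your label set

class MultiLabelTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # Labels must be float multi-hot vectors of shape (batch_size, num_labels).
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        loss = torch.nn.BCEWithLogitsLoss()(outputs.logits, labels)
        return (loss, outputs) if return_outputs else loss

model = AutoModelForAudioClassification.from_pretrained(
    "facebook/wav2vec2-base",  # placeholder: use a checkpoint for your target language
    num_labels=num_labels,
)

training_args = TrainingArguments(
    output_dir="wav2vec2-finetuned",
    learning_rate=1e-4,
    per_device_train_batch_size=16,
    num_train_epochs=40,
    weight_decay=0.02,
)

trainer = MultiLabelTrainer(
    model=model,
    args=training_args,
    train_dataset=encoded_dataset["train"],  # from the preprocessing sketch above
    tokenizer=feature_extractor,             # lets the Trainer pad batches of audio features
)
trainer.train()
```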

I hope this answers your question. If you have any further specific questions, let me know and I will try to answer.

Jules
