---
language:
  - multilingual
  - ain
license: apache-2.0
---

# Wav2Vec2-Large-XLSR-53 pretrained on Ainu language data

This is a wav2vec2-large-xlsr-53 model adapted to the Ainu language through continued pretraining for 100k steps on 234 hours of speech data in Hokkaido Ainu and Sakhalin Ainu. For details, please refer to the paper cited below.
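
Because this checkpoint only underwent continued pretraining (it has no CTC head), it is typically used to extract speech representations or as a starting point for fine-tuning on a downstream task. The sketch below shows one way to load it with the Hugging Face `transformers` library; the repository ID is a placeholder assumption, so substitute the actual model ID of this repository.

```python
# Minimal sketch: extracting frame-level speech representations with this model.
import numpy as np
import torch
from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model

# Placeholder repository ID (assumption) -- replace with the actual model ID.
model_id = "karolnowakowski/wav2vec2-large-xlsr-53-ain"

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(model_id)
model = Wav2Vec2Model.from_pretrained(model_id)
model.eval()

# One second of silence at 16 kHz as a stand-in for real Ainu speech.
waveform = np.zeros(16000, dtype=np.float32)

inputs = feature_extractor(waveform, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Contextual representations with shape (batch, frames, hidden_size).
print(outputs.last_hidden_state.shape)
```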

## Citation

When using this model, please cite the following paper (in press):

```bibtex
@article{nowakowski2022,
  title={Adapting Multilingual Speech Representation Model for a New, Underresourced Language through Multilingual Fine-tuning and Continued Pretraining},
  author={Nowakowski, Karol and Ptaszynski, Michal and Murasaki, Kyoko and Nieuważny, Jagna},
  year={2022},
  journal={Information Processing & Management}
}
```