---
license: other
language:
- ja
library_name: fairseq
---

# Pre-trained checkpoints for speech representation in Japanese

The models in this repository were pre-trained via self-supervised learning (SSL) for speech representation. The SSL models were built on the [fairseq](https://github.com/facebookresearch/fairseq) toolkit.

- `wav2vec2_base_csj.pt` - fairseq checkpoint of a wav2vec 2.0 model with the *Base* architecture, pre-trained on 16 kHz sampled speech data from the Corpus of Spontaneous Japanese (CSJ)
- `wav2vec2_base_csj_hf` - a version of `wav2vec2_base_csj.pt` converted to be compatible with the Hugging Face interface using [this tool](https://github.com/huggingface/transformers/blob/main/src/transformers/models/wav2vec2/convert_wav2vec2_original_pytorch_checkpoint_to_pytorch.py)

If you find this helpful, please consider citing the following paper.

```text
@INPROCEEDINGS{ashihara_icassp23,
  author={Takanori Ashihara and Takafumi Moriya and Kohei Matsuura and Tomohiro Tanaka},
  title={Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models},
  booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2023}
}
```
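As a minimal sketch of how the Hugging Face-compatible checkpoint can be used with the `transformers` library: in practice you would load the converted `wav2vec2_base_csj_hf` directory via `Wav2Vec2Model.from_pretrained` (the path in the comment is a placeholder, not part of this repository); here a randomly initialized *Base*-architecture model stands in so the snippet runs without the checkpoint files.

```python
import torch
from transformers import Wav2Vec2Config, Wav2Vec2Model

# In practice, load the converted checkpoint, e.g.:
#   model = Wav2Vec2Model.from_pretrained("/path/to/wav2vec2_base_csj_hf")
# A randomly initialized Base-architecture model is used here only so the
# snippet is self-contained.
model = Wav2Vec2Model(Wav2Vec2Config())
model.eval()

# One second of 16 kHz audio (the sampling rate used in pre-training).
waveform = torch.zeros(1, 16000)

with torch.no_grad():
    outputs = model(waveform)

# Frame-level speech representations: roughly one 768-dim vector per 20 ms.
print(outputs.last_hidden_state.shape)  # torch.Size([1, 49, 768])
```

The extracted `last_hidden_state` features are what downstream tasks (e.g. ASR fine-tuning or probing) would typically consume.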