Pre-trained checkpoints for speech representation in Japanese

This repository provides speech representation models for Japanese, pre-trained with self-supervised learning (SSL). All models were trained with the fairseq toolkit.

  • wav2vec2_base_csj.pt
    • fairseq checkpoint of a wav2vec 2.0 model with the Base architecture, pre-trained on 16 kHz speech from the Corpus of Spontaneous Japanese (CSJ)
  • wav2vec2_base_csj_hf
    • version of wav2vec2_base_csj.pt converted with this tool to be compatible with the Hugging Face interface
  • hubert_base_csj.pt
    • fairseq checkpoint of a HuBERT model with the Base architecture, pre-trained on 16 kHz speech from the Corpus of Spontaneous Japanese (CSJ)
  • hubert_base_csj_hf
    • version of hubert_base_csj.pt converted with this tool to be compatible with the Hugging Face interface
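
The converted directories can presumably be loaded with the transformers library. The following is a minimal sketch, not an official usage recipe. It assumes that torch and transformers are installed, that one of the *_hf directories has been downloaded locally, and that the input is a 1-D float waveform tensor sampled at 16 kHz. The helper names (model_class_name, extract_features) and the mapping table are hypothetical and introduced only for illustration. Whether the waveform should be normalized before being fed to the model is not stated here, so the sketch passes it through unchanged.

```python
from pathlib import Path

# Hypothetical mapping from each converted checkpoint directory to the
# transformers class that matches its architecture.
HF_MODEL_CLASSES = {
    "wav2vec2_base_csj_hf": "Wav2Vec2Model",
    "hubert_base_csj_hf": "HubertModel",
}

SAMPLE_RATE = 16_000  # the checkpoints were pre-trained on 16 kHz speech


def model_class_name(checkpoint_dir):
    """Return the transformers class name for a converted checkpoint directory."""
    name = Path(checkpoint_dir).name
    try:
        return HF_MODEL_CLASSES[name]
    except KeyError:
        raise ValueError(f"unknown checkpoint directory: {name}") from None


def extract_features(checkpoint_dir, waveform):
    """Return frame-level representations for a 1-D 16 kHz float waveform tensor.

    Assumption: checkpoint_dir is a local copy of one of the *_hf directories.
    The heavy dependencies are imported here so the helpers above stay usable
    without them.
    """
    import torch
    import transformers

    model_cls = getattr(transformers, model_class_name(checkpoint_dir))
    model = model_cls.from_pretrained(checkpoint_dir)
    model.eval()
    with torch.no_grad():
        # Input shape is (batch, samples); output is (batch, frames, hidden_size).
        outputs = model(waveform.unsqueeze(0))
    return outputs.last_hidden_state
```

For example, extract_features("wav2vec2_base_csj_hf", waveform) would return one hidden vector per output frame for the given utterance. Audio recorded at another sampling rate should be resampled to 16 kHz first, since that is the rate the models were pre-trained on.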

If you find this helpful, please consider citing the following paper.

@INPROCEEDINGS{ashihara_icassp23,
  author={Takanori Ashihara and Takafumi Moriya and Kohei Matsuura and Tomohiro Tanaka},
  title={Exploration of Language Dependency for Japanese Self-Supervised Speech Representation Models},
  booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2023}
}