espnet/owsm_ctc_v3.2_ft_1B

Automatic Speech Recognition

speech-translation

language-identification

pyf98 committed on Feb 6

Commit 3719821 · verified · 1 Parent(s): 7c2420b

Update README.md

Files changed (1)

README.md +53 -1

@@ -152,4 +152,56 @@ utt4 AND CONCENTRATE ON PROPERTY MANAGEMENT
 segments = aligner(speech, text)
 print(segments)
-```
+```
+## Citations
+#### OWSM-CTC
+```BibTex
+@inproceedings{owsm-ctc,
+    title = "{OWSM}-{CTC}: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification",
+    author = "Peng, Yifan  and
+      Sudo, Yui  and
+      Shakeel, Muhammad  and
+      Watanabe, Shinji",
+    booktitle = "Proceedings of the Annual Meeting of the Association for Computational Linguistics (ACL)",
+    year = "2024",
+    month= {8},
+    url = "https://aclanthology.org/2024.acl-long.549",
+}
+```
+#### OWSM v3.1 and v3.2
+```BibTex
+@inproceedings{owsm-v32,
+  title={On the Effects of Heterogeneous Data Sources on Speech-to-Text Foundation Models},
+  author={Jinchuan Tian and Yifan Peng and William Chen and Kwanghee Choi and Karen Livescu and Shinji Watanabe},
+  booktitle={Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH)},
+  year={2024},
+  month={9},
+  pdf="https://arxiv.org/pdf/2406.09282"
+}
+@inproceedings{owsm-v31,
+  title={{OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer}},
+  author={Yifan Peng and Jinchuan Tian and William Chen and Siddhant Arora and Brian Yan and Yui Sudo and Muhammad Shakeel and Kwanghee Choi and Jiatong Shi and Xuankai Chang and Jee-weon Jung and Shinji Watanabe},
+  booktitle={Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH)},
+  year={2024},
+  month={9},
+  pdf="https://arxiv.org/pdf/2401.16658",
+}
+```
+#### Initial OWSM (v1, v2, v3)
+```BibTex
+@inproceedings{owsm,
+  title={Reproducing Whisper-Style Training Using An Open-Source Toolkit And Publicly Available Data},
+  author={Yifan Peng and Jinchuan Tian and Brian Yan and Dan Berrebbi and Xuankai Chang and Xinjian Li and Jiatong Shi and Siddhant Arora and William Chen and Roshan Sharma and Wangyou Zhang and Yui Sudo and Muhammad Shakeel and Jee-weon Jung and Soumi Maiti and Shinji Watanabe},
+  booktitle={Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)},
+  year={2023},
+  month={12},
+  pdf="https://arxiv.org/pdf/2309.13876",
+}
+```