mtspeech/MooER-MTL-5K

@@ -1,48 +1,258 @@
-# MooER (摩耳): LLM-based Speech Recognition and Translation Models from Moore Threads
-## 📖 Introduction
-We introduce **MooER (摩耳)**: LLM-based speech recognition and translation models developed by Moore Threads. With the *MooER* framework, you can transcribe the speech into text (speech recognition or, ASR), and  translate it into other languages (speech translation or, AST) in a end-to-end manner. The performance of *MooER* is demonstrated in the subsequent section, along with our insights into model configurations, training strategies, and more, provided in our [technical report](https://arxiv.org/abs/2408.05101).
-<br>
-<p align="center">
-    <img src="assets/framework.png" width="600"/>
-<p>
-<br>
-## 🏁 Getting Started
-**For the performance of MooER and the usage of the model files, please visit our [GitHub](https://github.com/MooreThreads/MooER)**
-## 🧾 License
-Please see the [LICENSE](LICENSE).
-## 💖 Citation
-If you find MooER useful for your research, please 🌟 this repo and cite our work using the following BibTeX:
-```bibtex
-@article{liang2024mooer,
-  title   = {MooER: LLM-based Speech Recognition and Translation Models from Moore Threads},
-  author  = {Zhenlin Liang, Junhao Xu, Yi Liu, Yichao Hu, Jian Li, Yajun Zheng, Meng Cai, Hua Wang},
-  journal = {arXiv preprint arXiv:2408.05101},
-  url     = {https://arxiv.org/abs/2408.05101},
-  year    = {2024}
-}
-```
-## 📧 Contact
-If you encouter any problems, feel free to create a discussion.
-Moore Threads Website: **https://www.mthreads.com/**
-<br>
-<p align="left">
-    <img src="assets/MTLogo.png" width="300"/>
-<p>
 <br>

+---
+license: mit
+language:
+- zh
+- en
+metrics:
+- cer
+- bleu
+tags:
+- asr
+- automatic-speech-recognition
+- automatic-speech-translation
+- speech-translation
+- speech-recognition
+---
+# MooER (摩耳): LLM-based Speech Recognition and Translation Models from Moore Threads
+## 📖 Introduction
+We introduce **MooER (摩耳)**: LLM-based speech recognition and translation models developed by Moore Threads. With the *MooER* framework, you can transcribe speech into text (speech recognition, or ASR) and translate it into other languages (speech translation, or AST) in an end-to-end manner. The performance of *MooER* is reported in the evaluation section below; our insights into model configurations, training strategies, and more are provided in our [technical report](https://arxiv.org/abs/2408.05101).
+For usage of the model files, please refer to our [GitHub](https://github.com/MooreThreads/MooER) repository.
+<br>
+<p align="center">
+    <img src="assets/framework.png" width="600"/>
+</p>
+<br>
+## 🥊 Evaluation Results
+We present the training data and the evaluation results below. For more comprehensive information, please refer to our [report](https://arxiv.org/pdf/2408.05101).
+### Training data
+We utilize 5k hours of data (MT5K) to train our basic *MooER-5K* model. The data sources include:
+| Dataset          | Duration          |
+|---------------|---------------|
+| aishell2 | 137h          |
+| librispeech | 131h      |
+| multi_cn | 100h          |
+| wenetspeech  | 1361h     |
+| in-house data | 3274h  |
+Note that the data from the open-source datasets were randomly selected from their full training sets. The in-house data, collected internally without transcripts, were transcribed using a third-party ASR service.
+Since all of the above datasets were originally designed only for the speech recognition task, no reference translations are available. To train our speech translation model, we used a third-party translation service to generate pseudo-labels. No data filtering techniques were applied.
+We are also developing a new model trained on 80K hours of data.
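As a quick sanity check, the durations in the training-data table add up to roughly 5k hours, which matches the MT5K name. The snippet below is only an illustrative sketch; the dictionary name `mt5k_hours` is hypothetical and the values are copied from the table.

```python
# Durations in hours, copied from the training-data table.
# The variable name is illustrative and not part of the MooER codebase.
mt5k_hours = {
    "aishell2": 137,
    "librispeech": 131,
    "multi_cn": 100,
    "wenetspeech": 1361,
    "in-house data": 3274,
}

total_hours = sum(mt5k_hours.values())
in_house_share = mt5k_hours["in-house data"] / total_hours

print(f"total: {total_hours}h")  # total: 5003h
print(f"in-house share: {in_house_share:.1%}")
```

About two thirds of MT5K is therefore in-house audio with third-party ASR transcripts rather than human-labeled open-source data.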
+### Speech Recognition
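+Speech recognition performance is evaluated using WER/CER (in %, lower is better). The "average" rows report the unweighted mean over the six test sets of each language.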
+<table>
+  <tr>
+    <th>Language</th>
+    <th>Testset</th>
+    <th>Paraformer-large</th>
+    <th>SenseVoice-small</th>
+    <th>Qwen-audio</th>
+    <th>Whisper-large-v3</th>
+    <th>SeamlessM4T-v2</th>
+    <th>MooER-5K</th>
+    <th>MooER-80K</th>
+  </tr>
+  <tr>
+    <td rowspan="7">Chinese</td>
+    <td>aishell1</td>
+    <td>1.93</td>
+    <td>3.03</td>
+    <td>1.43</td>
+    <td>7.86</td>
+    <td>4.09</td>
+    <td>1.93</td>
+    <td>1.25</td>
+  </tr>
+  <tr>
+    <td>aishell2_ios</td>
+    <td>2.85</td>
+    <td>3.79</td>
+    <td>3.57</td>
+    <td>5.38</td>
+    <td>4.81</td>
+    <td>3.17</td>
+    <td>2.67</td>
+  </tr>
+  <tr>
+    <td>test_magicdata</td>
+    <td>3.66</td>
+    <td>3.81</td>
+    <td>5.31</td>
+    <td>8.36</td>
+    <td>9.69</td>
+    <td>3.48</td>
+    <td>2.52</td>
+  </tr>
+  <tr>
+    <td>test_thchs</td>
+    <td>3.99</td>
+    <td>5.17</td>
+    <td>4.86</td>
+    <td>9.06</td>
+    <td>7.14</td>
+    <td>4.11</td>
+    <td>3.14</td>
+  </tr>
+  <tr>
+    <td>fleurs cmn_dev</td>
+    <td>5.56</td>
+    <td>6.39</td>
+    <td>10.54</td>
+    <td>4.54</td>
+    <td>7.12</td>
+    <td>5.81</td>
+    <td>5.23</td>
+  </tr>
+  <tr>
+    <td>fleurs cmn_test</td>
+    <td>6.92</td>
+    <td>7.36</td>
+    <td>11.07</td>
+    <td>5.24</td>
+    <td>7.66</td>
+    <td>6.77</td>
+    <td>6.18</td>
+  </tr>
+  <tr>
+    <td>average</td>
+    <td><strong>4.15</strong></td>
+    <td><strong>4.93</strong></td>
+    <td><strong>6.13</strong></td>
+    <td><strong>6.74</strong></td>
+    <td><strong>6.75</strong></td>
+    <td><strong>4.21</strong></td>
+    <td><strong>3.50</strong></td>
+  </tr>
+  <tr>
+    <td rowspan="7">English</td>
+    <td>librispeech test_clean</td>
+    <td>14.15</td>
+    <td>4.07</td>
+    <td>2.15</td>
+    <td>3.42</td>
+    <td>2.77</td>
+    <td>7.78</td>
+    <td>4.11</td>
+  </tr>
+  <tr>
+    <td>librispeech test_other</td>
+    <td>22.99</td>
+    <td>8.26</td>
+    <td>4.68</td>
+    <td>5.62</td>
+    <td>5.25</td>
+    <td>15.25</td>
+    <td>9.99</td>
+  </tr>
+  <tr>
+    <td>fleurs eng_dev</td>
+    <td>24.93</td>
+    <td>12.92</td>
+    <td>22.53</td>
+    <td>11.63</td>
+    <td>11.36</td>
+    <td>18.89</td>
+    <td>13.32</td>
+  </tr>
+  <tr>
+    <td>fleurs eng_test</td>
+    <td>26.81</td>
+    <td>13.41</td>
+    <td>22.51</td>
+    <td>12.57</td>
+    <td>11.82</td>
+    <td>20.41</td>
+    <td>14.97</td>
+  </tr>
+  <tr>
+    <td>gigaspeech dev</td>
+    <td>24.23</td>
+    <td>19.44</td>
+    <td>12.96</td>
+    <td>19.18</td>
+    <td>28.01</td>
+    <td>23.46</td>
+    <td>16.92</td>
+  </tr>
+  <tr>
+    <td>gigaspeech test</td>
+    <td>23.07</td>
+    <td>16.65</td>
+    <td>13.26</td>
+    <td>22.34</td>
+    <td>28.65</td>
+    <td>22.09</td>
+    <td>16.64</td>
+  </tr>
+  <tr>
+    <td>average</td>
+    <td><strong>22.70</strong></td>
+    <td><strong>12.46</strong></td>
+    <td><strong>13.02</strong></td>
+    <td><strong>12.46</strong></td>
+    <td><strong>14.64</strong></td>
+    <td><strong>17.98</strong></td>
+    <td><strong>12.66</strong></td>
+  </tr>
+</table>
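WER and CER are conventionally defined as the edit distance between hypothesis and reference divided by the reference length, counted over words or characters respectively. The sketch below illustrates that standard definition only. It is not the evaluation script behind the table above, and the function names and the simple whitespace handling are assumptions made for the example.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # At the start of row i, prev[j] is the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion (reference token missing)
                            curr[j - 1] + 1,      # insertion (extra hypothesis token)
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, unit="char"):
    """Corpus-level error rate in percent: total edits / total reference tokens."""
    def tokenize(text):
        # Assumed normalization: CER ignores spaces, WER splits on whitespace.
        return list(text.replace(" ", "")) if unit == "char" else text.split()

    edits = sum(edit_distance(tokenize(r), tokenize(h)) for r, h in zip(refs, hyps))
    total = sum(len(tokenize(r)) for r in refs)
    return 100.0 * edits / total


# One substituted character out of six reference characters.
cer = error_rate(["今天天气很好"], ["今天天汽很好"], unit="char")
# One substituted and one deleted word out of six reference words.
wer = error_rate(["the cat sat on the mat"], ["the cat sit on mat"], unit="word")
print(f"CER = {cer:.2f}%, WER = {wer:.2f}%")
```

Real evaluation pipelines usually also normalize punctuation, casing, and number formats before scoring, which can shift the reported numbers noticeably.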
+### Speech Translation (zh -> en)
+Speech translation performance is evaluated using the BLEU score (higher is better).
+| Testset | Speech-LLaMA | Whisper-large-v3 | Qwen-audio | Qwen2-audio | SeamlessM4T-v2 | MooER-5K | MooER-5K-MTL |
+|--------|-------------|-------------------|------------|-------------|-----------------|--------|--------------|
+|CoVoST1 zh2en | - |  13.5 | 13.5 | - | 25.3 | - | **30.2** |
+|CoVoST2 zh2en | 12.3 | 12.2 | 15.7 | 24.4 | 22.2 | 23.4 | **25.2** |
+|CCMT2019 dev | -  | 15.9 | 12.0 | - | 14.8 | - | **19.6** |
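BLEU combines clipped n-gram precisions (typically up to 4-grams) with a brevity penalty for short hypotheses. The following is a minimal, unsmoothed corpus-level sketch of that textbook definition with one reference per hypothesis. It assumes plain whitespace tokenization and is not the tool used to produce the scores above; the source does not state which scorer or tokenizer was used, and the function names here are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(refs, hyps, max_n=4):
    """Unsmoothed corpus-level BLEU on a 0-100 scale, one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    ref_len = hyp_len = 0
    for ref, hyp in zip(refs, hyps):
        ref_tok, hyp_tok = ref.split(), hyp.split()  # assumed whitespace tokenization
        ref_len += len(ref_tok)
        hyp_len += len(hyp_tok)
        for n in range(1, max_n + 1):
            ref_counts, hyp_counts = ngrams(ref_tok, n), ngrams(hyp_tok, n)
            # Clipping: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum((hyp_counts & ref_counts).values())
            totals[n - 1] += sum(hyp_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives BLEU = 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1.0 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


# All hypothesis n-grams match, but the hypothesis is one word short,
# so only the brevity penalty exp(1 - 7/6) lowers the score.
score = corpus_bleu(["the cat is on the mat today"], ["the cat is on the mat"])
print(f"BLEU = {score:.2f}")
```

Because BLEU is sensitive to tokenization and smoothing, scores are only comparable when they are computed with the same scorer and settings.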
+## 🏁 Getting Started
+Please visit our [GitHub](https://github.com/MooreThreads/MooER) repository for setup and usage instructions.
+## 🧾 License
+Please see the [LICENSE](LICENSE).
+## 💖 Citation
+If you find MooER useful for your research, please 🌟 this repo and cite our work using the following BibTeX:
+```bibtex
+@article{liang2024mooer,
+  title   = {MooER: LLM-based Speech Recognition and Translation Models from Moore Threads},
+  author  = {Zhenlin Liang and Junhao Xu and Yi Liu and Yichao Hu and Jian Li and Yajun Zheng and Meng Cai and Hua Wang},
+  journal = {arXiv preprint arXiv:2408.05101},
+  url     = {https://arxiv.org/abs/2408.05101},
+  year    = {2024}
+}
+```
+## 📧 Contact
+If you encounter any problems, feel free to create a discussion.
+Moore Threads Website: **https://www.mthreads.com/**
+<br>
+<p align="left">
+    <img src="assets/MTLogo.png" width="300"/>
+</p>
 <br>