---
language: All languages
datasets: ISML datasets (80 thousand hours of unlabeled data) + Babel datasets (2 thousand hours of unlabeled data)
---

# Chinese W2v-conformer

## Model description

This is a set of speech W2v-conformer models pre-trained with UER-py. You can download the model from the [UER-py GitHub page](https://github.com/dbiir/UER-py/).

## How to use

You can use the model directly for speech recognition:
```python
>>> import yaml
>>> from wenet.transformer.asr_model import ASRModel
>>> from wenet.transformer.encoder import ConformerEncoder
>>> from wenet.transformer.decoder import TransformerDecoder
>>> from wenet.transformer.ctc import CTC
>>> from wenet.utils.checkpoint import load_checkpoint
>>> # args.config / args.checkpoint, input_dim and vocab_size come from
>>> # your own setup. Load the configuration first: the encoder, decoder
>>> # and CTC modules below are all built from it.
>>> with open(args.config, 'r') as fin:
...     configs = yaml.load(fin, Loader=yaml.FullLoader)
>>> encoder = ConformerEncoder(input_dim, **configs['encoder_conf'])
>>> decoder = TransformerDecoder(vocab_size, encoder.output_size(), **configs['decoder_conf'])
>>> ctc = CTC(vocab_size, encoder.output_size())
>>> model = ASRModel(
...     vocab_size=vocab_size,
...     encoder=encoder,
...     decoder=decoder,
...     ctc=ctc,
...     **configs['model_conf'],
... )
>>> # Load the pre-trained weights into the assembled model.
>>> infos = load_checkpoint(model, args.checkpoint)
```
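
After loading the checkpoint, you can run recognition on an utterance. The following is a minimal sketch of CTC greedy decoding, assuming 80-dimensional fbank features, a hypothetical input file `example.wav`, and a WeNet version that exposes `ASRModel.ctc_greedy_search`; check the decoding options of your WeNet release:

```python
>>> import torch
>>> import torchaudio
>>> import torchaudio.compliance.kaldi as kaldi
>>> # 'example.wav' is a placeholder; use a 16 kHz mono recording.
>>> waveform, sample_rate = torchaudio.load('example.wav')
>>> # 80-dim log-mel fbank features, the usual WeNet front end.
>>> feats = kaldi.fbank(waveform, num_mel_bins=80, frame_length=25,
...                     frame_shift=10, sample_frequency=sample_rate)
>>> feats = feats.unsqueeze(0)                     # (batch=1, frames, 80)
>>> feats_lengths = torch.tensor([feats.size(1)])  # valid frames per utterance
>>> model.eval()
>>> with torch.no_grad():
...     hyps = model.ctc_greedy_search(feats, feats_lengths)
>>> # hyps is a list of token-id sequences; map the ids back to units
>>> # with the dictionary used during fine-tuning.
```

WeNet also ships a decoding script (`wenet/bin/recognize.py`) that wraps these steps and supports other decoding modes such as attention rescoring.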

## Training data

The ISML datasets (80 thousand hours of unlabeled data) and the Babel datasets (2 thousand hours of unlabeled data) are used as training data.

## Training procedure

The model is pre-trained with the wav2vec 2.0 objective using [UER-py](https://github.com/dbiir/UER-py/) on [Tencent Cloud](https://cloud.tencent.com/). We pre-train for 70 epochs with a batch size of 128, and use the same hyper-parameters across the different model sizes.

The downstream models are fine-tuned as follows:

Stage 1:

```
python wenet/bin/train.py --gpu 0,1,2,3,4,5,6,7 \
    --config $train_config \
    --train_data train.data \
    --cv_data dev.data \
    ${checkpoint:+--checkpoint $checkpoint} \
    --model_dir $dir \
    --ddp.init_method $init_method \
    --ddp.world_size 7 \
    --ddp.dist_backend nccl \
    --num_workers 2
```
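
The `$train_config` YAML referenced above is expected to contain the `encoder_conf`, `decoder_conf`, and `model_conf` sections that the Python snippet reads. Below is a hypothetical minimal excerpt with illustrative values only, not the actual configuration used for this model:

```
encoder_conf:
    output_size: 512        # illustrative; must match the pre-trained encoder
    attention_heads: 8
    num_blocks: 12
decoder_conf:
    attention_heads: 8
    num_blocks: 6
model_conf:
    ctc_weight: 0.3         # weight of the CTC loss vs. the attention loss
    lsm_weight: 0.1         # label smoothing
```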

### BibTeX entry and citation info

```
@article{baevski2020wav2vec,
  title={wav2vec 2.0: A framework for self-supervised learning of speech representations},
  author={Baevski, Alexei and Zhou, Henry and Mohamed, Abdelrahman and Auli, Michael},
  journal={arXiv preprint arXiv:2006.11477},
  year={2020}
}

@article{zhang2020pushing,
  title={Pushing the limits of semi-supervised learning for automatic speech recognition},
  author={Zhang, Yu and Qin, James and Park, Daniel S and Han, Wei and Chiu, Chung-Cheng and Pang, Ruoming and Le, Quoc V and Wu, Yonghui},
  journal={arXiv preprint arXiv:2010.10504},
  year={2020}
}

@article{zhang2021wenet,
  title={WeNet: Production First and Production Ready End-to-End Speech Recognition Toolkit},
  author={Zhang, Binbin and Wu, Di and Yang, Chao and Chen, Xiaoyu and Peng, Zhendong and Wang, Xiangming and Yao, Zhuoyuan and Wang, Xiong and Yu, Fan and Xie, Lei and others},
  journal={arXiv preprint arXiv:2102.01547},
  year={2021}
}
```