license: cc-by-4.0
---

[OWSM-CTC](https://arxiv.org/abs/2402.12654) is an encoder-only speech foundation model based on hierarchical multi-task self-conditioned CTC.
It is trained on 180k hours of public audio data for multilingual speech recognition, any-to-any speech translation, and language identification, which follows the design of the previous [encoder-decoder OWSM](https://arxiv.org/abs/2401.16658).

Due to time constraints, the model used in the paper was trained for 40 "epochs". A new model trained for 45 "epochs" (approximately three full passes over the data) has also been added to this repo to match the setup of the encoder-decoder OWSM. It can achieve better performance than the old model on many test sets.

Currently, the code for OWSM-CTC has not been merged into the ESPnet main branch. Instead, it is available as follows:
- Code in my repo: https://github.com/pyf98/espnet/tree/owsm-ctc
- Current model on HF: https://huggingface.co/pyf98/owsm_ctc_v3.1_1B

To use the pre-trained model, you need to install `espnet` and `espnet_model_zoo`. The requirements are:
```
librosa
torch
espnet @ git+https://github.com/pyf98/espnet@owsm-ctc
espnet_model_zoo
```
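
For example (this command line is only an illustration, not part of the original instructions), the same dependencies can be installed directly with `pip`, pulling ESPnet from the `owsm-ctc` branch of the fork above:

```bash
# Install the inference dependencies listed above.
pip install librosa torch espnet_model_zoo
# Install ESPnet from the owsm-ctc branch (not yet merged into ESPnet main).
pip install "git+https://github.com/pyf98/espnet@owsm-ctc"
```
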
We use FlashAttention during training, but it is not needed during inference. To install it, run:
```bash
pip install flash-attn --no-build-isolation
```

### Example script for short-form ASR/ST

```python