# Speech2S
<!--**Pre-trained models for speech related tasks**-->

[**Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation**](https://arxiv.org/abs/2210.17027)


- Nov. 2022: released the code and models (still being updated)
- Nov. 2022: released the preprint on [arXiv](https://arxiv.org/abs/2210.17027)

## Pre-Trained and Fine-tuned Models

|  Model   |               Pre-training Dataset               | Fine-tuning Dataset | Download |
| :------: | :----------------------------------------------: | :-----------------: | :------: |
| Speech2S_enes |   Voxpopuli_en_v2 |         -          | [Google Drive](https://drive.google.com/file/d/1TYypFiEKoCixUro8FTTG23bRZYwAxhkX/view?usp=share_link)  |
| Speech2S_enes |   Voxpopuli_en_v2 | Voxpopuli_s2s |  [Google Drive](https://drive.google.com/file/d/11RxeKznSrHcoP_KK9A1VgwRt3fNh_U_C/view?usp=share_link) |
| Speech2S_esen |   Voxpopuli_es_v2 |         -          | [Google Drive](https://drive.google.com/file/d/1NoC7W-UtQZ-ugIptF1ex0ZlGJncsT1S4/view?usp=share_link) |
| Speech2S_esen |   Voxpopuli_es_v2 | Voxpopuli_s2s |  [Google Drive](https://drive.google.com/file/d/1eNcKw4ZWGmcABWXJxlf6MKocmiPrKSkH/view?usp=share_link) |
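
After downloading a checkpoint, a quick way to confirm the file is intact is to open it with plain PyTorch. The sketch below is not part of the release; the filename is a placeholder for whichever model you downloaded, and it assumes the usual fairseq checkpoint layout with the weights stored under a `model` key.

```python
# Minimal sketch (not from the original README): inspect a downloaded checkpoint.
# "speech2s_enes_pretrained.pt" is a placeholder for the file from the table above.
import torch

ckpt = torch.load("speech2s_enes_pretrained.pt", map_location="cpu")

# fairseq checkpoints typically keep the weights under "model"
# and the training configuration under "cfg" or "args".
print(ckpt.keys())
print(sum(p.numel() for p in ckpt["model"].values()), "parameters")
```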


## Setup
```
cd Speech2S/speech2s
pip install --editable fairseq/
```
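
If the editable install succeeds, importing the package from Python should pick up the bundled fairseq. A minimal check (not part of the original setup instructions):

```python
# Quick sanity check, assuming the editable install above completed without errors.
import fairseq
import torch

print("fairseq:", fairseq.__version__)
print("torch:", torch.__version__, "CUDA available:", torch.cuda.is_available())
```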

## Data Preparation
Please follow the S2ST data preparation steps described [here](https://github.com/facebookresearch/fairseq/blob/main/examples/speech_to_speech/docs/enhanced_direct_s2st_discrete_units.md).
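
That pipeline produces TSV manifests together with discrete target units. As a hedged sketch for verifying the output, the snippet below simply prints a few manifest rows; the path and column names are assumptions, so adjust them to whatever your data preparation actually generated.

```python
# Hedged sketch: peek at a prepared manifest to verify the data step worked.
# The path is a placeholder; columns depend on the fairseq S2ST recipe linked above.
import csv

with open("data/train.tsv", newline="") as f:
    reader = csv.DictReader(f, delimiter="\t")
    for i, row in enumerate(reader):
        print(row)  # e.g. source audio path, frame counts, target unit sequence
        if i == 2:
            break
```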

## Pre-Training
```
cd speech2s/stpretrain_scripts
bash base_sc2c_enes.sh
```
## Fine-tuning
```
cd speech2s/stpretrain_scripts
bash finetune_enes.sh
```
## Inference
```
cd speech2s/stpretrain_scripts
bash inference_ed.sh
```
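
The direct S2ST pipeline predicts discrete units, which still have to be vocoded into a waveform. Below is a hedged sketch using fairseq's code HiFi-GAN vocoder, the component used by the discrete-unit S2ST recipe linked under Data Preparation; the vocoder checkpoint and config paths, the unit sequence, and the 16 kHz sample rate are all placeholders or assumptions, not files shipped with this repo.

```python
# Hedged sketch: convert predicted discrete units into audio with fairseq's
# code HiFi-GAN vocoder. Paths and the dummy unit IDs below are placeholders.
import json

import torch
import soundfile as sf
from fairseq.models.text_to_speech.vocoder import CodeHiFiGANVocoder

with open("vocoder/config.json") as f:
    vocoder_cfg = json.load(f)
vocoder = CodeHiFiGANVocoder("vocoder/checkpoint.pt", vocoder_cfg)

units = torch.LongTensor([[10, 10, 42, 42, 42, 7, 199]])  # dummy unit IDs
wav = vocoder({"code": units}, dur_prediction=True)
sf.write("generated.wav", wav.cpu().numpy(), 16000)  # 16 kHz assumed
```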
## Results on VoxPopuli and CoVoST

Detailed results are reported in the [paper](https://arxiv.org/abs/2210.17027).

## License

This project is licensed under the license found in the LICENSE file in the root directory of this source tree.
Portions of the source code are based on the [FAIRSEQ](https://github.com/pytorch/fairseq) project.

[Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct)

## Reference

If you find our work useful in your research, please cite the following paper:
```bibtex
@article{wei2022joint,
  title={Joint Pre-Training with Speech and Bilingual Text for Direct Speech to Speech Translation},
  author={Wei, Kun and Zhou, Long and Zhang, Ziqiang and Chen, Liping and Liu, Shujie and He, Lei and Li, Jinyu and Wei, Furu},
  journal={arXiv preprint arXiv:2210.17027},
  year={2022}
}
```