ylacombe committed on
Commit
b61a560
·
1 Parent(s): a176fa4

Create README.md

  1. README.md +114 -0
README.md ADDED
@@ -0,0 +1,114 @@
---
inference: false
tags:
- SeamlessM4T
- seamless_m4t
license: cc-by-nc-4.0
library_name: transformers
---

# SeamlessM4T Medium

SeamlessM4T is a collection of models designed to provide high-quality translation, allowing people from different linguistic communities to communicate effortlessly through speech and text.

SeamlessM4T covers:
- 📥 101 languages for speech input
- ⌨️ 96 languages for text input/output
- 🗣️ 35 languages for speech output

This is the "medium" variant of the unified model, which handles multiple tasks without relying on separate models:
- Speech-to-speech translation (S2ST)
- Speech-to-text translation (S2TT)
- Text-to-speech translation (T2ST)
- Text-to-text translation (T2TT)
- Automatic speech recognition (ASR)

You can perform all of the above tasks with a single model, `SeamlessM4TModel`, but each task also has its own dedicated sub-model.

## Usage

First, load the processor and a checkpoint of the model:

```python
>>> from transformers import AutoProcessor, SeamlessM4TModel

>>> processor = AutoProcessor.from_pretrained("ylacombe/hf-seamless-m4t-medium")
>>> model = SeamlessM4TModel.from_pretrained("ylacombe/hf-seamless-m4t-medium")
```

You can seamlessly use this model on text or on audio to generate either translated text or translated audio.

### Speech

You can generate translated speech with [`SeamlessM4TModel.generate`]. Here is an example that translates English text into Russian speech:

```python
>>> inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")

>>> # the first element of the output holds the waveform; convert it to a 1-D NumPy array
>>> audio_array = model.generate(**inputs, tgt_lang="rus")
>>> audio_array = audio_array[0].cpu().numpy().squeeze()
```
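To listen to the result, the waveform can be written to a WAV file. The sketch below is one way to do that with only the Python standard library. `save_wav` and `wav_info` are hypothetical helpers, not part of Transformers, and the 16 kHz output rate is an assumption: the checkpoint's configuration (for example `model.config.sampling_rate`) is the authority. A synthetic tone stands in for `audio_array` so the snippet runs without downloading the model.

```python
import array
import math
import os
import tempfile
import wave

ASSUMED_SAMPLING_RATE = 16_000  # assumption: verify against the checkpoint's config


def save_wav(path, samples, sampling_rate=ASSUMED_SAMPLING_RATE):
    """Write mono float samples in [-1.0, 1.0] to `path` as 16-bit PCM."""
    # Clip each sample to the valid range, then scale to signed 16-bit integers.
    pcm = array.array(
        "h", (int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples)
    )
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sampling_rate)
        wav_file.writeframes(pcm.tobytes())
    return len(pcm)


def wav_info(path):
    """Return (number of frames, sampling rate) of a WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getnframes(), wav_file.getframerate()


# Stand-in for the model output: 0.1 s of a 440 Hz tone.
tone = [
    0.3 * math.sin(2 * math.pi * 440 * n / ASSUMED_SAMPLING_RATE)
    for n in range(1600)
]
wav_path = os.path.join(tempfile.gettempdir(), "seamless_m4t_demo.wav")
n_written = save_wav(wav_path, tone)
```

With real model output, the equivalent call would be `save_wav("out.wav", audio_array)`, since a 1-D NumPy array can be iterated sample by sample.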

You can also translate directly from a speech waveform. Here is an example that translates Arabic speech into Russian speech:

```python
>>> from datasets import load_dataset

>>> # load a single Arabic utterance
>>> dataset = load_dataset("arabic_speech_corpus", split="test[0:1]")

>>> audio_sample = dataset["audio"][0]["array"]

>>> inputs = processor(audios=audio_sample, return_tensors="pt")

>>> audio_array = model.generate(**inputs, tgt_lang="rus")
>>> audio_array = audio_array[0].cpu().numpy().squeeze()
```
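One detail worth checking before passing a waveform to the processor is its sampling rate. The sketch below assumes the processor expects 16 kHz input (an assumption to verify against the checkpoint's feature-extractor configuration) and shows a deliberately naive linear-interpolation resampler. `resample_linear` is a hypothetical helper with no anti-aliasing filter; in practice a proper resampler, such as casting the dataset column with `datasets.Audio(sampling_rate=16_000)`, is the better choice.

```python
def resample_linear(samples, orig_sr, target_sr=16_000):
    """Resample a mono waveform by linear interpolation (no low-pass filter)."""
    if orig_sr == target_sr:
        return list(samples)
    # Number of output samples covering the same duration.
    n_out = len(samples) * target_sr // orig_sr
    step = orig_sr / target_sr  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, last)
        frac = pos - left
        # Weighted average of the two neighbouring input samples.
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# One second of silence at 48 kHz becomes one second at 16 kHz.
one_second_48k = [0.0] * 48_000
resampled = resample_linear(one_second_48k, orig_sr=48_000)
```

The original rate of a sample is available next to the waveform, as `dataset["audio"][0]["sampling_rate"]`.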

#### Tips

[`SeamlessM4TModel`] is the top-level Transformers model for generating speech and text, but you can also use dedicated models that perform a single task without the additional components, which reduces the memory footprint.
For example, you can replace the model in the previous snippet with the one dedicated to the S2ST task:

```python
>>> from transformers import SeamlessM4TForSpeechToSpeech
>>> model = SeamlessM4TForSpeechToSpeech.from_pretrained("ylacombe/hf-seamless-m4t-medium")
```
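As a quick reference, the correspondence between tasks and dedicated classes can be sketched as a lookup table. `DEDICATED_MODELS` and `load_dedicated_model` are hypothetical names introduced here for illustration. Mapping ASR to the speech-to-text class is an assumption: it treats transcription as speech-to-text "translation" whose target language is the spoken language.

```python
# Task name -> name of the dedicated Transformers class.
DEDICATED_MODELS = {
    "S2ST": "SeamlessM4TForSpeechToSpeech",
    "S2TT": "SeamlessM4TForSpeechToText",
    "T2ST": "SeamlessM4TForTextToSpeech",
    "T2TT": "SeamlessM4TForTextToText",
    # Assumption: ASR reuses the speech-to-text model with tgt_lang set
    # to the language being spoken.
    "ASR": "SeamlessM4TForSpeechToText",
}


def load_dedicated_model(task, checkpoint="ylacombe/hf-seamless-m4t-medium"):
    """Load the dedicated model for `task` (downloads the checkpoint)."""
    # Imported lazily so the lookup table can be used without Transformers.
    import transformers

    model_class = getattr(transformers, DEDICATED_MODELS[task])
    return model_class.from_pretrained(checkpoint)
```

For instance, `load_dedicated_model("T2TT")` would load only the components needed for text-to-text translation.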

### Text

Similarly, you can generate translated text from audio or text, this time using the dedicated models. From audio:

```python
>>> from transformers import SeamlessM4TForSpeechToText
>>> model = SeamlessM4TForSpeechToText.from_pretrained("ylacombe/hf-seamless-m4t-medium")
>>> # reuse the Arabic sample loaded in the Speech section
>>> audio_sample = dataset["audio"][0]["array"]

>>> inputs = processor(audios=audio_sample, return_tensors="pt")

>>> output_tokens = model.generate(**inputs, tgt_lang="fra")
>>> translated_text = processor.decode(output_tokens.tolist()[0], skip_special_tokens=True)
```

And from text:

```python
>>> from transformers import SeamlessM4TForTextToText
>>> model = SeamlessM4TForTextToText.from_pretrained("ylacombe/hf-seamless-m4t-medium")
>>> inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")

>>> output_tokens = model.generate(**inputs, tgt_lang="fra")
>>> translated_text = processor.decode(output_tokens.tolist()[0], skip_special_tokens=True)
```

#### Tips

Three final tips:

1. [`SeamlessM4TModel`] can generate text and/or speech. Pass `generate_speech=False` to [`SeamlessM4TModel.generate`] to generate text only. You can also pass `return_intermediate_token_ids=True` to get both the text token ids and the generated speech.
2. You can change the speaker used for speech synthesis with the `spkr_id` argument.
3. You can use different [generation strategies](./generation_strategies) for speech and text generation, e.g. `.generate(input_ids=input_ids, text_num_beams=4, speech_do_sample=True)`, which performs beam-search decoding on the text model and then multinomial sampling on the speech model.
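The three tips can be combined in a single call. The sketch below only assembles the keyword arguments named above, so it runs without the model. `build_generate_kwargs` is a hypothetical helper, and dropping the speech-only options when `generate_speech=False` is a design choice of this sketch, not documented library behaviour.

```python
def build_generate_kwargs(
    tgt_lang,
    generate_speech=True,
    spkr_id=None,
    return_intermediate_token_ids=False,
    text_num_beams=None,
    speech_do_sample=None,
):
    """Collect keyword arguments for `SeamlessM4TModel.generate`."""
    kwargs = {"tgt_lang": tgt_lang, "generate_speech": generate_speech}
    # Options prefixed with `text_` configure decoding on the text model.
    if text_num_beams is not None:
        kwargs["text_num_beams"] = text_num_beams
    if generate_speech:
        # These options are only meaningful when speech is synthesized.
        if spkr_id is not None:
            kwargs["spkr_id"] = spkr_id
        if return_intermediate_token_ids:
            kwargs["return_intermediate_token_ids"] = True
        if speech_do_sample is not None:
            kwargs["speech_do_sample"] = speech_do_sample
    return kwargs


# Text only, decoded with beam search.
text_only = build_generate_kwargs("fra", generate_speech=False, text_num_beams=4)

# Speech with a different speaker, sampling on the speech model.
speech = build_generate_kwargs("rus", spkr_id=2, speech_do_sample=True)
```

Either dictionary would then be unpacked into the call, as in `model.generate(**inputs, **text_only)`.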