facebook/hf-seamless-m4t-medium

@@ -12,7 +12,7 @@ library_name: transformers
 SeamlessM4T is a collection of models designed to provide high-quality translation, allowing people from different
 linguistic communities to communicate effortlessly through speech and text.
-This repository hosts 🤗 Hugging Face's [implementation](https://moon-ci-docs.huggingface.co/docs/transformers/pr_25693/en/model_doc/seamless_m4t) of SeamlessM4T. You can find the original weights, as well as a guide on how to run them in the original hub repositories ([large](https://huggingface.co/facebook/seamless-m4t-large) and [medium](https://huggingface.co/facebook/seamless-m4t-medium) checkpoints).
 SeamlessM4T Medium covers:
 - 📥 101 languages for speech input
@@ -26,7 +26,7 @@ This is the "medium" variant of the unified model, which enables multiple tasks
 - Text-to-text translation (T2TT)
 - Automatic speech recognition (ASR)
-You can perform all the above tasks from one single model, [`SeamlessM4TModel`](https://moon-ci-docs.huggingface.co/docs/transformers/pr_25693/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel), but each task also has its own dedicated sub-model.
 ## 🤗 Usage
@@ -60,7 +60,7 @@ Here is how to use the processor to process text and audio:
 ### Speech
-[`SeamlessM4TModel`] can *seamlessly* generate text or speech with few or no changes. Let's target Russian voice translation:
 ```python
 >>> audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
@@ -71,7 +71,7 @@ With basically the same code, I've translated English text and Arabic speech to
 ### Text
-Similarly, you can generate translated text from audio files or from text with the same model. You only have to pass `generate_speech=False` to [`SeamlessM4TModel.generate`].
 This time, let's translate to French.
 ```python
@@ -89,7 +89,7 @@ This time, let's translate to French.
 #### 1. Use dedicated models
-[`SeamlessM4TModel`] is transformers top level model to generate speech and text, but you can also use dedicated models that perform the task without additional components, thus reducing the memory footprint.
 For example, you can replace the audio-to-audio generation snippet with the model dedicated to the S2ST task; the rest of the code is exactly the same:
 ```python
@@ -104,16 +104,16 @@ Or you can replace the text-to-text generation snippet with the model dedicated
 >>> model = SeamlessM4TForTextToText.from_pretrained("facebook/hf-seamless-m4t-medium")
 ```
-Feel free to try out [`SeamlessM4TForSpeechToText`] and [`SeamlessM4TForTextToSpeech`] as well.
 #### 2. Change the speaker identity
 You can change the speaker used for speech synthesis with the `spkr_id` argument. Some `spkr_id` values work better than others for some languages!
-#### 3. Change the speaker identity
-You can use different [generation strategies](./generation_strategies) for speech and text generation, e.g `.generate(input_ids=input_ids, text_num_beams=4, speech_do_sample=True)` which will successively perform beam-search decoding on the text model, and multinomial sampling on the speech model.
 #### 4. Generate speech and text at the same time
-Use `return_intermediate_token_ids=True` with [`SeamlessM4TModel`] to return both speech and text !

 SeamlessM4T is a collection of models designed to provide high-quality translation, allowing people from different
 linguistic communities to communicate effortlessly through speech and text.
+This repository hosts 🤗 Hugging Face's [implementation](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t) of SeamlessM4T. You can find the original weights, as well as a guide on how to run them, in the original Hub repositories ([large](https://huggingface.co/facebook/seamless-m4t-large) and [medium](https://huggingface.co/facebook/seamless-m4t-medium) checkpoints).
 SeamlessM4T Medium covers:
 - 📥 101 languages for speech input
 - Text-to-text translation (T2TT)
 - Automatic speech recognition (ASR)
+You can perform all the above tasks from one single model, [`SeamlessM4TModel`](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel), but each task also has its own dedicated sub-model.
 ## 🤗 Usage
 ### Speech
+[`SeamlessM4TModel`](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel) can *seamlessly* generate text or speech with few or no changes. Let's target Russian voice translation:
 ```python
 >>> audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
 ### Text
+Similarly, you can generate translated text from audio files or from text with the same model. You only have to pass `generate_speech=False` to [`SeamlessM4TModel.generate`](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel.generate).
 This time, let's translate to French.
 ```python
 #### 1. Use dedicated models
+[`SeamlessM4TModel`](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel) is the top-level 🤗 Transformers model for generating speech and text, but you can also use dedicated models that perform a single task without the additional components, thus reducing the memory footprint.
 For example, you can replace the audio-to-audio generation snippet with the model dedicated to the S2ST task; the rest of the code is exactly the same:
 ```python
 >>> model = SeamlessM4TForTextToText.from_pretrained("facebook/hf-seamless-m4t-medium")
 ```
+Feel free to try out [`SeamlessM4TForSpeechToText`](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t#transformers.SeamlessM4TForSpeechToText) and [`SeamlessM4TForTextToSpeech`](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t#transformers.SeamlessM4TForTextToSpeech) as well.
 #### 2. Change the speaker identity
 You can change the speaker used for speech synthesis with the `spkr_id` argument. Some `spkr_id` values work better than others for some languages!
+#### 3. Change the generation strategy
+You can use different [generation strategies](https://huggingface.co/docs/transformers/v4.34.1/en/generation_strategies#text-generation-strategies) for speech and text generation, e.g. `.generate(input_ids=input_ids, text_num_beams=4, speech_do_sample=True)`, which performs beam-search decoding on the text model and then multinomial sampling on the speech model.
 #### 4. Generate speech and text at the same time
+Use `return_intermediate_token_ids=True` with [`SeamlessM4TModel`](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t#transformers.SeamlessM4TModel) to return both speech and text!
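
The options this diff mentions (`tgt_lang`, `generate_speech`, `spkr_id`, `text_num_beams`, `speech_do_sample`, `return_intermediate_token_ids`) are all keyword arguments passed to `generate`. As a rough sketch, they could be gathered in one place before the call. `build_generate_kwargs` is a hypothetical helper written for illustration, not part of 🤗 Transformers, and the commented-out call assumes `model` and `text_inputs` already exist as in the card's own snippets.

```python
from typing import Any, Dict, Optional


def build_generate_kwargs(
    tgt_lang: str,
    generate_speech: bool = True,
    spkr_id: Optional[int] = None,
    text_num_beams: Optional[int] = None,
    speech_do_sample: Optional[bool] = None,
    return_intermediate_token_ids: bool = False,
) -> Dict[str, Any]:
    """Hypothetical helper: collect the keyword arguments named in the model card.

    Only options that were explicitly set are included, so anything left
    unset falls back to whatever default `generate` itself uses.
    """
    kwargs: Dict[str, Any] = {"tgt_lang": tgt_lang}
    if not generate_speech:
        # Ask for translated text only (the "Text" section of the card).
        kwargs["generate_speech"] = False
    if spkr_id is not None:
        # Speaker used for speech synthesis; 0 is a valid value, hence `is not None`.
        kwargs["spkr_id"] = spkr_id
    if text_num_beams is not None:
        # Beam-search width for the text model.
        kwargs["text_num_beams"] = text_num_beams
    if speech_do_sample is not None:
        # Multinomial sampling on the speech model.
        kwargs["speech_do_sample"] = speech_do_sample
    if return_intermediate_token_ids:
        # Return both speech and text.
        kwargs["return_intermediate_token_ids"] = True
    return kwargs


# Russian speech with a chosen speaker, beam search on text, sampling on speech.
speech_kwargs = build_generate_kwargs(
    "rus", spkr_id=0, text_num_beams=4, speech_do_sample=True
)
print(speech_kwargs)

# Assumed usage, with `model` and `text_inputs` defined as in the card:
# output = model.generate(**text_inputs, **speech_kwargs)
```

Keeping the options in a plain dictionary makes it easy to switch between the speech and text examples by changing a single call, without touching the `generate` line itself.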