Vaibhav Srivastav committed on
Commit
fdcad28
1 Parent(s): 28870d8
Files changed (1)
  1. README.md +38 -0
README.md ADDED
@@ -0,0 +1,38 @@
+ ---
+ inference: false
+ tags:
+ - SeamlessM4T
+ license: cc-by-nc-4.0
+ ---
+
+ In addition to the SeamlessM4T-LARGE (2.3B) and SeamlessM4T-MEDIUM (1.2B) models, we are also developing a small model (281M) targeting on-device inference.
+ This folder contains an example of running an exported small model that covers most tasks (ASR/S2TT/S2ST). The model can be executed on popular mobile devices with PyTorch Mobile (https://pytorch.org/mobile/home/).
+
+ ## Overview
+
+ | Model | Disk Size | Supported Tasks | Supported Languages |
+ |-------|-----------|-----------------|---------------------|
+ | [UnitY-Small]() | 862 MB | S2ST, S2TT, ASR | eng, fra, hin, por, spa |
+ | [UnitY-Small-S2T]() | 637 MB | S2TT, ASR | eng, fra, hin, por, spa |
+
+ UnitY-Small-S2T is a pruned version of UnitY-Small without the second-pass unit decoding, which is why it supports only the text-output tasks (S2TT, ASR).
+
+ ## Inference
+ Using the exported model does not require the seamless_communication or fairseq2 dependencies.
+
+ ```python
+ import torchaudio
+ import torch
+
+ # TEST_AUDIO_PATH, TGT_LANG, and OUTPUT_FOLDER are placeholders to be set by the user.
+ audio_input, _ = torchaudio.load(TEST_AUDIO_PATH)  # Load waveform using torchaudio
+
+ s2t_model = torch.jit.load("unity_on_device_s2t.ptl")  # Load exported S2T model
+ text = s2t_model(audio_input, tgt_lang=TGT_LANG)  # Forward call with tgt_lang specified for ASR or S2TT
+ print(f"{TGT_LANG}:{text}")
+
+ s2st_model = torch.jit.load("unity_on_device.ptl")  # Load exported S2ST model
+ text, units, waveform = s2st_model(audio_input, tgt_lang=TGT_LANG)  # S2ST model also returns units and waveform
+ print(f"{TGT_LANG}:{text}")
+ torchaudio.save(f"{OUTPUT_FOLDER}/{TGT_LANG}.wav", waveform.unsqueeze(0), sample_rate=16000)  # Save output waveform to local file
+ ```
+
+ Running the exported model also does not require a Python runtime. For example, you could load the model in C++ following [this tutorial](https://pytorch.org/tutorials/advanced/cpp_export.html), or build your own on-device application similar to [this example](https://github.com/pytorch/ios-demo-app/tree/master/SpeechRecognition).
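
The Overview table and the Inference snippet together suggest a simple rule for choosing which exported file to load. The sketch below is illustrative only: `select_model_file`, `SUPPORTED_LANGS`, and `MODEL_FILES` are hypothetical names that are not part of the exported models. It also assumes that `unity_on_device.ptl` corresponds to UnitY-Small and `unity_on_device_s2t.ptl` to UnitY-Small-S2T, as the comments in the snippet imply.

```python
# Hypothetical helper: none of these names come from the exported models.

# Language codes listed in the Overview table.
SUPPORTED_LANGS = ("eng", "fra", "hin", "por", "spa")

# File names taken from the Inference snippet. Mapping the text-only tasks
# to the smaller S2T export is an assumption made for this sketch; per the
# table, the full model also supports S2TT and ASR.
MODEL_FILES = {
    "S2ST": "unity_on_device.ptl",
    "S2TT": "unity_on_device_s2t.ptl",
    "ASR": "unity_on_device_s2t.ptl",
}


def select_model_file(task: str, tgt_lang: str) -> str:
    """Validate the task and target language, then return the file to load."""
    task_key = task.upper()
    if task_key not in MODEL_FILES:
        raise ValueError(f"Unsupported task: {task!r}")
    if tgt_lang not in SUPPORTED_LANGS:
        raise ValueError(f"Unsupported target language: {tgt_lang!r}")
    return MODEL_FILES[task_key]
```

The returned path could then be passed to `torch.jit.load`. Checking the language code up front avoids loading a large model file only to fail on an unsupported `tgt_lang`.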