ZeroSwot is a state-of-the-art zero-shot end-to-end Speech Translation system.

<div align=center><img src="resources/intro.png" height="75%" width="75%"/></div>

The model is created by adapting a wav2vec2.0-based encoder to the embedding space of NLLB, using a novel subword compression module and Optimal Transport, while only utilizing ASR data. It thus enables **Zero-shot E2E Speech Translation to all the 200 languages supported by NLLB**.
For more details please refer to our [paper](https://arxiv.org/abs/2402.10422) and the [original repo](https://github.com/mt-upc/ZeroSwot) built on fairseq.
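
As a quick, hedged illustration of the zero-shot mechanics (not the model's official loading code): the speech encoder produces embeddings that NLLB consumes in place of token embeddings, and the target language is chosen via a forced BOS token. Below, the speech side is mocked with random tensors so the snippet is self-contained; only the NLLB checkpoint and the `transformers` generation API are real.

```python
# Sketch of the zero-shot ST mechanics with a mocked speech encoder.
# Assumption: speech-derived embeddings live in NLLB's embedding space;
# here they are random, so the decoded text is meaningless on purpose.
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "facebook/nllb-200-distilled-600M"  # real NLLB checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
nllb = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# Stand-in for the ZeroSwot encoder output: a sequence of subword-like
# vectors with NLLB's hidden size (1024 for this checkpoint).
speech_embeds = torch.randn(1, 12, nllb.config.d_model)

generated = nllb.generate(
    inputs_embeds=speech_embeds,  # replaces the token-embedding lookup
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("deu_Latn"),  # target language
    max_new_tokens=32,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```

Swapping the forced BOS token for any of NLLB's 200 language codes changes the translation target, which is what makes the zero-shot claim possible.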
The compression module is a lightweight transformer that takes as input the hidden states of wav2vec2.0 and the corresponding CTC predictions, compresses them into subword-like embeddings similar to those expected by NLLB, and aligns them with NLLB's embedding space using Optimal Transport. For inference we simply pass the output of the speech encoder to the NLLB encoder.

<div align=center><img src="resources/methodology.png" height="120%" width="120%"/></div>

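To make the grouping idea shown above more tangible, here is a self-contained toy sketch of CTC-guided compression: consecutive frames sharing the same CTC prediction are pooled into a single token-like vector and blank frames are dropped. The real module is a learned transformer aligned to NLLB with Optimal Transport; mean pooling is only a simplifying assumption.

```python
# Toy CTC-guided compression: group frames by their CTC prediction and
# mean-pool each group. The actual ZeroSwot module is a small transformer;
# this only illustrates how T frames shrink to N << T token-like vectors.
import torch

def ctc_compress(hidden: torch.Tensor, ctc_ids: torch.Tensor, blank_id: int = 0) -> torch.Tensor:
    groups, prev = [], None
    for t, tok in enumerate(ctc_ids.tolist()):
        if tok == blank_id:      # blanks end the current group
            prev = None
            continue
        if tok == prev:          # same prediction: extend the current group
            groups[-1].append(t)
        else:                    # new prediction: start a new group
            groups.append([t])
        prev = tok
    if not groups:               # all-blank input: return an empty sequence
        return hidden.new_zeros(0, hidden.size(1))
    return torch.stack([hidden[idx].mean(dim=0) for idx in groups])

T, D = 50, 1024                       # frames, wav2vec2.0 hidden size
hidden = torch.randn(T, D)            # fake encoder hidden states
ctc_ids = torch.randint(0, 30, (T,))  # fake CTC argmax predictions (0 = blank)
compressed = ctc_compress(hidden, ctc_ids)
print(hidden.shape, "->", compressed.shape)  # (50, 1024) -> (N, 1024)
```
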
## Version