Add examples folder

- README.md +15 -4
- examples/Alpaca-Lora.sh +14 -0
- examples/FlanT5-large.sh +10 -0
- examples/LLaMA.sh +13 -0
- examples/README.md +36 -0
- examples/Vicuna.sh +16 -0
- examples/m2m100-1.2B.sh +10 -0
- examples/m2m100-1.2B_2GPUS.sh +10 -0
- examples/m2m100-12B_8bit.sh +15 -0
- examples/m2m100-12B_fp16.sh +12 -0
- examples/mbart.sh +10 -0
- examples/nllb200-moe-54B_1GPU_4bits.sh +16 -0
- examples/nllb200-moe-54B_1GPU_8bits.sh +16 -0
- examples/nllb200_3B_8bit.sh +12 -0
- examples/nllb200_3B_fp16.sh +12 -0
- examples/opusMT.sh +8 -0
- examples/small100.sh +10 -0
README.md
CHANGED
@@ -44,8 +44,9 @@ See the [Supported languages table](supported_languages.md) for a table of the s
 
 ## Supported Models
 
-💥 EasyTranslate now supports any Seq2SeqLM (m2m100, nllb200, MarianMT, T5, FlanT5, etc.) and any CausalLM (GPT2, LLaMA, Vicuna, Falcon) model from HuggingFace's Hub!!
+💥 EasyTranslate now supports any Seq2SeqLM (m2m100, nllb200, small100, mbart, MarianMT, T5, FlanT5, etc.) and any CausalLM (GPT2, LLaMA, Vicuna, Falcon) model from HuggingFace's Hub!!
 We still recommend you to use M2M100 or NLLB200 for the best results, but you can experiment with other LLMs and prompting to generate translations. See [Prompting Section](#prompting) for more information.
+You can also see [the examples folder](examples) for examples of how to use EasyTranslate with different models.
 
 ### M2M100
 **M2M100** is a multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation introduced in this [paper](https://arxiv.org/abs/2010.11125) and first released in [this](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100) repository.

@@ -72,6 +73,13 @@ We still recommend you to use M2M100 or NLLB200 for the best results, but you ca
 
 - **facebook/nllb-200-distilled-600M**: <https://huggingface.co/facebook/nllb-200-distilled-600M>
 
+### Other MT Models supported
+We support every MT model in the 🤗 Hugging Face Hub. If you find one that doesn't work, please open an issue so we can fix it, or submit a PR with the fix. This includes, among many others:
+- **Small100**: <https://huggingface.co/alirezamsh/small100>
+- **Mbart many-to-many / many-to-one**: <https://huggingface.co/facebook/mbart-large-50-many-to-many-mmt>
+- **Opus MT**: <https://huggingface.co/Helsinki-NLP/opus-mt-es-en>
+
+
 
 ## Citation
 If you use this software please cite

@@ -113,7 +121,8 @@ pip install peft
 
 ## Translate a file
 
-Run `python translate.py -h` for more info.
+Run `python translate.py -h` for more info.
+See [the examples folder](examples) for examples of how to run different models.
 
 #### Using a single CPU / GPU
 

@@ -156,11 +165,13 @@ pip install bitsandbytes
 
 python3 translate.py \
 --sentences_path sample_text/en.txt \
---output_path sample_text/en2es.translation.
+--output_path sample_text/en2es.translation.nllb200-moe-54B.txt \
 --source_lang eng_Latn \
 --target_lang spa_Latn \
 --model_name facebook/nllb-moe-54b \
---precision 8
+--precision 8 \
+--force_auto_device_map \
+--starting_batch_size 8
 ```
 
 If even the quantified model does not fit in your GPU memory, you can set the `--force_auto_device_map` flag.
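Which precision to pick in the example scripts below (fp16, 8-bit or 4-bit) mostly depends on how much free VRAM your GPUs have. A quick way to check before launching (plain `nvidia-smi`, nothing specific to Easy-Translate):

```bash
# Standard nvidia-smi query: list each GPU with its total and free memory
nvidia-smi --query-gpu=name,memory.total,memory.free --format=csv
```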
examples/Alpaca-Lora.sh
ADDED
@@ -0,0 +1,14 @@
+# Run the Alpaca-Lora model (a LoRA model) on sample text using prompting
+# We need to set the base model with --model_name and the LoRA weights with --lora_weights_name_or_path
+
+cd ..
+
+python3 translate.py \
+--sentences_path sample_text/en.txt \
+--output_path sample_text/en2es.AlpacaLora.translation.txt \
+--model_name decapoda-research/llama-7b-hf \
+--lora_weights_name_or_path tloen/alpaca-lora-7b \
+--prompt "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nTranslate this text from English into Spanish\n\n### Input:\n%%SENTENCE%%\n\n### Response:\n" \
+--precision 8 \
+--force_auto_device_map \
+--starting_batch_size 8
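The prompting scripts rely on the `%%SENTENCE%%` placeholder in `--prompt`, which is presumably filled with each input sentence before generation. A rough illustration of that substitution (a bash sketch, not code from this repo, assuming a plain per-line string replacement):

```bash
# Hypothetical illustration (not part of translate.py): fill a %%SENTENCE%%
# prompt template with each line of the input file via plain string replacement.
PROMPT='Translate English to Spanish: %%SENTENCE%%'
while IFS= read -r sentence; do
    printf '%s\n' "${PROMPT//%%SENTENCE%%/$sentence}"
done < sample_text/en.txt
```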
examples/FlanT5-large.sh
ADDED
@@ -0,0 +1,10 @@
+# Run Flan-T5 model on sample text using prompting
+
+cd ..
+
+python3 translate.py \
+--sentences_path sample_text/en.txt \
+--output_path sample_text/en2es.FlanT5.translation.txt \
+--model_name google/flan-t5-large \
+--prompt "Translate English to Spanish: %%SENTENCE%%" \
+--precision bf16
examples/LLaMA.sh
ADDED
@@ -0,0 +1,13 @@
+# Run LLaMA 65B model on sample text using prompting
+# Use --precision 4 instead of --precision 8 if the model does not fit in your GPU memory
+
+cd ..
+
+python3 translate.py \
+--sentences_path sample_text/en.txt \
+--output_path sample_text/en2es.LLaMA.translation.txt \
+--model_name PATH_TO_LOCAL_LLAMA_WEIGHTS_IN_HF_FORMAT \
+--prompt "Translate English to Spanish: %%SENTENCE%%" \
+--precision 8 \
+--force_auto_device_map \
+--starting_batch_size 8
examples/README.md
ADDED
@@ -0,0 +1,36 @@
+# Easy-Translate Examples
+
+This folder contains examples of how to use Easy-Translate with different models and configurations.
+You can adapt these examples to your own use cases. If you have any questions, please feel free to open an issue.
+
+### MT Models
+
+```bash
+m2m100-1.2B.sh
+m2m100-12B_fp16.sh
+nllb200_3B_fp16.sh
+opusMT.sh
+mbart.sh
+small100.sh
+```
+
+#### Multi-GPU example
+```bash
+m2m100-1.2B_2GPUS.sh
+```
+#### Running large models on consumer hardware
+```bash
+m2m100-12B_8bit.sh
+nllb200_3B_8bit.sh
+nllb200-moe-54B_1GPU_8bits.sh
+nllb200-moe-54B_1GPU_4bits.sh
+```
+
+### Running LLMs with translation prompts
+
+```bash
+FlanT5-large.sh
+LLaMA.sh
+Vicuna.sh
+Alpaca-Lora.sh
+```
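Since every script starts with `cd ..`, they are written to be launched from inside the examples/ folder, for example:

```bash
# Run one of the example scripts from inside the examples/ folder
cd examples
bash m2m100-1.2B.sh
```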
examples/Vicuna.sh
ADDED
@@ -0,0 +1,16 @@
+# Run Vicuna 1.3 model on sample text using prompting
+# Different model sizes available, see https://github.com/lm-sys/FastChat#vicuna-weights:
+# lmsys/vicuna-33b-v1.3
+# lmsys/vicuna-13b-v1.3
+# lmsys/vicuna-7b-v1.3
+
+cd ..
+
+python3 translate.py \
+--sentences_path sample_text/en.txt \
+--output_path sample_text/en2es.Vicuna33B.translation.txt \
+--model_name lmsys/vicuna-33b-v1.3 \
+--prompt "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: %%SENTENCE%% ASSISTANT:" \
+--precision 8 \
+--force_auto_device_map \
+--starting_batch_size 8
examples/m2m100-1.2B.sh
ADDED
@@ -0,0 +1,10 @@
+# Run M2M100-1.2B model on sample text. One GPU, default precision.
+
+cd ..
+
+python3 translate.py \
+--sentences_path sample_text/en.txt \
+--output_path sample_text/en2es.translation.m2m100_1.2B.txt \
+--source_lang en \
+--target_lang es \
+--model_name facebook/m2m100_1.2B
examples/m2m100-1.2B_2GPUS.sh
ADDED
@@ -0,0 +1,10 @@
+# Run M2M100-1.2B model on sample text. Multi GPU, default precision.
+
+cd ..
+
+accelerate launch --multi_gpu --num_processes 2 --num_machines 1 translate.py \
+--sentences_path sample_text/en.txt \
+--output_path sample_text/en2es.translation.m2m100_1.2B.txt \
+--source_lang en \
+--target_lang es \
+--model_name facebook/m2m100_1.2B
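On a machine with more than two GPUs you may want to pin which devices the launcher uses. A minimal variant of the command above (`CUDA_VISIBLE_DEVICES` is a standard CUDA environment variable, not a translate.py flag, and the device ids 0,1 are just an example):

```bash
# Same multi-GPU launch, restricted to GPUs 0 and 1 (example device ids)
cd ..

CUDA_VISIBLE_DEVICES=0,1 accelerate launch --multi_gpu --num_processes 2 --num_machines 1 translate.py \
--sentences_path sample_text/en.txt \
--output_path sample_text/en2es.translation.m2m100_1.2B.txt \
--source_lang en \
--target_lang es \
--model_name facebook/m2m100_1.2B
```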
examples/m2m100-12B_8bit.sh
ADDED
@@ -0,0 +1,15 @@
+# Run M2M100-12B model on sample text. This model requires a GPU with a lot of VRAM, so we use
+# 8-bit quantization to reduce the required VRAM so it fits on consumer-grade GPUs. If you have a GPU
+# with a lot of VRAM, running the model in FP16 should be faster and produce slightly better results,
+# see examples/m2m100-12B_fp16.sh
+
+cd ..
+
+python3 translate.py \
+--sentences_path sample_text/en.txt \
+--output_path sample_text/en2es.translation.m2m100_12B.txt \
+--source_lang en \
+--target_lang es \
+--model_name facebook/m2m100-12B-avg-5-ckpt \
+--precision 8 \
+--starting_batch_size 8
examples/m2m100-12B_fp16.sh
ADDED
@@ -0,0 +1,12 @@
+# Run M2M100-12B model on sample text. We use FP16 precision, which requires a GPU with a lot of VRAM (e.g. an NVIDIA A100)
+# For running this model on consumer-grade GPUs, use 8-bit quantization, see examples/m2m100-12B_8bit.sh
+
+cd ..
+
+python3 translate.py \
+--sentences_path sample_text/en.txt \
+--output_path sample_text/en2es.translation.m2m100_12B.txt \
+--source_lang en \
+--target_lang es \
+--model_name facebook/m2m100-12B-avg-5-ckpt \
+--precision fp16
examples/mbart.sh
ADDED
@@ -0,0 +1,10 @@
+# Run Mbart many-to-many model on sample text.
+
+cd ..
+
+python3 translate.py \
+--sentences_path sample_text/en.txt \
+--output_path sample_text/en2es.translation.mbart.txt \
+--source_lang en_XX \
+--target_lang es_XX \
+--model_name facebook/mbart-large-50-many-to-many-mmt
examples/nllb200-moe-54B_1GPU_4bits.sh
ADDED
@@ -0,0 +1,16 @@
+# Run NLLB200-MoE model on sample text. This is a huge model that doesn't fit on a single GPU, so we use
+# 4-bit quantization to reduce the required VRAM. Still, it might not fit on a single GPU, so we also use
+# the --force_auto_device_map flag, which will offload the model parameters that don't fit on the GPU to the CPU.
+
+
+cd ..
+
+python3 translate.py \
+--sentences_path sample_text/en.txt \
+--output_path sample_text/en2es.translation.nllb200-moe-54B.txt \
+--source_lang eng_Latn \
+--target_lang spa_Latn \
+--model_name facebook/nllb-moe-54b \
+--precision 4 \
+--force_auto_device_map \
+--starting_batch_size 8
examples/nllb200-moe-54B_1GPU_8bits.sh
ADDED
@@ -0,0 +1,16 @@
+# Run NLLB200-MoE model on sample text. This is a huge model that doesn't fit on a single GPU, so we use
+# 8-bit quantization to reduce the required VRAM. Still, it might not fit on a single GPU, so we also use
+# the --force_auto_device_map flag, which will offload the model parameters that don't fit on the GPU to the CPU.
+# If 8-bit quantization is not enough, you can use 4-bit quantization, see examples/nllb200-moe-54B_1GPU_4bits.sh
+
+cd ..
+
+python3 translate.py \
+--sentences_path sample_text/en.txt \
+--output_path sample_text/en2es.translation.nllb200-moe-54B.txt \
+--source_lang eng_Latn \
+--target_lang spa_Latn \
+--model_name facebook/nllb-moe-54b \
+--precision 8 \
+--force_auto_device_map \
+--starting_batch_size 8
examples/nllb200_3B_8bit.sh
ADDED
@@ -0,0 +1,12 @@
+# Run NLLB200-3B model on sample text. We use 8-bit quantization to reduce the required VRAM,
+# so the model fits on GPUs with less VRAM. For FP16 precision on a GPU with a lot of VRAM, see examples/nllb200_3B_fp16.sh
+
+cd ..
+
+python3 translate.py \
+--sentences_path sample_text/en.txt \
+--output_path sample_text/en2es.translation.nllb-200_3B.txt \
+--source_lang eng_Latn \
+--target_lang spa_Latn \
+--model_name facebook/nllb-200-3.3B \
+--precision 8
examples/nllb200_3B_fp16.sh
ADDED
@@ -0,0 +1,12 @@
+# Run NLLB200-3B model on sample text. We use FP16 precision, which requires a GPU with a lot of VRAM
+# For running this model in GPUs with less VRAM, use 8-bit quantization, see examples/nllb200_3B_8bit.sh
+
+cd ..
+
+python3 translate.py \
+--sentences_path sample_text/en.txt \
+--output_path sample_text/en2es.translation.nllb-200_3B.txt \
+--source_lang eng_Latn \
+--target_lang spa_Latn \
+--model_name facebook/nllb-200-3.3B \
+--precision fp16
examples/opusMT.sh
ADDED
@@ -0,0 +1,8 @@
+# Run OpusMT model on sample text.
+
+cd ..
+
+python3 translate.py \
+--sentences_path sample_text/en.txt \
+--output_path sample_text/en2es.translation.opus.txt \
+--model_name Helsinki-NLP/opus-mt-es-en
examples/small100.sh
ADDED
@@ -0,0 +1,10 @@
+# Run SMALL100 model on sample text.
+
+cd ..
+
+python3 translate.py \
+--sentences_path sample_text/en.txt \
+--output_path sample_text/en2es.translation.small100.txt \
+--source_lang en \
+--target_lang es \
+--model_name alirezamsh/small100