Iker committed
Commit e2b2525 · 1 Parent(s): 55e5d07

Add examples folder

README.md CHANGED
@@ -44,8 +44,9 @@ See the [Supported languages table](supported_languages.md) for a table of the s

## Supported Models

- 💥 EasyTranslate now supports any Seq2SeqLM (m2m100, nllb200, MarianMT, T5, FlanT5, etc.) and any CausalLM (GPT2, LLaMA, Vicuna, Falcon) model from HuggingFace's Hub!!
+ 💥 EasyTranslate now supports any Seq2SeqLM (m2m100, nllb200, small100, mbart, MarianMT, T5, FlanT5, etc.) and any CausalLM (GPT2, LLaMA, Vicuna, Falcon) model from HuggingFace's Hub!!
We still recommend you to use M2M100 or NLLB200 for the best results, but you can experiment with other LLMs and prompting to generate translations. See [Prompting Section](#prompting) for more information.
+ You can also see [the examples folder](examples) for examples of how to use EasyTranslate with different models.

### M2M100
**M2M100** is a multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation introduced in this [paper](https://arxiv.org/abs/2010.11125) and first released in [this](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100) repository.

@@ -72,6 +73,13 @@ We still recommend you to use M2M100 or NLLB200 for the best results, but you ca

- **facebook/nllb-200-distilled-600M**: <https://huggingface.co/facebook/nllb-200-distilled-600M>

+ ### Other MT Models supported
+ We support every MT model in the 🤗 Hugging Face Hub. If you find one that doesn't work, please open an issue for us to fix it or a PR with the fix. This includes, among many others:
+ - **Small100**: <https://huggingface.co/alirezamsh/small100>
+ - **Mbart many-to-many / many-to-one**: <https://huggingface.co/facebook/mbart-large-50-many-to-many-mmt>
+ - **Opus MT**: <https://huggingface.co/Helsinki-NLP/opus-mt-es-en>
+
+

## Citation
If you use this software please cite

@@ -113,7 +121,8 @@ pip install peft

## Translate a file

- Run `python translate.py -h` for more info.
+ Run `python translate.py -h` for more info.
+ See [the examples folder](examples) for examples of how to run different models.

#### Using a single CPU / GPU

@@ -156,11 +165,13 @@ pip install bitsandbytes

python3 translate.py \
--sentences_path sample_text/en.txt \
- --output_path sample_text/en2es.translation.nllb-moe-54b.txt \
+ --output_path sample_text/en2es.translation.nllb200-moe-54B.txt \
--source_lang eng_Latn \
--target_lang spa_Latn \
--model_name facebook/nllb-moe-54b \
- --precision 8
+ --precision 8 \
+ --force_auto_device_map \
+ --starting_batch_size 8
```

If even the quantized model does not fit in your GPU memory, you can set the `--force_auto_device_map` flag.
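For reference, `--precision 8` together with `--force_auto_device_map` corresponds roughly to 8-bit loading via bitsandbytes plus an automatic device map in 🤗 Transformers/Accelerate. A minimal sketch of that kind of loading (illustrative only, not the repository's actual code; the checkpoint name is the one from the command above):

```python
# Illustrative sketch of 8-bit loading with an automatic device map (roughly what
# --precision 8 plus --force_auto_device_map implies). Not Easy-Translate's actual code.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model = AutoModelForSeq2SeqLM.from_pretrained(
    "facebook/nllb-moe-54b",
    load_in_8bit=True,   # 8-bit quantization via bitsandbytes
    device_map="auto",   # place layers on the GPU(s) and offload what doesn't fit to CPU
)
tokenizer = AutoTokenizer.from_pretrained("facebook/nllb-moe-54b")
```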
examples/Alpaca-Lora.sh ADDED
@@ -0,0 +1,14 @@
+ # Run the Alpaca-LoRA model (a LoRA adapter on top of LLaMA) on sample text using prompting
+ # We need to set the base model with --model_name and the LoRA weights with --lora_weights_name_or_path
+
+ cd ..
+
+ python3 translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.AlpacaLora.translation.txt \
+ --model_name decapoda-research/llama-7b-hf \
+ --lora_weights_name_or_path tloen/alpaca-lora-7b \
+ --prompt "Below is an instruction that describes a task, paired with an input that provides further context. Write a response that appropriately completes the request.\n\n### Instruction:\nTranslate this text from English into Spanish\n\n### Input:\n%%SENTENCE%%\n\n### Response:\n" \
+ --precision 8 \
+ --force_auto_device_map \
+ --starting_batch_size 8
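Loading a base model together with LoRA weights is roughly what `--lora_weights_name_or_path` implies. A minimal sketch with 🤗 PEFT (illustrative only, not Easy-Translate's actual code; model names are the ones from the script above):

```python
# Minimal sketch: attach a LoRA adapter to a base causal LM with 🤗 PEFT.
# Illustrative only; not Easy-Translate's actual implementation.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base_model = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",  # base weights (--model_name)
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, "tloen/alpaca-lora-7b")  # LoRA weights (--lora_weights_name_or_path)
model.eval()  # ready for generation
```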
examples/FlanT5-large.sh ADDED
@@ -0,0 +1,10 @@
+ # Run the Flan-T5 model on sample text using prompting
+
+ cd ..
+
+ python3 translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.FlanT5.translation.txt \
+ --model_name google/flan-t5-large \
+ --prompt "Translate English to Spanish: %%SENTENCE%%" \
+ --precision bf16
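The `--prompt` flag wraps each input line in a template, with `%%SENTENCE%%` standing in for the sentence to translate. A minimal sketch of that substitution (illustrative only, not the repository's actual code):

```python
# Illustrative sketch of how a %%SENTENCE%% prompt template is filled before generation.
prompt_template = "Translate English to Spanish: %%SENTENCE%%"

def build_prompt(sentence: str) -> str:
    # Replace the placeholder with a sentence read from --sentences_path
    return prompt_template.replace("%%SENTENCE%%", sentence)

print(build_prompt("Hello, how are you?"))
# Translate English to Spanish: Hello, how are you?
```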
examples/LLaMA.sh ADDED
@@ -0,0 +1,12 @@
+ # Run the LLaMA-65B model on sample text using prompting
+
+ cd ..
+
+ python3 translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.LLaMA.translation.txt \
+ --model_name PATH_TO_LOCAL_LLAMA_WEIGHTS_IN_HF_FORMAT \
+ --prompt "Translate English to Spanish: %%SENTENCE%%" \
+ --precision 8 \
+ --force_auto_device_map \
+ --starting_batch_size 8
examples/README.md ADDED
@@ -0,0 +1,36 @@
+ # Easy-Translate Examples
+
+ This folder contains examples of how to use Easy-Translate with different models and configurations.
+ You can adapt these examples to your own use cases. If you have any questions, please feel free to open an issue.
+
+ ### MT Models
+
+ ```bash
+ m2m100-1.2B.sh
+ m2m100-12B_fp16.sh
+ nllb200_3B_fp16.sh
+ opusMT.sh
+ mbart.sh
+ small100.sh
+ ```
+
+ #### Multi-GPU example
+ ```bash
+ m2m100-1.2B_2GPUS.sh
+ ```
+ #### Running large models on consumer hardware
+ ```bash
+ m2m100-12B_8bit.sh
+ nllb200_3B_8bit.sh
+ nllb200-moe-54B_1GPU_8bits.sh
+ nllb200-moe-54B_1GPU_4bits.sh
+ ```
+
+ ### Running LLMs with translation prompts
+
+ ```bash
+ FlanT5-large.sh
+ LLaMA.sh
+ Vicuna.sh
+ Alpaca-Lora.sh
+ ```
examples/Vicuna.sh ADDED
@@ -0,0 +1,16 @@
+ # Run the Vicuna v1.3 model on sample text using prompting
+ # Different model sizes available, see https://github.com/lm-sys/FastChat#vicuna-weights:
+ # lmsys/vicuna-33b-v1.3
+ # lmsys/vicuna-13b-v1.3
+ # lmsys/vicuna-7b-v1.3
+
+ cd ..
+
+ python3 translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.Vicuna33B.translation.txt \
+ --model_name lmsys/vicuna-33b-v1.3 \
+ --prompt "A chat between a curious user and an artificial intelligence assistant. The assistant gives helpful, detailed, and polite answers to the user's questions. USER: %%SENTENCE%% ASSISTANT:" \
+ --precision 8 \
+ --force_auto_device_map \
+ --starting_batch_size 8
examples/m2m100-1.2B.sh ADDED
@@ -0,0 +1,10 @@
+ # Run M2M100-1.2B model on sample text. One GPU, default precision.
+
+ cd ..
+
+ python3 translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.translation.m2m100_1.2B.txt \
+ --source_lang en \
+ --target_lang es \
+ --model_name facebook/m2m100_1.2B
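For M2M100-style models, the `--source_lang` / `--target_lang` codes map to the tokenizer's language settings and the forced target-language token. A minimal sketch in plain 🤗 Transformers of what those flags roughly correspond to (illustrative only, not Easy-Translate's actual code):

```python
# Illustrative sketch: what --source_lang en / --target_lang es roughly mean for M2M100.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/m2m100_1.2B")
model = AutoModelForSeq2SeqLM.from_pretrained("facebook/m2m100_1.2B")

tokenizer.src_lang = "en"  # --source_lang
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
generated = model.generate(
    **inputs,
    forced_bos_token_id=tokenizer.get_lang_id("es"),  # --target_lang
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True))
```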
examples/m2m100-1.2B_2GPUS.sh ADDED
@@ -0,0 +1,10 @@
+ # Run M2M100-1.2B model on sample text. Multi GPU, default precision.
+
+ cd ..
+
+ accelerate launch --multi_gpu --num_processes 2 --num_machines 1 translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.translation.m2m100_1.2B.txt \
+ --source_lang en \
+ --target_lang es \
+ --model_name facebook/m2m100_1.2B
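`accelerate launch --num_processes 2` starts one translation process per GPU, and each process handles its own slice of the input file. A rough sketch of that kind of sharding with 🤗 Accelerate (an illustration of the general idea only, not the repository's actual implementation):

```python
# Rough sketch: each process started by `accelerate launch` works on a stride of the sentences.
from accelerate import Accelerator

accelerator = Accelerator()
sentences = open("sample_text/en.txt", encoding="utf8").read().splitlines()

# Process i handles sentences i, i + N, i + 2N, ... where N is the number of processes/GPUs.
shard = sentences[accelerator.process_index :: accelerator.num_processes]
print(f"Process {accelerator.process_index} translates {len(shard)} sentences")
```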
examples/m2m100-12B_8bit.sh ADDED
@@ -0,0 +1,15 @@
+ # Run the M2M100-12B model on sample text. This model requires a GPU with a lot of VRAM, so we use
+ # 8-bit quantization to reduce the required VRAM so that it fits on consumer-grade GPUs. If you have a GPU
+ # with a lot of VRAM, running the model in FP16 should be faster and produce slightly better results,
+ # see examples/m2m100-12B_fp16.sh
+
+ cd ..
+
+ python3 translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.translation.m2m100_12B.txt \
+ --source_lang en \
+ --target_lang es \
+ --model_name facebook/m2m100-12B-avg-5-ckpt \
+ --precision 8 \
+ --starting_batch_size 8
examples/m2m100-12B_fp16.sh ADDED
@@ -0,0 +1,12 @@
+ # Run the M2M100-12B model on sample text. We use FP16 precision, which requires a GPU with a lot of VRAM (e.g. an NVIDIA A100).
+ # For running this model on consumer-grade GPUs, use 8-bit quantization, see examples/m2m100-12B_8bit.sh
+
+ cd ..
+
+ python3 translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.translation.m2m100_12B.txt \
+ --source_lang en \
+ --target_lang es \
+ --model_name facebook/m2m100-12B-avg-5-ckpt \
+ --precision fp16
examples/mbart.sh ADDED
@@ -0,0 +1,10 @@
+ # Run Mbart many-to-many model on sample text.
+
+ cd ..
+
+ python3 translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.translation.mbart.txt \
+ --source_lang en_XX \
+ --target_lang es_XX \
+ --model_name facebook/mbart-large-50-many-to-many-mmt
examples/nllb200-moe-54B_1GPU_4bits.sh ADDED
@@ -0,0 +1,16 @@
+ # Run NLLB200-MOE model on sample text. This is a huge model that doesn't fit on a single GPU, so we use
+ # 4-bit quantization to reduce the required VRAM. Even so, it might not fit on a single GPU, so we also use
+ # the --force_auto_device_map flag, which offloads the model parameters that don't fit on the GPU to the CPU.
+
+
+ cd ..
+
+ python3 translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.translation.nllb200-moe-54B.txt \
+ --source_lang eng_Latn \
+ --target_lang spa_Latn \
+ --model_name facebook/nllb-moe-54b \
+ --precision 4 \
+ --force_auto_device_map \
+ --starting_batch_size 8
examples/nllb200-moe-54B_1GPU_8bits.sh ADDED
@@ -0,0 +1,16 @@
+ # Run NLLB200-MOE model on sample text. This is a huge model that doesn't fit on a single GPU, so we use
+ # 8-bit quantization to reduce the required VRAM. Even so, it might not fit on a single GPU, so we also use
+ # the --force_auto_device_map flag, which offloads the model parameters that don't fit on the GPU to the CPU.
+ # If 8-bit quantization is not enough, you can use 4-bit quantization, see examples/nllb200-moe-54B_1GPU_4bits.sh
+
+ cd ..
+
+ python3 translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.translation.nllb200-moe-54B.txt \
+ --source_lang eng_Latn \
+ --target_lang spa_Latn \
+ --model_name facebook/nllb-moe-54b \
+ --precision 8 \
+ --force_auto_device_map \
+ --starting_batch_size 8
examples/nllb200_3B_8bit.sh ADDED
@@ -0,0 +1,12 @@
+ # Run NLLB200-3B model on sample text using 8-bit quantization, so it fits on GPUs with less VRAM.
+ # If your GPU has enough VRAM, FP16 is faster and produces slightly better results, see examples/nllb200_3B_fp16.sh
+
+ cd ..
+
+ python3 translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.translation.nllb-200_3B.txt \
+ --source_lang eng_Latn \
+ --target_lang spa_Latn \
+ --model_name facebook/nllb-200-3.3B \
+ --precision 8
examples/nllb200_3B_fp16.sh ADDED
@@ -0,0 +1,12 @@
+ # Run NLLB200-3B model on sample text. We use FP16 precision, which requires a GPU with a lot of VRAM.
+ # For running this model on GPUs with less VRAM, use 8-bit quantization, see examples/nllb200_3B_8bit.sh
+
+ cd ..
+
+ python3 translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.translation.nllb-200_3B.txt \
+ --source_lang eng_Latn \
+ --target_lang spa_Latn \
+ --model_name facebook/nllb-200-3.3B \
+ --precision fp16
examples/opusMT.sh ADDED
@@ -0,0 +1,8 @@
+ # Run an Opus MT model on sample text. Opus MT models are trained for a single language pair (here English to Spanish), so no --source_lang/--target_lang flags are needed.
+
+ cd ..
+
+ python3 translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.translation.opus.txt \
+ --model_name Helsinki-NLP/opus-mt-en-es
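Because each Opus MT checkpoint already encodes its language pair, translating with one in plain 🤗 Transformers needs no language codes at all. A minimal sketch (illustrative only; it assumes the English-to-Spanish checkpoint Helsinki-NLP/opus-mt-en-es):

```python
# Minimal sketch: translating with a pair-specific Opus MT (MarianMT) model in plain Transformers.
# Illustrative only; assumes the English-to-Spanish checkpoint Helsinki-NLP/opus-mt-en-es.
from transformers import MarianMTModel, MarianTokenizer

name = "Helsinki-NLP/opus-mt-en-es"
tokenizer = MarianTokenizer.from_pretrained(name)
model = MarianMTModel.from_pretrained(name)

batch = tokenizer(["Hello, how are you?"], return_tensors="pt", padding=True)
outputs = model.generate(**batch)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```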
examples/small100.sh ADDED
@@ -0,0 +1,10 @@
+ # Run SMALL100 model on sample text.
+
+ cd ..
+
+ python3 translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.translation.small100.txt \
+ --source_lang en \
+ --target_lang es \
+ --model_name alirezamsh/small100