Iker committed on
Commit
b873cb9
1 Parent(s): 749ff6d

M2M100 with transformers and accelerate

README.md CHANGED
@@ -1 +1,103 @@
  # Easy-Translate
+
+ Easy-Translate is a script for translating large text files on your machine using the [M2M100 models](https://arxiv.org/pdf/2010.11125.pdf) from Facebook/Meta AI.
+
+ M2M100 is a multilingual encoder-decoder (seq-to-seq) model trained for many-to-many multilingual translation.
+ It was introduced in this [paper](https://arxiv.org/abs/2010.11125) and first released in [this](https://github.com/pytorch/fairseq/tree/master/examples/m2m_100) repository.
+ The model can translate directly between any pair of 100 languages, covering 9,900 translation directions.
+
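+ For reference, here is a minimal sketch of the underlying Transformers API that translate.py wraps (the checkpoint and example sentence are illustrative; any M2M100 checkpoint works):
+
+ ```python
+ from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
+
+ tokenizer = M2M100Tokenizer.from_pretrained("facebook/m2m100_418M")
+ model = M2M100ForConditionalGeneration.from_pretrained("facebook/m2m100_418M")
+
+ tokenizer.src_lang = "en"  # source language id
+ encoded = tokenizer("Hello world!", return_tensors="pt")
+ # Force the decoder to start generating in the target language
+ generated = model.generate(**encoded, forced_bos_token_id=tokenizer.lang_code_to_id["es"])
+ print(tokenizer.batch_decode(generated, skip_special_tokens=True))
+ ```
+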
+ Easy-Translate is built on top of 🤗HuggingFace's
+ [Transformers](https://huggingface.co/docs/transformers/index) and
+ [Accelerate](https://huggingface.co/docs/accelerate/index) libraries. We support:
+
+ * CPU / GPU / multi-GPU / TPU acceleration
+ * BF16 / FP16 / FP32 precision
+ * Automatic batch size finder: forget CUDA OOM errors. Set an initial batch size; if it doesn't fit, we will automatically decrease it until it does.
+ * Sharded Data Parallel to load huge models sharded across multiple GPUs (see: https://huggingface.co/docs/accelerate/fsdp)
+
+ Test the 🔌 Online Demo here: https://huggingface.co/spaces/Iker/Translate-100-languages
+
+ ## Supported languages
+ See the [Supported languages table](supported_languages.md) for the supported languages and their ids.
+
+ **List of supported languages:**
+ Afrikaans, Amharic, Arabic, Asturian, Azerbaijani, Bashkir, Belarusian, Bulgarian, Bengali, Breton, Bosnian, Catalan, Cebuano, Czech, Welsh, Danish, German, Greek, English, Spanish, Estonian, Persian, Fulah, Finnish, French, Western Frisian, Irish, Gaelic, Galician, Gujarati, Hausa, Hebrew, Hindi, Croatian, Haitian, Hungarian, Armenian, Indonesian, Igbo, Iloko, Icelandic, Italian, Japanese, Javanese, Georgian, Kazakh, Central Khmer, Kannada, Korean, Luxembourgish, Ganda, Lingala, Lao, Lithuanian, Latvian, Malagasy, Macedonian, Malayalam, Mongolian, Marathi, Malay, Burmese, Nepali, Dutch, Norwegian, Northern Sotho, Occitan, Oriya, Panjabi, Polish, Pushto, Portuguese, Romanian, Russian, Sindhi, Sinhala, Slovak, Slovenian, Somali, Albanian, Serbian, Swati, Sundanese, Swedish, Swahili, Tamil, Thai, Tagalog, Tswana, Turkish, Ukrainian, Urdu, Uzbek, Vietnamese, Wolof, Xhosa, Yiddish, Yoruba, Chinese, Zulu
+
+ ## Supported Models
+
+ * **facebook/m2m100_418M**: https://huggingface.co/facebook/m2m100_418M
+
+ * **facebook/m2m100_1.2B**: https://huggingface.co/facebook/m2m100_1.2B
+
+ * **facebook/m2m100_12B**: https://huggingface.co/facebook/m2m100-12B-avg-5-ckpt
+
+ * Any other M2M100 model from the HuggingFace Hub: https://huggingface.co/models?search=m2m100
+
+
+ ## Requirements:
+
+ ```
+ PyTorch >= 1.10.0
+ See: https://pytorch.org/get-started/locally/
+
+ Accelerate >= 0.7.1
+ pip install --upgrade accelerate
+
+ HuggingFace Transformers
+ pip install --upgrade transformers
+ ```
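+
+ Assuming PyTorch is already installed (see the link above), both libraries can be upgraded in a single step:
+
+ ```bash
+ pip install --upgrade accelerate transformers
+ ```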
+
+ ## Translate a file
+
+ Run `python translate.py -h` for more info.
+
+ #### Using a single CPU / GPU:
+ ```bash
+ accelerate launch translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.translation.txt \
+ --source_lang en \
+ --target_lang es \
+ --model_name facebook/m2m100_1.2B
+ ```
+
+ #### Multi-GPU:
+ See the Accelerate documentation for more information (multi-node, TPU, sharded models...): https://huggingface.co/docs/accelerate/index
+ You can use the Accelerate CLI to configure the Accelerate environment (run
+ `accelerate config` in your terminal) instead of using the
+ `--multi_gpu` and `--num_processes` flags.
+
+ ```bash
+ accelerate launch --multi_gpu --num_processes 2 --num_machines 1 translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.translation.txt \
+ --source_lang en \
+ --target_lang es \
+ --model_name facebook/m2m100_1.2B
+ ```
+
+ #### Automatic batch size finder:
+ We will automatically find a batch size that fits in your GPU memory.
+ The default initial batch size is 128 (you can set it with the `--starting_batch_size 128` flag).
+ If we hit an Out Of Memory error, we will automatically decrease the batch size until we find one that works.
+
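+ A condensed sketch of the mechanism, as used in translate.py below (Accelerate's decorator retries the wrapped function with a smaller batch size whenever it raises a CUDA OOM error):
+
+ ```python
+ from accelerate.memory_utils import find_executable_batch_size
+
+ @find_executable_batch_size(starting_batch_size=128)
+ def inference(batch_size):
+     # Build the DataLoader with `batch_size` and run the translation loop here;
+     # on OOM, the decorator lowers batch_size and calls the function again.
+     ...
+
+ inference()  # called without arguments: batch_size is injected by the decorator
+ ```
+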
+ #### Choose precision:
+ Use the `--precision` flag to choose the precision of the model. You can choose between bf16, fp16 and 32.
+
+ ```bash
+ accelerate launch translate.py \
+ --sentences_path sample_text/en.txt \
+ --output_path sample_text/en2es.translation.txt \
+ --source_lang en \
+ --target_lang es \
+ --model_name facebook/m2m100_1.2B \
+ --precision fp16
+ ```
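+
+ Internally, the flag selects both the model dtype and Accelerate's mixed-precision mode. A condensed sketch of what translate.py does (the helper function is ours, for illustration; the fp32 branch, which casts back with `model.float()`, is omitted):
+
+ ```python
+ from accelerate import Accelerator
+
+ def setup_precision(model, precision: str):
+     # "no" disables Accelerate's mixed precision, i.e. plain fp32
+     accelerator = Accelerator(mixed_precision=precision if precision != "32" else "no")
+     if precision == "fp16":
+         model = model.half()      # cast weights to float16
+     elif precision == "bf16":
+         model = model.bfloat16()  # cast weights to bfloat16
+     return accelerator, model
+ ```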
+
+ ## Evaluate translations
+
+ Work in progress...
dataset.py CHANGED
@@ -1,7 +1,4 @@
- from typing import List, TextIO, Dict, Optional
- import torch
  from torch.utils.data import IterableDataset
- from torch.utils.data.dataset import T_co


  def blocks(files, size=65536):
@@ -22,35 +19,22 @@ class DatasetReader(IterableDataset):
          self.filename = filename
          self.tokenizer = tokenizer
          self.max_length = max_length
+         self.current_line = 0

      def preprocess(self, text: str):
+         self.current_line += 1
+         text = text.rstrip().strip()
+         if len(text) == 0:
+             print(f"Warning: empty sentence at line {self.current_line}")
          return self.tokenizer(
-             text.rstrip().strip(),
-             padding="max_length",
+             text,
+             padding=False,
              truncation=True,
              max_length=self.max_length,
-             return_tensors="pt",
+             return_tensors=None,
          )

      def __iter__(self):
          file_itr = open(self.filename, "r")
          mapped_itr = map(self.preprocess, file_itr)
          return mapped_itr
-
-
- def collate_function(batch: List[T_co]) -> Dict[str, torch.Tensor]:
-     return {
-         "input_ids": torch.stack([item["input_ids"][0] for item in batch]),
-         "attention_mask": torch.stack([item["attention_mask"][0] for item in batch]),
-     }
-
-
- def get_dataloader(
-     filename: str, tokenizer: str, batch_size: int, max_length: int
- ) -> torch.utils.data.DataLoader:
-     dataset = DatasetReader(filename, tokenizer, max_length)
-     return torch.utils.data.DataLoader(
-         dataset,
-         batch_size=batch_size,
-         collate_fn=collate_function,
-     )
sample_text/RADME.md ADDED
@@ -0,0 +1,9 @@
+ # Sample texts
+
+ We provide a few parallel sentences for easy debugging and testing.
+ The data has been extracted from the Europarl v7 corpus: [http://www.statmt.org/europarl/v7/](http://www.statmt.org/europarl/v7/).
+
+ * **en.txt**: 1000 English sentences
+ * **es.txt**: 1000 Spanish sentences
+
+ The sentences in both files are parallel.
sample_text/en.txt ADDED
The diff for this file is too large to render. See raw diff
 
sample_text/en2es.translation.txt ADDED
The diff for this file is too large to render. See raw diff
 
sample_text/es.txt ADDED
The diff for this file is too large to render. See raw diff
 
supported_languages.md ADDED
@@ -0,0 +1,104 @@
+ ## Supported languages
+
+ | Language | Id |
+ |---|---|
+ | Afrikaans | af |
+ | Amharic | am |
+ | Arabic | ar |
+ | Asturian | ast |
+ | Azerbaijani | az |
+ | Bashkir | ba |
+ | Belarusian | be |
+ | Bulgarian | bg |
+ | Bengali | bn |
+ | Breton | br |
+ | Bosnian | bs |
+ | Catalan | ca |
+ | Cebuano | ceb |
+ | Czech | cs |
+ | Welsh | cy |
+ | Danish | da |
+ | German | de |
+ | Greek | el |
+ | English | en |
+ | Spanish | es |
+ | Estonian | et |
+ | Persian | fa |
+ | Fulah | ff |
+ | Finnish | fi |
+ | French | fr |
+ | Western Frisian | fy |
+ | Irish | ga |
+ | Gaelic | gd |
+ | Galician | gl |
+ | Gujarati | gu |
+ | Hausa | ha |
+ | Hebrew | he |
+ | Hindi | hi |
+ | Croatian | hr |
+ | Haitian | ht |
+ | Hungarian | hu |
+ | Armenian | hy |
+ | Indonesian | id |
+ | Igbo | ig |
+ | Iloko | ilo |
+ | Icelandic | is |
+ | Italian | it |
+ | Japanese | ja |
+ | Javanese | jv |
+ | Georgian | ka |
+ | Kazakh | kk |
+ | Central Khmer | km |
+ | Kannada | kn |
+ | Korean | ko |
+ | Luxembourgish | lb |
+ | Ganda | lg |
+ | Lingala | ln |
+ | Lao | lo |
+ | Lithuanian | lt |
+ | Latvian | lv |
+ | Malagasy | mg |
+ | Macedonian | mk |
+ | Malayalam | ml |
+ | Mongolian | mn |
+ | Marathi | mr |
+ | Malay | ms |
+ | Burmese | my |
+ | Nepali | ne |
+ | Dutch | nl |
+ | Norwegian | no |
+ | Northern Sotho | ns |
+ | Occitan | oc |
+ | Oriya | or |
+ | Panjabi | pa |
+ | Polish | pl |
+ | Pushto | ps |
+ | Portuguese | pt |
+ | Romanian | ro |
+ | Russian | ru |
+ | Sindhi | sd |
+ | Sinhala | si |
+ | Slovak | sk |
+ | Slovenian | sl |
+ | Somali | so |
+ | Albanian | sq |
+ | Serbian | sr |
+ | Swati | ss |
+ | Sundanese | su |
+ | Swedish | sv |
+ | Swahili | sw |
+ | Tamil | ta |
+ | Thai | th |
+ | Tagalog | tl |
+ | Tswana | tn |
+ | Turkish | tr |
+ | Ukrainian | uk |
+ | Urdu | ur |
+ | Uzbek | uz |
+ | Vietnamese | vi |
+ | Wolof | wo |
+ | Xhosa | xh |
+ | Yiddish | yi |
+ | Yoruba | yo |
+ | Chinese | zh |
+ | Zulu | zu |
translate.py CHANGED
@@ -1,99 +1,152 @@
- from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
+ from transformers import (
+     M2M100ForConditionalGeneration,
+     M2M100Tokenizer,
+     PreTrainedTokenizerBase,
+     DataCollatorForSeq2Seq,
+ )
  from tqdm import tqdm
- from typing import TextIO, List
  import argparse
  import torch
- from dataset import get_dataloader, count_lines
+ from torch.utils.data import DataLoader
+ from dataset import DatasetReader, count_lines
  import os
+ from accelerate import Accelerator, DistributedType
+ from accelerate.memory_utils import find_executable_batch_size
+
+
+ def get_dataloader(
+     accelerator: Accelerator,
+     filename: str,
+     tokenizer: PreTrainedTokenizerBase,
+     batch_size: int,
+     max_length: int,
+ ) -> DataLoader:
+
+     dataset = DatasetReader(filename, tokenizer, max_length)
+     if accelerator.distributed_type == DistributedType.TPU:
+         data_collator = DataCollatorForSeq2Seq(
+             tokenizer,
+             padding="max_length",
+             max_length=max_length,
+             label_pad_token_id=tokenizer.pad_token_id,
+             return_tensors="pt",
+         )
+     else:
+         data_collator = DataCollatorForSeq2Seq(
+             tokenizer,
+             padding=True,
+             label_pad_token_id=tokenizer.pad_token_id,
+             # max_length is not needed here; inputs are already truncated in preprocess
+             pad_to_multiple_of=8,
+             return_tensors="pt",
+         )
+
+     return DataLoader(
+         dataset,
+         batch_size=batch_size,
+         collate_fn=data_collator,
+     )


  def main(
-     sentences_path,
-     output_path,
-     source_lang,
-     target_lang,
-     batch_size,
+     sentences_path: str,
+     output_path: str,
+     source_lang: str,
+     target_lang: str,
+     starting_batch_size: int,
      model_name: str = "facebook/m2m100_1.2B",
-     tensorrt: bool = False,
-     precision: int = 32,
+     cache_dir: str = None,
+     precision: str = "32",
      max_length: int = 128,
+     num_beams: int = 4,
  ):

      if not os.path.exists(os.path.dirname(output_path)):
          os.makedirs(os.path.dirname(output_path))

+     accelerator = Accelerator(mixed_precision=precision if precision != "32" else "no")
+
      print("Loading tokenizer...")
-     tokenizer = M2M100Tokenizer.from_pretrained(model_name)
+     tokenizer = M2M100Tokenizer.from_pretrained(
+         pretrained_model_name_or_path=model_name, cache_dir=cache_dir
+     )
      print("Loading model...")
-     model = M2M100ForConditionalGeneration.from_pretrained(model_name)
-     print(f"Model loaded.\n")
+     model = M2M100ForConditionalGeneration.from_pretrained(
+         pretrained_model_name_or_path=model_name, cache_dir=cache_dir
+     )
+
+     model.eval()
+
+     print(f"Preparing data...\n")
+
+     if precision == "32":
+         model = model.float()
+     elif precision == "fp16":
+         model = model.half()
+     elif precision == "bf16":
+         model = model.bfloat16()
+     else:
+         raise ValueError("Precision not supported. Supported values: 32, fp16, bf16")

      tokenizer.src_lang = source_lang
      lang_code_to_idx = tokenizer.lang_code_to_id[target_lang]

-     model.eval()
+     gen_kwargs = {
+         "max_length": max_length,
+         "num_beams": num_beams,
+         "num_return_sequences": 1,
+     }

      total_lines: int = count_lines(sentences_path)
-     print(f"We will translate {total_lines} lines.")
-     data_loader = get_dataloader(
-         filename=sentences_path,
-         tokenizer=tokenizer,
-         batch_size=batch_size,
-         max_length=128,
+     print(
+         f"We will translate {total_lines} lines. Initial batch size: {starting_batch_size}"
      )

-     if precision == 16:
-         dtype = torch.float16
-     elif precision == 32:
-         dtype = torch.float32
-     elif precision == 64:
-         dtype = torch.float64
-     else:
-         raise ValueError("Precision must be 16, 32 or 64.")
+     @find_executable_batch_size(starting_batch_size=starting_batch_size)
+     def inference(batch_size):
+         nonlocal model, tokenizer, sentences_path, max_length, output_path, lang_code_to_idx, gen_kwargs, total_lines, precision

-     if tensorrt:
-         import torch_tensorrt
+         print(f"Translating with batch size {batch_size}")

-         device = "cuda"
+         data_loader = get_dataloader(
+             accelerator=accelerator,
+             filename=sentences_path,
+             tokenizer=tokenizer,
+             batch_size=batch_size,
+             max_length=max_length,
+         )

-         model.to(device)
+         model, data_loader = accelerator.prepare(model, data_loader)

-         traced_model = torch.jit.trace(
-             model, [torch.randn((batch_size, max_length)).to("cuda", dtype=torch.long)]
-         )
-         model = torch_tensorrt.compile(
-             traced_model,
-             inputs=[torch_tensorrt.Input((batch_size, max_length), dtype=torch.long)],
-             enabled_precisions={dtype},
-         )
-     else:
-         if torch.cuda.is_available():
-             device = "cuda"
-         else:
-             device = "cpu"
-             print("CUDA not available. Using CPU. This will be slow.")
-         model.to(device, dtype=dtype)
-
-     with tqdm(total=total_lines, desc="Dataset translation") as pbar, open(
-         output_path, "w+", encoding="utf-8"
-     ) as output_file:
-         with torch.no_grad():
-             for batch in data_loader:
-                 batch["input_ids"] = batch["input_ids"].to(device)
-                 batch["attention_mask"] = batch["attention_mask"].to(device)
-
-                 generated_tokens = model.generate(
-                     **batch, forced_bos_token_id=lang_code_to_idx
-                 )
-                 tgt_text = tokenizer.batch_decode(
-                     generated_tokens.cpu(), skip_special_tokens=True
-                 )
-
-                 print("\n".join(tgt_text), file=output_file)
-
-                 pbar.update(len(tgt_text))
+         with tqdm(
+             total=total_lines, desc="Dataset translation", leave=True, ascii=True
+         ) as pbar, open(output_path, "w", encoding="utf-8") as output_file:
+             with torch.no_grad():
+                 for batch in data_loader:
+                     batch["input_ids"] = batch["input_ids"]
+                     batch["attention_mask"] = batch["attention_mask"]
+
+                     generated_tokens = accelerator.unwrap_model(model).generate(
+                         **batch, forced_bos_token_id=lang_code_to_idx, **gen_kwargs
+                     )
+
+                     generated_tokens = accelerator.pad_across_processes(
+                         generated_tokens, dim=1, pad_index=tokenizer.pad_token_id
+                     )
+
+                     generated_tokens = (
+                         accelerator.gather(generated_tokens).cpu().numpy()
+                     )
+
+                     tgt_text = tokenizer.batch_decode(
+                         generated_tokens, skip_special_tokens=True
+                     )
+
+                     print("\n".join(tgt_text), file=output_file)
+
+                     pbar.update(len(tgt_text))
+
+     inference()
      print(f"Translation done.\n")


@@ -117,21 +170,21 @@ if __name__ == "__main__":
          "--source_lang",
          type=str,
          required=True,
-         help="Source language id. See: https://huggingface.co/facebook/m2m100_1.2B",
+         help="Source language id. See: supported_languages.md",
      )

      parser.add_argument(
          "--target_lang",
          type=str,
          required=True,
-         help="Target language id. See: https://huggingface.co/facebook/m2m100_1.2B",
+         help="Target language id. See: supported_languages.md",
      )

      parser.add_argument(
-         "--batch_size",
+         "--starting_batch_size",
          type=int,
-         default=8,
-         help="Batch size",
+         default=128,
+         help="Starting batch size, we will automatically reduce it if we find an OOM error.",
      )

      parser.add_argument(
@@ -142,17 +195,33 @@
      )

      parser.add_argument(
-         "--precision",
+         "--cache_dir",
+         type=str,
+         default=None,
+         help="Cache directory from which to load the model, or None to not cache",
+     )
+
+     parser.add_argument(
+         "--max_length",
+         type=int,
+         default=128,
+         help="Maximum number of tokens in the source sentence and generated sentence. "
+         "Increase this value to translate longer sentences, at the cost of increasing memory usage.",
+     )
+
+     parser.add_argument(
+         "--num_beams",
          type=int,
-         default=32,
-         choices=[16, 32, 64],
-         help="Precision of the model. 16, 32 or 64.",
+         default=5,
+         help="Number of beams for beam search, the M2M100 authors recommend 5, but it might use too much memory",
      )

      parser.add_argument(
-         "--tensorrt",
-         action="store_true",
-         help="Use TensorRT to compile the model.",
+         "--precision",
+         type=str,
+         default="32",
+         choices=["bf16", "fp16", "32"],
+         help="Precision of the model. bf16, fp16 or 32.",
      )

      args = parser.parse_args()
@@ -162,8 +231,9 @@ if __name__ == "__main__":
          output_path=args.output_path,
          source_lang=args.source_lang,
          target_lang=args.target_lang,
-         batch_size=args.batch_size,
+         starting_batch_size=args.starting_batch_size,
          model_name=args.model_name,
+         cache_dir=args.cache_dir,
+         num_beams=args.num_beams,
          precision=args.precision,
-         tensorrt=args.tensorrt,
      )
translate_troch2trt.py DELETED
@@ -1,164 +0,0 @@
- from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer
- from tqdm import tqdm
- from typing import TextIO, List
- import argparse
- import torch
- from dataset import get_dataloader, count_lines
- import os
-
-
- def main(
-     sentences_path,
-     output_path,
-     source_lang,
-     target_lang,
-     batch_size,
-     model_name: str = "facebook/m2m100_1.2B",
-     tensorrt: bool = False,
-     precision: int = 32,
-     max_length: int = 128,
- ):
-
-     if not os.path.exists(os.path.dirname(output_path)):
-         os.makedirs(os.path.dirname(output_path))
-
-     print("Loading tokenizer...")
-     tokenizer = M2M100Tokenizer.from_pretrained(model_name)
-     print("Loading model...")
-     model = M2M100ForConditionalGeneration.from_pretrained(model_name)
-     print(f"Model loaded.\n")
-
-     tokenizer.src_lang = source_lang
-     lang_code_to_idx = tokenizer.lang_code_to_id[target_lang]
-
-     model.eval()
-
-     total_lines: int = count_lines(sentences_path)
-     print(f"We will translate {total_lines} lines.")
-     data_loader = get_dataloader(
-         filename=sentences_path,
-         tokenizer=tokenizer,
-         batch_size=batch_size,
-         max_length=128,
-     )
-
-     if precision == 16:
-         dtype = torch.float16
-     elif precision == 32:
-         dtype = torch.float32
-     elif precision == 64:
-         dtype = torch.float64
-     else:
-         raise ValueError("Precision must be 16, 32 or 64.")
-
-     if tensorrt:
-         device = "cuda"
-         from torch2trt import torch2trt
-
-         model.to(device, dtype=dtype)
-
-         model = torch2trt(
-             model,
-             [torch.randn((batch_size, max_length)).to(device, dtype=torch.long)],
-         )
-
-     else:
-         if torch.cuda.is_available():
-             device = "cuda"
-
-         else:
-             device = "cpu"
-             print("CUDA not available. Using CPU. This will be slow.")
-         model.to(device, dtype=dtype)
-
-     with tqdm(total=total_lines, desc="Dataset translation") as pbar, open(
-         output_path, "w+", encoding="utf-8"
-     ) as output_file:
-         with torch.no_grad():
-             for batch in data_loader:
-                 batch["input_ids"] = batch["input_ids"].to(device)
-                 batch["attention_mask"] = batch["attention_mask"].to(device)
-                 generated_tokens = model.generate(
-                     **batch, forced_bos_token_id=lang_code_to_idx
-                 )
-                 tgt_text = tokenizer.batch_decode(
-                     generated_tokens.cpu(), skip_special_tokens=True
-                 )
-
-                 print("\n".join(tgt_text), file=output_file)
-
-                 pbar.update(len(tgt_text))
-
-     print(f"Translation done.\n")
-
-
- if __name__ == "__main__":
-     parser = argparse.ArgumentParser(description="Run the translation experiments")
-     parser.add_argument(
-         "--sentences_path",
-         type=str,
-         required=True,
-         help="Path to a txt file containing the sentences to translate. One sentence per line.",
-     )
-
-     parser.add_argument(
-         "--output_path",
-         type=str,
-         required=True,
-         help="Path to a txt file where the translated sentences will be written.",
-     )
-
-     parser.add_argument(
-         "--source_lang",
-         type=str,
-         required=True,
-         help="Source language id. See: https://huggingface.co/facebook/m2m100_1.2B",
-     )
-
-     parser.add_argument(
-         "--target_lang",
-         type=str,
-         required=True,
-         help="Target language id. See: https://huggingface.co/facebook/m2m100_1.2B",
-     )
-
-     parser.add_argument(
-         "--batch_size",
-         type=int,
-         default=8,
-         help="Batch size",
-     )
-
-     parser.add_argument(
-         "--model_name",
-         type=str,
-         default="facebook/m2m100_1.2B",
-         help="Path to the model to use. See: https://huggingface.co/models",
-     )
-
-     parser.add_argument(
-         "--precision",
-         type=int,
-         default=32,
-         choices=[16, 32, 64],
-         help="Precision of the model. 16, 32 or 64.",
-     )
-
-     parser.add_argument(
-         "--tensorrt",
-         action="store_true",
-         help="Use TensorRT to compile the model.",
-     )
-
-     args = parser.parse_args()
-
-     main(
-         sentences_path=args.sentences_path,
-         output_path=args.output_path,
-         source_lang=args.source_lang,
-         target_lang=args.target_lang,
-         batch_size=args.batch_size,
-         model_name=args.model_name,
-         precision=args.precision,
-         tensorrt=args.tensorrt,
-     )