gaudi committed on
Commit d7ea3bd
1 Parent(s): da3f7f7

Initial Commit
.gitattributes CHANGED
@@ -1,35 +1,8 @@
- *.7z filter=lfs diff=lfs merge=lfs -text
- *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin.* filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
  *.bin filter=lfs diff=lfs merge=lfs -text
- *.bz2 filter=lfs diff=lfs merge=lfs -text
- *.ckpt filter=lfs diff=lfs merge=lfs -text
- *.ftz filter=lfs diff=lfs merge=lfs -text
- *.gz filter=lfs diff=lfs merge=lfs -text
  *.h5 filter=lfs diff=lfs merge=lfs -text
- *.joblib filter=lfs diff=lfs merge=lfs -text
- *.lfs.* filter=lfs diff=lfs merge=lfs -text
- *.mlmodel filter=lfs diff=lfs merge=lfs -text
- *.model filter=lfs diff=lfs merge=lfs -text
- *.msgpack filter=lfs diff=lfs merge=lfs -text
- *.npy filter=lfs diff=lfs merge=lfs -text
- *.npz filter=lfs diff=lfs merge=lfs -text
- *.onnx filter=lfs diff=lfs merge=lfs -text
- *.ot filter=lfs diff=lfs merge=lfs -text
- *.parquet filter=lfs diff=lfs merge=lfs -text
- *.pb filter=lfs diff=lfs merge=lfs -text
- *.pickle filter=lfs diff=lfs merge=lfs -text
- *.pkl filter=lfs diff=lfs merge=lfs -text
- *.pt filter=lfs diff=lfs merge=lfs -text
- *.pth filter=lfs diff=lfs merge=lfs -text
- *.rar filter=lfs diff=lfs merge=lfs -text
- *.safetensors filter=lfs diff=lfs merge=lfs -text
- saved_model/**/* filter=lfs diff=lfs merge=lfs -text
- *.tar.* filter=lfs diff=lfs merge=lfs -text
- *.tar filter=lfs diff=lfs merge=lfs -text
  *.tflite filter=lfs diff=lfs merge=lfs -text
- *.tgz filter=lfs diff=lfs merge=lfs -text
- *.wasm filter=lfs diff=lfs merge=lfs -text
- *.xz filter=lfs diff=lfs merge=lfs -text
- *.zip filter=lfs diff=lfs merge=lfs -text
- *.zst filter=lfs diff=lfs merge=lfs -text
- *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.tar.gz filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
README.md ADDED
---
tags:
- ctranslate2
- translation
license: apache-2.0
---
# Repository General Information
## Inspired by and derived from the work of [Helsinki-NLP](https://huggingface.co/Helsinki-NLP), [CTranslate2](https://github.com/OpenNMT/CTranslate2), and [michaelfeil](https://huggingface.co/michaelfeil)!
- Link to Original Model ([Helsinki-NLP](https://huggingface.co/Helsinki-NLP)): [Model Link](https://huggingface.co/Helsinki-NLP/opus-mt-gem-en)
- This repository was based on the work of [CTranslate2](https://github.com/OpenNMT/CTranslate2).
- This repository was based on the work of [michaelfeil](https://huggingface.co/michaelfeil).

# What is CTranslate2?
[CTranslate2](https://opennmt.net/CTranslate2/) is a C++ and Python library for efficient inference with Transformer models.

CTranslate2 implements a custom runtime that applies many performance optimization techniques, such as weight quantization, layer fusion, and batch reordering, to accelerate Transformer models and reduce their memory usage on CPU and GPU.

CTranslate2 is one of the most performant ways of hosting translation models at scale: **int8** inference in C++ can speed up inference times by about **2x-8x**. Currently supported models include:
- Encoder-decoder models: Transformer base/big, M2M-100, NLLB, BART, mBART, Pegasus, T5, Whisper
- Decoder-only models: GPT-2, GPT-J, GPT-NeoX, OPT, BLOOM, MPT, Llama, Mistral, Gemma, CodeGen, GPTBigCode, Falcon
- Encoder-only models: BERT, DistilBERT, XLM-RoBERTa
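The weight quantization mentioned above is the main source of CTranslate2's int8 speedups. As a rough illustration only (not CTranslate2's actual implementation), symmetric int8 quantization maps each float weight to an 8-bit integer using a single shared scale:

```python
def quantize_int8(weights):
    """Symmetric int8 quantization: one scale maps floats into [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [x * scale for x in q]

weights = [0.5, -1.27, 0.02]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)  # close to the original weights
```

Inference then runs on the small integer values, trading a little precision (visible as the small BLEU differences in the benchmarks below) for lower memory use and faster arithmetic.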
# CTranslate2 Benchmarks
Please note that the results presented below are only valid for the configuration used during this benchmark: absolute and relative performance may change with different settings. Tested against the `newstest2014` (En -> De) dataset.

The benchmark reports the number of target tokens generated per second (higher is better). The results are aggregated over multiple runs. See the benchmark scripts for more details and to reproduce these numbers.

## CPU Benchmarks for Generic Opus-MT Models
| Library | Tokens per Second | Max Memory Usage | BLEU |
| :----: | :----: | :----: | :----: |
| Transformers 4.26.1 (with PyTorch 1.13.1) | 147.3 | 2332MB | 27.90 |
| Marian 1.11.0 (int16) | 330.2 | 5901MB | 27.65 |
| Marian 1.11.0 (int8) | 355.8 | 4763MB | 27.27 |
| CTranslate2 3.6.0 (int16) | 596.1 | 660MB | 27.53 |
| CTranslate2 3.6.0 (int8) | 696.1 | 516MB | 27.65 |

`CPU benchmarks executed with 4 threads on a c5.2xlarge Amazon EC2 instance equipped with an Intel(R) Xeon(R) Platinum 8275CL CPU.`

## GPU Benchmarks for Generic Opus-MT Models
| Library | Tokens per Second | Max GPU Memory Usage | Max Memory Usage | BLEU |
| :----: | :----: | :----: | :----: | :----: |
| Transformers 4.26.1 (with PyTorch 1.13.1) | 1022.9 | 4097MB | 2109MB | 27.90 |
| Marian 1.11.0 (float16) | 3962.4 | 3239MB | 1976MB | 27.94 |
| CTranslate2 3.6.0 (float16) | 9296.7 | 909MB | 814MB | 27.9 |
| CTranslate2 3.6.0 (int8 + float16) | 8362.7 | 813MB | 766MB | 27.9 |

**Source and benchmark details can be found [here](https://github.com/OpenNMT/CTranslate2).**<br />
**Original model BLEU scores can be found [here](https://huggingface.co/Helsinki-NLP/opus-mt-gem-en).**
# CTranslate2 Installation
```bash
pip install "hf-hub-ctranslate2>=1.0.0" "ctranslate2>=3.13.0"
```
(Quoting the requirements keeps the shell from interpreting `>=` as a redirection.)
### ct2-transformers-converter Command Used:
```bash
ct2-transformers-converter --model Helsinki-NLP/opus-mt-gem-en --output_dir ./ctranslate2/opus-mt-gem-en-ctranslate2 --force --copy_files README.md generation_config.json tokenizer_config.json vocab.json source.spm .gitattributes target.spm --quantization float16
```
# CTranslate2 Converted Checkpoint Information:
**Compatible With:**
- [ctranslate2](https://github.com/OpenNMT/CTranslate2)
- [hf-hub-ctranslate2](https://github.com/michaelfeil/hf-hub-ctranslate2)

**Compute Type:**
- `compute_type=int8_float16` for `device="cuda"`
- `compute_type=int8` for `device="cpu"`
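The device-to-compute-type recommendation above can be captured in a small helper (a hypothetical convenience function for your own code, not part of the `ctranslate2` API):

```python
def pick_compute_type(device: str) -> str:
    """Return the compute type recommended in this card for a given device."""
    recommended = {"cuda": "int8_float16", "cpu": "int8"}
    if device not in recommended:
        raise ValueError(f"unsupported device: {device!r}")
    return recommended[device]
```

Pass the result as the `compute_type` argument when constructing the translator, as in the samples below.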
# Sample Code - ctranslate2
```python
from ctranslate2 import Translator
import transformers

# Note: Translator expects a local path to the converted model directory.
# If you only have the Hub ID, download the repository first
# (e.g. with huggingface_hub.snapshot_download).
model_name = "gaudi/opus-mt-gem-en-ctranslate2"
translator = Translator(
    model_path=model_name,
    device="cuda",
    inter_threads=1,
    intra_threads=4,
    compute_type="int8_float16",
)

tokenizer = transformers.AutoTokenizer.from_pretrained(model_name)

# Tokenize the source text into subword tokens, translate, then detokenize.
source = tokenizer.convert_ids_to_tokens(tokenizer.encode("XXXXXX, XXX XX XXXXXX."))
results = translator.translate_batch([source])
target = results[0].hypotheses[0]

print(tokenizer.decode(tokenizer.convert_tokens_to_ids(target)))
```
# Sample Code - hf-hub-ctranslate2
**Derived From [michaelfeil](https://huggingface.co/michaelfeil):**
```python
from hf_hub_ctranslate2 import TranslatorCT2fromHfHub
from transformers import AutoTokenizer

model_name = "gaudi/opus-mt-gem-en-ctranslate2"
model = TranslatorCT2fromHfHub(
    model_name_or_path=model_name,
    device="cuda",
    compute_type="int8_float16",  # load in int8_float16 on CUDA
    tokenizer=AutoTokenizer.from_pretrained(model_name),
)
outputs = model.generate(
    text=["XXX XX XXX XXXXXXX XXXX?", "XX XX XXXX XX XXX!"],
)
print(outputs)
```
111
+ # License and other remarks:
112
+ License conditions are intended to be idential to [original huggingface repository](https://huggingface.co/Helsinki-NLP/opus-mt-gem-en) by Helsinki-NLP.
config.json ADDED
{
  "add_source_bos": false,
  "add_source_eos": false,
  "bos_token": "<s>",
  "decoder_start_token": "</s>",
  "eos_token": "</s>",
  "layer_norm_epsilon": null,
  "unk_token": "<unk>",
  "model_type": "marian"
}
generation_config.json ADDED
{
  "bad_words_ids": [
    [
      56646
    ]
  ],
  "bos_token_id": 0,
  "decoder_start_token_id": 56646,
  "eos_token_id": 0,
  "forced_eos_token_id": 0,
  "max_length": 512,
  "num_beams": 4,
  "pad_token_id": 56646,
  "renormalize_logits": true,
  "transformers_version": "4.32.0.dev0"
}
model.bin ADDED
version https://git-lfs.github.com/spec/v1
oid sha256:a19d22a94f190849feac6239fc558df4e5e2e66e457feae4a7648d0297799ff6
size 146931343
shared_vocabulary.json ADDED
The diff for this file is too large to render.
source.spm ADDED
Binary file (790 kB).
target.spm ADDED
Binary file (784 kB).
tokenizer_config.json ADDED
{"target_lang": "eng", "source_lang": "gem"}
vocab.json ADDED
The diff for this file is too large to render.