stelterlab committed on
Commit
e5a6cb3
1 Parent(s): 419f12f

Upload README.md

added quantization info to model card

Files changed (1)
  1. README.md +374 -3
README.md CHANGED
@@ -1,3 +1,374 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ language:
3
+ - de
4
+ - bg
5
+ - cs
6
+ - da
7
+ - el
8
+ - en
9
+ - es
10
+ - et
11
+ - fi
12
+ - fr
13
+ - ga
14
+ - hr
15
+ - hu
16
+ - it
17
+ - lt
18
+ - lv
19
+ - mt
20
+ - nl
21
+ - pl
22
+ - pt
23
+ - ro
24
+ - sl
25
+ - sv
26
+ - sk
27
+ metrics:
28
+ - accuracy
29
+ - bleu
30
+ pipeline_tag: text-generation
31
+ library_name: transformers
32
+ base_model:
33
+ - openGPT-X/Teuken-7B-base-v0.4
34
+ license: apache-2.0
35
+ ---
36
+ AWQ quantization: done by stelterlab in INT4 GEMM with AutoAWQ by casper-hansen (https://github.com/casper-hansen/AutoAWQ/)
37
+
38
+ Original Weights by VAGOsolutions. Original Model Card follows:
39
+
40
+ # Model Card for Teuken-7B-instruct-commercial-v0.4
41
+
42
+
43
+ [Teuken-7B-instruct-commercial-v0.4](https://huggingface.co/openGPT-X/Teuken-7B-instruct-commercial-v0.4) is an instruction-tuned, 7B-parameter multilingual large language model (LLM) pre-trained on 4T tokens in all 24 official European languages and released under Apache 2.0 in the research project [OpenGPT-X](https://opengpt-x.de).
44
+ The base model Teuken-7B-base-v0.4 is available on request 📧 <a href="mailto:contact@opengpt-x.de">contact@opengpt-x.de</a>.
45
+
46
+ ### Model Description
47
+
48
+ <!-- Provide a longer summary of what this model is. -->
49
+
50
+ - **Developed by:** Fraunhofer, Forschungszentrum Jülich, TU Dresden, DFKI
51
+ - **Funded by:** German Federal Ministry of Economics and Climate Protection (BMWK) in the context of the OpenGPT-X project
52
+ - **Model type:** Transformer based decoder-only model
53
+ - **Language(s) (NLP):** bg, cs, da, de, el, en, es, et, fi, fr, ga, hr, hu, it, lt, lv, mt, nl, pl, pt, ro, sk, sl, sv
54
+ - **Shared by:** OpenGPT-X
55
+
56
+ ## Uses
57
+
58
+ <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
59
+ [Teuken-7B-instruct-commercial-v0.4](https://huggingface.co/openGPT-X/Teuken-7B-instruct-commercial-v0.4) is intended for commercial and research use in all 24 official European languages. Since [Teuken-7B-instruct-commercial-v0.4](https://huggingface.co/openGPT-X/Teuken-7B-instruct-commercial-v0.4) focuses on covering all 24 EU languages, it produces more stable results across these languages and better reflects European values in its answers than English-centric models. It is therefore specialized for use in multilingual tasks.
60
+
61
+ ## Disclaimer Toxic Content:
62
+
63
+ This Large Language Model (LLM) may generate content that is inappropriate, offensive, or harmful. While the dataset has been filtered to minimize such outputs, the model may still produce text that is biased or toxic due to the large scale and diverse nature of the data.
64
+
65
+
66
+ ### Out-of-Scope Use
67
+
68
+ <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
69
+
70
+ The model is not intended for use in math and coding tasks.
71
+
72
+ ## Bias, Risks, and Limitations
73
+
74
+ <!-- This section is meant to convey both technical and sociotechnical limitations. -->
75
+
76
+ [Teuken-7B-instruct-commercial-v0.4](https://huggingface.co/openGPT-X/Teuken-7B-instruct-commercial-v0.4) is an instruction-tuned version of Teuken-7B-base-v0.4 (which is available on request 📧 <a href="mailto:contact@opengpt-x.de">contact@opengpt-x.de</a>) that is not completely free from biases and hallucinations.
77
+
78
+ ## How to Get Started with the Model
79
+
80
+ ## Usage
81
+ The model requires the `transformers`, `sentencepiece`, and `torch` libraries.
82
+ After installation, here's an example of how to use the model:
83
+
84
+ As this model is a fine-tuned model, it must be used with the provided prompt template. Using the model without the prompt template is not intended and is not recommended. The prompt template is defined as follows:
85
+ ```python
86
+ user="Hi!"
87
+ lang_code = "DE"
88
+ system_messages={
89
+ "EN": "A chat between a human and an artificial intelligence assistant."
90
+ " The assistant gives helpful and polite answers to the human's questions.",
91
+ "DE": "Ein Gespräch zwischen einem Menschen und einem Assistenten mit künstlicher Intelligenz."
92
+ " Der Assistent gibt hilfreiche und höfliche Antworten auf die Fragen des Menschen.",
93
+ }
94
+
95
+ prompt = f"System: {system_messages[lang_code]}\nUser: {user}\nAssistant:"
96
+ ```
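The template above can be wrapped in a small helper that checks the language code before building the prompt. This is a minimal sketch: the name `build_prompt` and the choice to raise `ValueError` for unknown codes are illustrative assumptions, not part of the model repository, and only the two system messages shown above are included.

```python
# System messages copied from the template above; messages for other
# languages are not listed in this card, so they are left out here.
SYSTEM_MESSAGES = {
    "EN": (
        "A chat between a human and an artificial intelligence assistant."
        " The assistant gives helpful and polite answers to the human's questions."
    ),
    "DE": (
        "Ein Gespräch zwischen einem Menschen und einem Assistenten mit künstlicher Intelligenz."
        " Der Assistent gibt hilfreiche und höfliche Antworten auf die Fragen des Menschen."
    ),
}


def build_prompt(user: str, lang_code: str = "DE") -> str:
    """Format a single-turn prompt (hypothetical helper for illustration)."""
    code = lang_code.upper()
    if code not in SYSTEM_MESSAGES:
        raise ValueError(f"No system message defined for language code {lang_code!r}")
    return f"System: {SYSTEM_MESSAGES[code]}\nUser: {user}\nAssistant:"


print(build_prompt("Hi!", "DE"))
```

Failing loudly on an unknown code avoids silently prompting the model without a system message, which the card advises against.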
97
+
98
+ The prompt template is also directly integrated in the Tokenizer and can be used as follows:
99
+ ```python
100
+ import torch
101
+ from transformers import AutoModelForCausalLM, AutoTokenizer
102
+ device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
103
+ model_name = "openGPT-X/Teuken-7B-instruct-commercial-v0.4"
104
+ model = AutoModelForCausalLM.from_pretrained(
105
+ model_name,
106
+ trust_remote_code=True,
107
+ torch_dtype=torch.bfloat16,
108
+ )
109
+ model = model.to(device).eval()
110
+ tokenizer = AutoTokenizer.from_pretrained(
111
+ model_name,
112
+ use_fast=False,
113
+ trust_remote_code=True,
114
+ )
115
+ messages = [{"role": "User", "content": "Wer bist du?"}]
116
+ prompt_ids = tokenizer.apply_chat_template(messages, chat_template="DE", tokenize=True, add_generation_prompt=True, return_tensors="pt")
117
+ prediction = model.generate(
118
+ prompt_ids.to(model.device),
119
+ max_length=512,
120
+ do_sample=True,
121
+ top_k=50,
122
+ top_p=0.95,
123
+ temperature=0.7,
124
+ num_return_sequences=1,
125
+ )
126
+ prediction_text = tokenizer.decode(prediction[0].tolist())
127
+ print(prediction_text)
128
+ ```
129
+
130
+ This example demonstrates how to load the model and tokenizer, prepare input, generate text, and print the result.
131
+
132
+ ### Usage with vLLM Server
133
+ Starting the vLLM Server:
134
+ ``` shell
135
+ vllm serve openGPT-X/Teuken-7B-instruct-commercial-v0.4 --trust-remote-code
136
+ ```
137
+ Use Chat API with vLLM and pass the language of the Chat-Template as extra body:
138
+ ``` python
139
+ from openai import OpenAI
140
+
141
+ client = OpenAI(
142
+ api_key="EMPTY",
143
+ base_url="http://localhost:8000/v1",
144
+ )
145
+ completion = client.chat.completions.create(
146
+ model="openGPT-X/Teuken-7B-instruct-commercial-v0.4",
147
+ messages=[{"role": "User", "content": "Hallo"}],
148
+ extra_body={"chat_template":"DE"}
149
+ )
150
+ print(f"Assistant: {completion.choices[0].message.content}")
151
+ ```
152
+ The default language of the Chat-Template can also be set when starting the vLLM Server. To do this, create a new file named `lang` with the content `DE` and start the vLLM Server as follows:
153
+ ``` shell
154
+ vllm serve openGPT-X/Teuken-7B-instruct-commercial-v0.4 --trust-remote-code --chat-template lang
155
+ ```
156
+
157
+ ### Usage with vLLM Offline Batched Inference
158
+ ``` python
159
+ from vllm import LLM, SamplingParams
160
+
161
+ sampling_params = SamplingParams(temperature=0.01, max_tokens=1024, stop=["</s>"])
162
+ llm = LLM(model="openGPT-X/Teuken-7B-instruct-commercial-v0.4", trust_remote_code=True, dtype="bfloat16")
163
+ outputs = llm.chat(
164
+ messages=[{"role": "User", "content": "Hallo"}],
165
+ sampling_params=sampling_params,
166
+ chat_template="DE"
167
+ )
168
+ print(f"Prompt: {outputs[0].prompt}")
169
+ print(f"Assistant: {outputs[0].outputs[0].text}")
170
+ ```
171
+
172
+ ## Training Details
173
+
174
+ ### Pre-Training Data
175
+
176
+ <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
177
+
178
+ Teuken-7B-base-v0.4 was pre-trained on 4 trillion tokens of data from publicly available sources.
179
+ The pretraining data has a cutoff of September 2023.
180
+ More information is available in our preprint ["Data Processing for the OpenGPT-X Model Family"](http://arxiv.org/abs/2410.08800).
181
+
182
+
183
+ ### Instruction-Tuning Data
184
+ The model was fine-tuned on a collection of English- and German-focused instruction-tuning datasets, which also contains instructions for 22 official European languages.
185
+ The dataset composition contains three types of data: multilingual data, English data, and translated German data.
186
+
187
+ #### English data
188
+ * We only included a subsample of the OpenOrca dataset.
189
+ * To select instruction-tuning examples based on their quality, we calculated the reward scores of all English examples using [Starling-RM-7B-alpha](https://huggingface.co/berkeley-nest/Starling-RM-7B-alpha) (Apache-2.0 license).
190
+
191
+ We aim to include roughly the same number of English examples as we have multilingual examples:
192
+ 1. Add all multi-turn examples
193
+ 2. Add entire `code_alpaca` dataset subset
194
+ 3. For the remaining dataset subsets (`open_orca`, `evol_instruct_143k`, `evol_instruct_70k`, `sharegpt_v3`, `ultrachat_200k`), we add the samples with the highest reward scores so that each dataset subset contributes an equal amount of high-quality examples
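The selection steps listed above can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the record keys `subset`, `multi_turn`, and `reward`, the function name, and the fixed per-subset quota are all hypothetical, and the sketch assumes the quota applies only to single-turn examples from the remaining subsets.

```python
from collections import defaultdict


def select_english_examples(examples, per_subset_quota):
    """Pick English examples following the three steps described above.

    `examples` is a list of dicts with the hypothetical keys
    "subset", "multi_turn", and "reward" (a reward-model score).
    """
    selected = []
    remaining = defaultdict(list)
    for ex in examples:
        if ex["multi_turn"]:                 # step 1: keep every multi-turn example
            selected.append(ex)
        elif ex["subset"] == "code_alpaca":  # step 2: keep all of code_alpaca
            selected.append(ex)
        else:
            remaining[ex["subset"]].append(ex)
    # Step 3: take the top-scoring examples, the same number from each subset.
    for subset_examples in remaining.values():
        ranked = sorted(subset_examples, key=lambda ex: ex["reward"], reverse=True)
        selected.extend(ranked[:per_subset_quota])
    return selected


demo = [
    {"subset": "open_orca", "multi_turn": False, "reward": 0.2},
    {"subset": "open_orca", "multi_turn": False, "reward": 0.9},
    {"subset": "sharegpt_v3", "multi_turn": True, "reward": 0.1},
    {"subset": "code_alpaca", "multi_turn": False, "reward": 0.0},
    {"subset": "ultrachat_200k", "multi_turn": False, "reward": 0.5},
]
picked = select_english_examples(demo, per_subset_quota=1)
print([(ex["subset"], ex["reward"]) for ex in picked])
```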
195
+
196
+ #### German data
197
+ As we aim for a German- and English-centric European-language dataset, and because large-scale German instruction-tuning data is sparse, we translated the English portion of the dataset composition described above. For this, we applied the [Alma-13B](https://huggingface.co/haoranxu/ALMA-13B) (MIT license) model. As code can be a problematic case for translation, we implemented regex-based code detection. With it, we exclude code snippets from translation and reinsert them afterwards.
198
+ As the `code_alpaca` subset contains many code snippets not detectable by our regex-based code detection implementation, we excluded this part of the dataset from the translation.
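The mask-translate-restore idea described in this section can be sketched as below. The card does not publish the regex that was actually used, so the pattern here (Markdown fenced blocks and inline backtick spans) is an assumption, as are the placeholder format and the function names; `translate` stands in for any translation callable.

```python
import re

FENCE = "`" * 3  # Markdown code-fence marker

# Assumption: code appears as fenced blocks or inline backtick spans.
CODE_PATTERN = re.compile(FENCE + r".*?" + FENCE + r"|`[^`\n]+`", re.DOTALL)
PLACEHOLDER = re.compile(r"<CODE_(\d+)>")


def translate_preserving_code(text, translate):
    """Translate `text` with `translate` while leaving code snippets untouched."""
    snippets = []

    def mask(match):
        # Store the snippet and leave an indexed placeholder in its place.
        snippets.append(match.group(0))
        return f"<CODE_{len(snippets) - 1}>"

    masked = CODE_PATTERN.sub(mask, text)
    translated = translate(masked)
    # Reinsert the original snippets at their placeholder positions.
    return PLACEHOLDER.sub(lambda m: snippets[int(m.group(1))], translated)


# Stand-in for a real translation model, for demonstration only.
def fake_translate(s):
    return s.replace("Use", "Verwende")


print(translate_preserving_code("Use `print(x)` to show x.", fake_translate))
```

A real translation model may alter or drop placeholders, so a production version would need to verify that every placeholder survives translation before restoring the snippets.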
199
+
200
+ #### Multilingual data
201
+ For multilingual data we include the 14 official European languages contained in the [aya_dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset) and the 21 official European languages contained in the `translated_flan_cot` dataset of the [aya_collection](https://huggingface.co/datasets/CohereForAI/aya_collection/viewer/translated_flan_cot).
202
+
203
+ #### Datasets and Licenses
204
+
205
+ | Name | Language | License |
206
+ | :--------------------------------------------------------------------------------------------------------------------- | :----------- | :---------------------------------------------------------------------------------------------------------------- |
207
+ | [Open-Orca/OpenOrca](https://huggingface.co/datasets/Open-Orca/OpenOrca) | EN | MIT |
208
+ | [sahil2801/CodeAlpaca-20k](https://huggingface.co/datasets/sahil2801/CodeAlpaca-20k) | EN | CC-BY-4.0 |
209
+ | [WizardLM/WizardLM_evol_instruct_V2_196k](https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_V2_196k) | EN | MIT |
210
+ | [WizardLM/WizardLM_evol_instruct_70k](https://huggingface.co/datasets/WizardLM/WizardLM_evol_instruct_70k) | EN | MIT |
211
+ | [anon8231489123/ShareGPT_Vicuna_unfiltered](https://huggingface.co/datasets/anon8231489123/ShareGPT_Vicuna_unfiltered) | EN | Apache-2.0 |
212
+ | [HuggingFaceH4/ultrachat_200k](https://huggingface.co/datasets/HuggingFaceH4/ultrachat_200k) | EN | MIT |
213
+ | [CohereForAI/aya_dataset](https://huggingface.co/datasets/CohereForAI/aya_dataset) | Multilingual | Apache-2.0 |
214
+ | [CohereForAI/aya_collection](https://huggingface.co/datasets/CohereForAI/aya_collection) | Multilingual | Apache-2.0 |
215
+ | [FreedomIntelligence/sharegpt-deutsch](https://huggingface.co/datasets/FreedomIntelligence/sharegpt-deutsch) | DE | Apache-2.0 |
216
+ | [bjoernp/ultrachat_de](https://huggingface.co/datasets/bjoernp/ultrachat_de) | DE | MIT |
217
+
218
+
219
+
220
+ Dataset contribution per language:
221
+
222
+ | | total | de_freedomintelligence_sharegpt | de_ultrachat_de | translated_flan_cot | aya_dataset | ultrachat_200k_translated_to_de | sharegpt_v3_unfiltered_translated_to_de | evol_instruct_143k_translated_to_de | evol_instruct_70k_translated_to_de | open_orca_translated_to_de | ultrachat_200k | sharegpt_v3_unfiltered | code_alpaca | open_orca | evol_instruct_143k | evol_instruct_70k |
223
+ |:---|--------:|----------------------------------:|------------------:|----------------------:|--------------:|----------------------------------:|------------------------------------------:|--------------------------------------:|-------------------------------------:|-----------------------------:|-----------------:|-------------------------:|--------------:|------------:|---------------------:|--------------------:|
224
+ | BG | 1909 | 0 | 0 | 1909 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
225
+ | CS | 1885 | 0 | 0 | 1885 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
226
+ | DA | 2001 | 0 | 0 | 1906 | 95 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
227
+ | DE | 77628 | 5818 | 898 | 1896 | 231 | 6940 | 37555 | 8116 | 8065 | 8109 | 0 | 0 | 0 | 0 | 0 | 0 |
228
+ | ET | 1901 | 0 | 0 | 1901 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
229
+ | EL | 2472 | 0 | 0 | 1881 | 591 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
230
+ | ES | 3800 | 0 | 0 | 1898 | 1902 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
231
+ | EN | 80806 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6915 | 37600 | 12013 | 8074 | 8099 | 8105 |
232
+ | FI | 2598 | 0 | 0 | 1890 | 708 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
233
+ | FR | 3250 | 0 | 0 | 1890 | 1360 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
234
+ | HU | 1985 | 0 | 0 | 1892 | 93 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
235
+ | MT | 1918 | 0 | 0 | 1918 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
236
+ | IT | 2613 | 0 | 0 | 1910 | 703 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
237
+ | LT | 2800 | 0 | 0 | 1920 | 880 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
238
+ | NL | 3549 | 0 | 0 | 1905 | 1644 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
239
+ | PL | 3322 | 0 | 0 | 1909 | 1413 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
240
+ | PT | 3806 | 0 | 0 | 1897 | 1909 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
241
+ | RO | 1888 | 0 | 0 | 1888 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
242
+ | GA | 3069 | 0 | 0 | 1880 | 1189 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
243
+ | SK | 1922 | 0 | 0 | 1922 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
244
+ | SL | 1894 | 0 | 0 | 1894 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
245
+ | SV | 3160 | 0 | 0 | 1916 | 1244 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
246
+
247
+ Total across languages 210,176
248
+
249
+
250
+ ### Training Procedure
251
+
252
+ <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
253
+ Instruction fine-tuned version of [Teuken-7B-base-v0.4](https://huggingface.co/openGPT-X/Teuken-7B-base-v0.4).
254
+ More information regarding the pre-training is available in our model preprint ["Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs"](https://arxiv.org/abs/2410.03730).
255
+
256
+ #### Training Hyperparameters
257
+
258
+ - **Training regime:** bf16 mixed precision <!--fp32, fp16 mixed precision, , bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
259
+
260
+ ## Evaluation
261
+
262
+ <!-- This section describes the evaluation protocols and provides the results. -->
263
+
264
+ Results on multilingual benchmarks for 21 European languages with instruction-tuned models
265
+ | Model | Avg. | EU21-ARC | EU21-HeSw | EU21-TQA | EU21-MMLU |
266
+ |--------------------------------|--------|----------|-----------|----------|-----------|
267
+ | Meta-Llama-3.1-8B-Instruct | **.563** | .563 | .579 | .532 | **.576** |
268
+ | Mistral-7B-Instruct-v0.3 | .527 | .530 | .538 | **.548** | .491 |
269
+ | Salamandra-7B-Instruct | .543 | **.595** | **.637** | .482 | .459 |
270
+ | Aya-23-8B | .485 | .475 | .535 | .476 | .455 |
271
+ | Occiglot-7B-eu5-Instruct | .475 | .484 | .519 | .471 | .428 |
272
+ | Pharia-1-LLM-7B-C-A | .417 | .396 | .438 | .469 | .366 |
273
+ | Bloomz-7B1 | .358 | .316 | .354 | .461 | .302 |
274
+ | **Teuken-7B-instruct-commercial-v0.4** | .531 | .569 | .620 | .503 | .430 |
275
+
276
+ More information regarding the quality of our translated benchmarks is available in our Evaluation preprint ["Towards Multilingual LLM Evaluation for European Languages"](https://arxiv.org/abs/2410.08928).
277
+ More evaluation results regarding Teuken-7B-instruct-research-v0.4 are available in our model preprint ["Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs"](https://arxiv.org/abs/2410.03730).
278
+
279
+ The model was evaluated in 21 languages on ARC, GSM8K, HellaSwag, TruthfulQA, Translation and MMLU. Results can also be seen in the [European LLM Leaderboard](https://huggingface.co/spaces/openGPT-X/european-llm-leaderboard).
280
+
281
+ ## Technical Specifications
282
+
283
+ ### Model Architecture and Objective
284
+
285
+ | Hyper-Parameter | Value |
286
+ |----------------------------|----------|
287
+ | Training Objective | CLM |
288
+ | Activation Function | SwiGLU |
289
+ | Seq Length | 4096 |
290
+ | Position Embeddings | Rotary |
291
+ | Num Layers | 32 |
292
+ | Hidden Size | 4096 |
293
+ | FFN Hidden Size | 13440 |
294
+ | Num Attention Heads | 32 |
295
+ | Head Dim | 128 |
296
+ | Group Query Attention | yes |
297
+ | Num Query Groups | 2 |
298
+ | Normalization | RMSNorm |
299
+ | Learning rate | 3e-4 |
300
+ | Min learning rate | 3e-5 |
301
+ | Disable bias in linear | yes |
302
+ | Hidden dropout | 0.0 |
303
+ | Attention dropout | 0.0 |
304
+ | Optimizer | AdamW |
305
+ | Beta1 | 0.9 |
306
+ | Beta2 | 0.95 |
307
+ | Data-type | bf16 |
308
+ | Recompute-activations | yes |
309
+ | Distributed-optimizers | yes |
310
+
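As a rough cross-check of the table above, the weight count of the transformer blocks can be derived from the listed dimensions. This is a back-of-the-envelope sketch: it assumes SwiGLU uses three projection matrices (gate, up, down), ignores normalization weights, and leaves out the embeddings because the vocabulary size is not given in the table.

```python
# Values taken from the hyper-parameter table above.
num_layers = 32
hidden_size = 4096
ffn_hidden_size = 13440
num_heads = 32
head_dim = 128
num_query_groups = 2  # key/value heads shared across query heads (GQA)

# Attention: Q and output projections are hidden x (heads * head_dim);
# K and V are hidden x (query_groups * head_dim) under grouped-query attention.
q_and_o = 2 * hidden_size * (num_heads * head_dim)
k_and_v = 2 * hidden_size * (num_query_groups * head_dim)

# Assumption: SwiGLU with three weight matrices (gate, up, down), no biases.
mlp = 3 * hidden_size * ffn_hidden_size

per_layer = q_and_o + k_and_v + mlp
non_embedding_params = num_layers * per_layer

# KV cache per token in bf16 (2 bytes): K and V for every layer.
kv_bytes_per_token = 2 * num_layers * num_query_groups * head_dim * 2

print(f"per layer:      {per_layer:,}")
print(f"non-embedding:  {non_embedding_params:,}")
print(f"KV cache/token: {kv_bytes_per_token:,} bytes")
```

Under these assumptions the transformer blocks account for roughly 6.4 billion weights; the embedding matrices come on top and depend on the vocabulary size.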
311
+ ### Compute Infrastructure
312
+
313
+ We trained our models on JUWELS Booster which consists of 936 compute nodes, each equipped with 4 NVIDIA A100 GPUs. The GPUs are hosted by AMD EPYC Rome CPUs. The compute nodes are connected with HDR-200 InfiniBand in a DragonFly+ topology.
314
+
315
+ #### Hardware
316
+
317
+ The configuration of JUWELS Booster compute nodes is the following:
318
+
319
+ CPU: AMD EPYC 7402 processor; 2 sockets, 24 cores per socket, SMT-2 (total: 2×24×2 = 96 threads) in NPS-4 configuration
320
+ Memory: 512 GB DDR4-3200 RAM (of which at least 20 GB is taken by the system software stack, including the file system); 256 GB per socket; 8 memory channels per socket (2 channels per NUMA domain)
321
+ GPU: 4 × NVIDIA A100 Tensor Core GPU with 40 GB; connected via NVLink3 to each other
322
+ Network: 4 × Mellanox HDR200 InfiniBand ConnectX 6 (200 Gbit/s each), HCA
323
+ Periphery: CPU, GPU, and network adapter are connected via 2 PCIe Gen 4 switches with 16 PCIe lanes going to each device (CPU socket: 2×16 lanes). PCIe switches are configured in synthetic mode.
324
+ #### Software
325
+
326
+ [Megatron-LM](https://github.com/OpenGPTX/Megatron-LM)
327
+
328
+ **BibTeX:**
329
+
330
+ If you find our model useful in your research, please consider citing our [preprint](https://arxiv.org/abs/2410.03730):
331
+ ```
332
+ @misc{ali2024teuken7bbaseteuken7binstructeuropean,
333
+ title={Teuken-7B-Base & Teuken-7B-Instruct: Towards European LLMs},
334
+ author={Mehdi Ali and Michael Fromm and Klaudia Thellmann and Jan Ebert and Alexander Arno Weber and Richard Rutmann and Charvi Jain and Max Lübbering and Daniel Steinigen and Johannes Leveling and Katrin Klug and Jasper Schulze Buschhoff and Lena Jurkschat and Hammam Abdelwahab and Benny Jörg Stein and Karl-Heinz Sylla and Pavel Denisov and Nicolo' Brandizzi and Qasid Saleem and Anirban Bhowmick and Lennard Helmer and Chelsea John and Pedro Ortiz Suarez and Malte Ostendorff and Alex Jude and Lalith Manjunath and Samuel Weinbach and Carolin Penke and Oleg Filatov and Shima Asaadi and Fabio Barth and Rafet Sifa and Fabian Küch and Andreas Herten and René Jäkel and Georg Rehm and Stefan Kesselheim and Joachim Köhler and Nicolas Flores-Herr},
335
+ year={2024},
336
+ eprint={2410.03730},
337
+ archivePrefix={arXiv},
338
+ primaryClass={cs.CL},
339
+ url={https://arxiv.org/abs/2410.03730},
340
+ }
341
+ ```
342
+
343
+ # Team
344
+ ## Data Team
345
+ Anirban Bhowmick (IAIS), Nicolo Brandizzi (IAIS), Lennard Helmer (IAIS), Benny Jörg Stein (IAIS), Karl-Heinz Sylla (IAIS), Pavel Denisov (IAIS), Qasid Saleem (IAIS), Johannes Leveling (IAIS), Hammam Abdelwahab (IAIS), Luzian Hahn (IIS), Farzad Naderi (IIS), Md Saiful Islam (IIS), Alexander Schwirjow (IIS), Pedro Ortiz Suarez (ex. DFKI), Malte Ostendorff (ex. DFKI)
346
+ ## Model-Training Team
347
+ ### Core contributors
348
+ Mehdi Ali (IAIS), Michael Fromm (IAIS), Jan Ebert (FZJ), Chelsea John (FZJ), Lena Jurkschat (TUD), Alexander Weber (IAIS)
349
+ ### Contributors:
350
+ Richard Rutmann (IAIS), Daniel Steinigen (IAIS), Lalith Manjunath (TUD), Carolin Penke (FZJ)
351
+ ## Evaluation Team
352
+ ### Core contributors
353
+ Klaudia Thellmann (TUD), Alex Jude (IAIS), Jasper Buschhoff (IAIS)
354
+ ### Contributors:
355
+ Shima Assadi (IIS), Fabio Barth (DFKI)
356
+ ## Management
357
+ Joachim Köhler (IAIS), Nicolas Flores-Herr (IAIS), Stefan Kesselheim (FZJ), Andreas Herten (FZJ), Georg Rehm (DFKI), René Jäkel (TUD), Fabian Küch (IIS), Nicole Hildebrandt (IAIS), Ines Wendler (IAIS)
358
+
359
+ We believe that collaboration is key to overcoming the aforementioned limitations and thereby strengthening the European GenAI landscape. Because of this, the team invites researchers, developers, and AI enthusiasts to join and engage through various platforms. A Discord server has been created for community collaboration, offering a space for discussions on technical details, ideas, and direct interaction with developers. Additionally, resources like research publications and a European LLM Leaderboard provide insights into Teuken-7B’s performance and technical aspects. The OpenGPT-X team encourages ongoing engagement and collaboration as the project evolves.
360
+ Key links:
361
+ Discord: OpenGPT-X [Discord server](https://discord.com/invite/RvdHpGMvB3)
362
+ Research Papers: OpenGPT-X News [Research Papers](https://opengpt-x.de/en/news-en/)
363
+ LLM Leaderboard: European LLM Leaderboard [LLM Leaderboard](https://huggingface.co/spaces/openGPT-X/european-llm-leaderboard)
364
+
365
+ <div class="hf-card">
366
+ <h2>Contact Information</h2>
367
+ <p>You can reach out to the following model card contact:</p>
368
+ <ul>
369
+ <li>
370
+ <a href="https://huggingface.co/openGPT-X" target="_blank">OpenGPT-X</a>
371
+ - <a href="mailto:contact@opengpt-x.de">contact@opengpt-x.de</a>
372
+ </li>
373
+ </ul>
374
+ </div>