---
license: mit
language: de
pipeline_tag: text-generation
widget:
- text: "In einer schockierenden Entdeckung fanden Wissenschaftler eine Herde Einhörner, die in "
  example_title: "Einhörner ..."
- text: |-
    Definiere folgende Wörter
    Wort: Einhorn
    Definition: Das Einhorn ist ein Fabelwesen von Pferde- oder Ziegengestalt mit einem geraden Horn auf der Stirnmitte.
    Wort: Regierungschef
    Definition: Der Regierungschef ist der Leiter der Regierung eines Staates (z. B. National- oder Gliedstaat).
    Wort: Waffendrill
    Definition:
  example_title: "Definiere ..."
---

# German GPT2-XL (1.5B)

- trained with [BigScience's DeepSpeed-Megatron-LM code base](https://github.com/bigscience-workshop/Megatron-DeepSpeed)
- word embeddings initialized with [WECHSEL](https://arxiv.org/abs/2112.06598); all other weights taken from the English [gpt2-xl](https://huggingface.co/gpt2-xl)
- ~3 days on 16xA100 GPUs (~80 TFLOPs / GPU)
- stopped after 100k steps
- 26.2B tokens
- less than a single epoch on `oscar_unshuffled_deduplicated_de` (excluding validation set; original model was trained for 75 epochs on less data)
- bf16
- ZeRO stage 0
- tp/pp = 1

### How to use

You can use this model directly with a pipeline for text generation. Because generation involves random sampling, we set a seed for reproducibility:

```python
>>> from transformers import pipeline, set_seed
>>> generator = pipeline('text-generation', model='malteos/gpt2-xl-wechsel-german')
>>> set_seed(42)
>>> generator("Hello, I'm a language model,", max_length=30, num_return_sequences=5)

[{'generated_text': "Hello, I'm a language model, a language for thinking, a language for expressing thoughts."},
 {'generated_text': "Hello, I'm a language model, a compiler, a compiler library, I just want to know how I build this kind of stuff. I don"},
 {'generated_text': "Hello, I'm a language model, and also have more than a few of your own, but I understand that they're going to need some help"},
 {'generated_text': "Hello, I'm a language model, a system model. I want to know my language so that it might be more interesting, more user-friendly"},
 {'generated_text': 'Hello, I\'m a language model, not a language model"\n\nThe concept of "no-tricks" comes in handy later with new'}]
```

Here is how to use this model to get the features of a given text in PyTorch:

```python
from transformers import GPT2Tokenizer, GPT2Model

# Load the tokenizer and the base model (no language-modeling head)
tokenizer = GPT2Tokenizer.from_pretrained('malteos/gpt2-xl-wechsel-german')
model = GPT2Model.from_pretrained('malteos/gpt2-xl-wechsel-german')

text = "Replace me by any text you'd like."
# Tokenize to PyTorch tensors and run a forward pass
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
```

## Evaluation

| Model (size) | PPL |
|---|---|
| `gpt2-xl-wechsel-german` (1.5B) | **14.5** |
| `gpt2-wechsel-german-ds-meg` (117M) | 26.4 |
| `gpt2-wechsel-german` (117M) | 26.8 |
| `gpt2` (retrained from scratch) (117M) | 27.63 |

## License

MIT
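
### Note on the PPL metric

The PPL column in the Evaluation table reports perplexity, where lower is better. This card does not say which evaluation set or exact procedure produced those numbers, so the snippet below is only a minimal sketch of the usual definition: the exponential of the mean negative log-likelihood per token. It assumes natural-log token probabilities; the function name `perplexity` and the example values are hypothetical and not taken from the model's evaluation.

```python
import math


def perplexity(token_log_probs):
    """Return exp(mean negative log-likelihood) over a list of token log-probs.

    Assumes natural-log probabilities, one entry per predicted token.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical log-probabilities for a 4-token sequence (illustration only)
example_log_probs = [math.log(0.25), math.log(0.5), math.log(0.125), math.log(0.25)]
print(perplexity(example_log_probs))
```

Under this definition, a model that assigns every token a probability of 1/k has a perplexity of k, so a PPL of 14.5 roughly corresponds to choosing uniformly among about 14 or 15 tokens at each step.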