redrussianarmy/gpt2-turkish-cased

redrussianarmy committed on Nov 16, 2020

Commit ccf68f4 · 1 Parent(s): 76f7408

Update README.md

Files changed (1)

README.md +35 -2

README.md CHANGED

@@ -1,2 +1,35 @@
-In this repository we release (yet another) GPT-2 model, that was trained on various texts for German.
-In this repository, I release pretrained GPT-2 model, that was trained on various texts for Turkish

+# Turkish GPT-2 Model
+In this repository, I release a GPT-2 model that was trained on various Turkish texts.
+The model is meant to be an entry point for fine-tuning on other texts.
+# Training corpora
+I used a Turkish corpus taken from oscar-corpus.
+The Tokenizers library from Hugging Face made it possible to build a byte-level BPE tokenizer.
+With the Tokenizers library, I created a 52K byte-level BPE vocabulary based on the training corpus.
+After creating the vocabulary, I trained the Turkish GPT-2 model on two 2080 Ti GPUs over the complete training corpus for five epochs.
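To make the byte-level BPE step more concrete, here is a minimal pure-Python sketch of the idea. It is a toy illustration, not the Tokenizers implementation and not the script used for this model: the function names, the two-sentence corpus, and the merge count are all hypothetical, and it skips details a real tokenizer handles (pre-tokenization on whitespace, special tokens, a byte-to-character mapping). The point it shows is that training starts from the 256 possible byte values and repeatedly merges the most frequent adjacent pair, so the final vocabulary is roughly the base bytes plus the learned merges.

```python
from collections import Counter


def merge_pair(seq, pair):
    """Replace every non-overlapping occurrence of `pair` with one merged token."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def train_byte_level_bpe(texts, num_merges):
    """Learn up to `num_merges` merge rules over UTF-8 bytes (toy version)."""
    # Every text starts as single-byte tokens, so any character
    # (including Turkish letters such as "ş" or "ü") is representable.
    sequences = [[bytes([b]) for b in text.encode("utf-8")] for text in texts]
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent token pair occurs in the corpus.
        pair_counts = Counter()
        for seq in sequences:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        # Pick the most frequent pair; ties are broken by byte order
        # so that training is deterministic.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        if pair_counts[best] < 2:
            break  # no pair repeats, so further merges would not help
        merges.append(best)
        sequences = [merge_pair(seq, best) for seq in sequences]
    return merges


def encode(text, merges):
    """Tokenize `text` by applying the learned merges in training order."""
    seq = [bytes([b]) for b in text.encode("utf-8")]
    for pair in merges:
        seq = merge_pair(seq, pair)
    return seq


# Hypothetical toy corpus; the real model was trained on a far larger one.
corpus = ["akşam akşam", "akşamüstü yolda"]
merges = train_byte_level_bpe(corpus, num_merges=10)
tokens = encode("akşamüstü", merges)
print(tokens)
```

Joining the tokens back together always reproduces the original UTF-8 bytes, which is why a byte-level vocabulary never needs an unknown-token fallback.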
+# Using the model
+The model itself can be used in this way:
+``` python
+from transformers import AutoTokenizer, AutoModelForCausalLM
+# Load the tokenizer and the causal language model from the Hugging Face Hub.
+tokenizer = AutoTokenizer.from_pretrained("redrussianarmy/gpt2-turkish-cased")
+model = AutoModelForCausalLM.from_pretrained("redrussianarmy/gpt2-turkish-cased")
+```
+Here's an example that shows how to use the great Transformers Pipelines for generating text:
+``` python
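+from transformers import pipeline
+# Build a text-generation pipeline; model and tokenizer come from the same repository.
+pipe = pipeline('text-generation', model="redrussianarmy/gpt2-turkish-cased",
+                 tokenizer="redrussianarmy/gpt2-turkish-cased")
+# max_length caps the total length in tokens (prompt plus generated text).
+text = pipe("Akşamüstü yolda ilerlerken, ", max_length=800)[0]["generated_text"]
+print(text)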
+```