---
license: mit
widget:
- text: বহুল আলোচিত দশম জাতীয় সংসদ
- text: গাজীপুরের কালিয়াকৈর উপজেলার তেলিরচালা
---

The Bangla GPT2 model was trained on a Bangla newspaper dataset: roughly 250 MB of Prothom Alo articles, with a vocabulary size of 50k.

GitHub link: https://github.com/saiful9379/Bangla_GPT2

```py
from transformers import TFGPT2LMHeadModel, GPT2Tokenizer

# Load the tokenizer and TensorFlow model from the Hugging Face Hub
tokenizer = GPT2Tokenizer.from_pretrained("saiful9379/Bangla_GPT2")
model = TFGPT2LMHeadModel.from_pretrained("saiful9379/Bangla_GPT2")

text = "বহুল আলোচিত দশম জাতীয় সংসদ"
input_ids = tokenizer.encode(text, return_tensors='tf')
print(input_ids)

# Beam-search generation with repetition control
output = model.generate(
    input_ids,
    max_length=175,
    num_beams=10,
    temperature=0.7,
    no_repeat_ngram_size=2,
    num_return_sequences=5
)
predicted_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(predicted_text)
```

Here is the basic configuration of the Bangla GPT2 model:

```
vocab_size = 50000
block_size = 200
learning_rate = 3e-5
num_epoch = 100
batch_size = 12
buffer_size = 1000
```
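The exact training script lives in the GitHub repository linked above; as a rough sketch only, the configuration values could map onto a from-scratch GPT-2 setup along these lines (the `GPT2Config` fields and the optimizer choice here are assumptions, not taken from the repository):

```py
import tensorflow as tf
from transformers import GPT2Config, TFGPT2LMHeadModel

# Hypothetical mapping of the configuration listed above onto a GPT-2
# model built from scratch; the actual training code is in the GitHub repo.
config = GPT2Config(
    vocab_size=50000,   # vocab_size
    n_positions=200,    # block_size: maximum context length in tokens
)
model = TFGPT2LMHeadModel(config)

# learning_rate from the configuration above; num_epoch, batch_size and
# buffer_size would drive a standard tf.data input pipeline and fit loop.
optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5)
model.compile(optimizer=optimizer)
```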
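Note that the usage example above requests `num_return_sequences=5` but decodes only `output[0]`. To inspect all five beam candidates, a small extension (reusing the `output` and `tokenizer` variables from that example) is:

```py
# Decode and print every candidate sequence returned by generate()
for i, sequence in enumerate(output):
    print(f"--- candidate {i} ---")
    print(tokenizer.decode(sequence, skip_special_tokens=True))
```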