NebulaByte / hindi_gpt2


	---
	license: apache-2.0
	widget:
	- text: "अपने अनुप्रयोग को पहुंचनीयता व्यायाम"
	- text: "जनतंत्र की सफलता केवल इस बात से नहीं हो सकती है कि हर"
	- text: "अगर इसके बाद भी वे फैसले पर कायम रहते हैं और"
	- text: "मामले का खुलासा होने के बाद"
	- text: "My name is Julien and I like to"
	- text: "My name is Thomas and my main"
	inference:
	  parameters:
	    max_length: 200
	---

	# Model Overview:
	This is a text generation model that extends GPT-2 to support Hindi alongside the languages the original model already handles. It was fine-tuned on Hindi [Wikipedia](https://www.kaggle.com/datasets/disisbig/hindi-wikipedia-articles-55k) articles.
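
A minimal usage sketch, assuming the checkpoint is published on the Hugging Face Hub under the id `NebulaByte/hindi_gpt2` and loads through the standard `transformers` text-generation pipeline. The helper name `generate_text` is illustrative and not part of the model; `max_length=200` mirrors the widget setting in the card metadata.

```python
MODEL_ID = "NebulaByte/hindi_gpt2"  # assumed Hub repository id
MAX_LENGTH = 200  # mirrors the inference parameter in the card metadata


def generate_text(prompt: str, max_length: int = MAX_LENGTH) -> str:
    """Return the prompt followed by a model-generated continuation."""
    # Imported lazily so that defining the helper does not require transformers.
    from transformers import pipeline

    generator = pipeline("text-generation", model=MODEL_ID)
    outputs = generator(prompt, max_length=max_length)
    return outputs[0]["generated_text"]
```

Calling `generate_text("मामले का खुलासा होने के बाद")` downloads the checkpoint on first use and returns the prompt together with its continuation.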

	# Model Architecture and Parameters:
	The architecture follows GPT-2 and uses the parameters of the small version of the original OpenAI GPT-2 model. It employs a Byte Pair Encoding (BPE) tokenizer.

	# Corpus:
	The training corpus for Hindi GPT2 consists of Hindi Wikipedia articles.

	# Tokenizer:
	A new tokenizer was trained on the Hindi Wikipedia corpus, and its vocabulary (5000 tokens) was merged into the existing tokenizer. Hindi GPT2 uses a byte-level version of Byte Pair Encoding (BPE), so it can tokenize Hindi text and any other Unicode characters. The merged tokenizer has a vocabulary size of 53497, adding Hindi-specific subword units on top of the original vocabulary. Input sequences are formed by splitting the text into consecutive tokens, with a maximum length of 1024 tokens.
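
The vocabulary sizes above can be checked with a little arithmetic. This sketch assumes the existing tokenizer is the standard GPT-2 one with 50257 entries, which the card does not state explicitly. Under that assumption, not every one of the 5000 Hindi tokens became a new entry; the remainder were presumably already present in the original vocabulary.

```python
GPT2_VOCAB_SIZE = 50257    # assumption: size of the original GPT-2 tokenizer
HINDI_VOCAB_SIZE = 5000    # new tokenizer trained on Hindi Wikipedia (from the card)
MERGED_VOCAB_SIZE = 53497  # merged tokenizer size (from the card)

# Entries the merge actually added on top of the original vocabulary.
added_tokens = MERGED_VOCAB_SIZE - GPT2_VOCAB_SIZE

# Hindi-tokenizer entries that did not become new entries.
not_added = HINDI_VOCAB_SIZE - added_tokens

print(f"tokens added by the merge: {added_tokens}")        # 3240
print(f"Hindi-tokenizer tokens not added: {not_added}")    # 1760
```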

	## Intended uses & limitations

	More information needed

	## Training and evaluation data

	More information needed

	## Training procedure

	More information needed

	### Training hyperparameters

	The following hyperparameters were used during training:
	- learning_rate: 0.0005
	- train_batch_size: 64
	- eval_batch_size: 64
	- seed: 42
	- gradient_accumulation_steps: 4
	- total_train_batch_size: 256
	- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
	- lr_scheduler_type: cosine
	- lr_scheduler_warmup_steps: 500
	- num_epochs: 1
	- mixed_precision_training: Native AMP
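
How these settings interact can be sketched in plain Python. The total batch size of 256 equals 64 × 4, which is consistent with a single device (an inference, not stated in the card). The schedule function follows the usual linear-warmup-then-cosine-decay shape; `TOTAL_STEPS = 3000` is an assumption taken from the last logged step in the results, and the function names are illustrative.

```python
import math

LEARNING_RATE = 5e-4
TRAIN_BATCH_SIZE = 64
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 500
TOTAL_STEPS = 3000  # assumption: the last step logged in the training results


def effective_batch_size(per_device: int, accum_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer update."""
    return per_device * accum_steps * num_devices


def cosine_lr(step: int,
              base_lr: float = LEARNING_RATE,
              warmup_steps: int = WARMUP_STEPS,
              total_steps: int = TOTAL_STEPS) -> float:
    """Linear warmup to base_lr, then cosine decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


print(effective_batch_size(TRAIN_BATCH_SIZE, GRAD_ACCUM_STEPS))
for step in (0, 250, 500, 1750, 3000):
    print(step, cosine_lr(step))
```

Under these assumptions the learning rate peaks at 0.0005 at step 500 and has decayed to half that value midway through the decay phase.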

	### Training results

	| Step | Training Loss | Validation Loss |
	| :--- | :------------ | :-------------- |
	| 500  | 2.0016        | 1.066703        |
	| 1000 | 1.0314        | 0.959653        |
	| 1500 | 0.9593        | 0.918827        |
	| 2000 | 0.922         | 0.889607        |
	| 2500 | 0.8983        | 0.872523        |
	| 3000 | 0.8852        | 0.863592        |
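
The validation losses can be read as perplexities. This sketch assumes the reported loss is the mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not confirmed by the card. Perplexity is then the exponential of the loss. Because it is measured per token, it is not directly comparable with models that use a different tokenizer.

```python
import math

# Validation loss per logged step, copied from the table above.
validation_loss = {
    500: 1.066703,
    1000: 0.959653,
    1500: 0.918827,
    2000: 0.889607,
    2500: 0.872523,
    3000: 0.863592,
}

# Assumption: loss is mean token-level cross-entropy in nats.
perplexity = {step: math.exp(loss) for step, loss in validation_loss.items()}

for step, ppl in perplexity.items():
    print(f"step {step:>4}: perplexity {ppl:.3f}")
```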


	### Framework versions

	- Transformers 4.30.2
	- torch 1.13.1
	- Datasets 2.13.1
	- Tokenizers 0.13.3