NebulaByte/hindi_gpt2


NebulaByte committed on Jul 24, 2023

Commit 6cfc1c0 · 1 Parent(s): ac6e6d2

Update README.md

Files changed (1)

README.md +59 -1


@@ -10,4 +10,62 @@ widget:
 inference:
   parameters:
     max_length: 200
----

+---
+# Model Overview:
+This is a text-generation model that extends GPT-2 to support Hindi in addition to the languages the original model supports. It was fine-tuned on Hindi text from [Wikipedia](https://www.kaggle.com/datasets/disisbig/hindi-wikipedia-articles-55k) articles.
+# Model Architecture and Parameters:
+The architecture follows GPT-2 and uses the parameters of the small version of the original OpenAI GPT-2 model. It uses a Byte Pair Encoding (BPE) tokenizer.
+# Corpus:
+The training corpus for Hindi GPT2 consists of Hindi Wikipedia articles.
+# Tokenizer:
+A new tokenizer was trained on the Hindi Wikipedia corpus, and its vocabulary (5000 tokens) was merged into the existing GPT-2 tokenizer. Hindi GPT2 uses a byte-level version of Byte Pair Encoding (BPE), which can tokenize any Unicode text, including Hindi. The merged tokenizer has a vocabulary size of 53497. Input sequences are formed by splitting the tokenized text into consecutive chunks of at most 1024 tokens.
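The vocabulary merge and the 1024-token chunking described above can be sketched in plain Python. This is a minimal illustration, not the author's actual script: `merge_vocab` and `chunk_tokens` are hypothetical helpers, and the toy vocabularies stand in for the real ones. The sketch also shows why a merged vocabulary can grow by fewer entries than the new tokenizer contains: tokens already present in the base vocabulary are skipped. If the base was the stock GPT-2 vocabulary (50257 entries), the reported size of 53497 would correspond to 3240 added entries.

```python
def merge_vocab(base_vocab, new_tokens):
    """Append tokens missing from base_vocab, assigning the next free ids."""
    merged = dict(base_vocab)
    next_id = max(merged.values(), default=-1) + 1
    for token in new_tokens:
        if token not in merged:  # duplicates keep their original id
            merged[token] = next_id
            next_id += 1
    return merged


def chunk_tokens(token_ids, max_length=1024):
    """Split a token id list into consecutive chunks of at most max_length."""
    return [token_ids[i:i + max_length] for i in range(0, len(token_ids), max_length)]


# Toy data (hypothetical), standing in for the GPT-2 and Hindi vocabularies.
base = {"the": 0, "ing": 1, "न": 2}
hindi = ["न", "है", "में"]
merged = merge_vocab(base, hindi)
print(len(merged))  # 5: one of the three Hindi tokens was already present

chunks = chunk_tokens(list(range(2500)))
print([len(c) for c in chunks])  # [1024, 1024, 452]

# Size arithmetic, assuming the stock GPT-2 vocabulary (50257) as the base.
added_entries = 53497 - 50257
print(added_entries)  # 3240
```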
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+More information needed
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 0.0005
+- train_batch_size: 64
+- eval_batch_size: 64
+- seed: 42
+- gradient_accumulation_steps: 4
+- total_train_batch_size: 256
+- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_steps: 500
+- num_epochs: 1
+- mixed_precision_training: Native AMP
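The effective batch size and the learning-rate curve implied by these settings can be worked out as follows. This sketch assumes a single device (so that 64 × 4 matches the reported total of 256) and the usual shape of a cosine schedule with warmup: linear increase to the peak rate, then cosine decay to zero. `lr_at` is a hypothetical helper, and `TOTAL_STEPS = 3000` is an assumption taken from the last step logged in the training results, since the model card does not state the total number of optimizer steps.

```python
import math

LEARNING_RATE = 5e-4
TRAIN_BATCH_SIZE = 64
GRAD_ACCUM_STEPS = 4
WARMUP_STEPS = 500
TOTAL_STEPS = 3000  # assumption: last logged step, not stated in the model card

# Samples consumed per optimizer update on one device.
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS
print(effective_batch)  # 256


def lr_at(step, peak=LEARNING_RATE, warmup=WARMUP_STEPS, total=TOTAL_STEPS):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < warmup:
        return peak * step / max(1, warmup)
    progress = (step - warmup) / max(1, total - warmup)
    return peak * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (0, 250, 500, 1750, 3000):
    print(step, f"{lr_at(step):.6f}")
```

Under these assumptions the rate reaches its peak of 0.0005 at step 500 and falls to half of that midway through the decay phase.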
+### Training results
+| Step | Training Loss | Validation Loss |
+| :---- | :------------- | :--------------- |
+| 500  | 2.0016        | 1.066703        |
+| 1000 | 1.0314        | 0.959653        |
+| 1500 | 0.9593        | 0.918827        |
+| 2000 | 0.922         | 0.889607        |
+| 2500 | 0.8983        | 0.872523        |
+| 3000 | 0.8852        | 0.863592        |
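Validation loss is often easier to interpret as perplexity. Assuming the reported values are mean per-token cross-entropy in nats (the usual convention for causal language modeling losses, but not confirmed by the model card), perplexity is the exponential of the loss. The result is per token of this model's tokenizer, so it is not directly comparable with models that use a different vocabulary.

```python
import math

# Validation loss per logged step, copied from the training results table.
validation_loss = {
    500: 1.066703,
    1000: 0.959653,
    1500: 0.918827,
    2000: 0.889607,
    2500: 0.872523,
    3000: 0.863592,
}

# Perplexity = exp(mean cross-entropy), assuming the loss is measured in nats.
perplexity = {step: math.exp(loss) for step, loss in validation_loss.items()}

for step, ppl in perplexity.items():
    print(f"step {step}: perplexity {ppl:.3f}")
```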
+### Framework versions
+- Transformers 4.30.2
+- torch 1.13.1
+- Datasets 2.13.1
+- Tokenizers 0.13.3