distilgpt2-tiny-conversational
This model is a fine-tuned version of distilgpt2 on a parsed version of Wizard of Wikipedia. It uses a "person alpha" / "person beta" framework designed for use with ai-msgbot. It achieves the following results on the evaluation set:
- Loss: 2.2461
Model description
- A basic dialogue model for conversation; it can be used as a chatbot.
- Check out a simple demo here.
Intended uses & limitations
- Usage is designed for integration with the ai-msgbot repo.
- The main thing to know is that the model generates whole conversations between two entities, `person alpha` and `person beta`. These entity names function as custom `<bos>` tokens, used to detect where one response ends and the next begins.
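Because the model writes both sides of a conversation, a client has to cut its output at the speaker markers. The sketch below shows one way to do that in plain Python. It assumes each turn is laid out as `<speaker>:` followed by a newline and the utterance; that layout, and the helpers `build_prompt`, `split_turns`, and `first_reply`, are hypothetical and not taken from ai-msgbot, so check the repo for the real format before relying on it.

```python
import re
from typing import List, Tuple

# ASSUMPTION: turns are written as "<speaker>:\n<text>\n". The card only says the
# speaker names act as custom <bos>-style markers; check ai-msgbot for the real layout.
SPEAKERS = ("person alpha", "person beta")

# Matches either speaker name, an optional colon, and any trailing whitespace.
MARKER = re.compile(r"(" + "|".join(re.escape(s) for s in SPEAKERS) + r"):?\s*")


def build_prompt(user_message: str) -> str:
    """Hypothetical prompt: the user speaks as alpha; the model is cued to answer as beta."""
    return f"{SPEAKERS[0]}:\n{user_message}\n{SPEAKERS[1]}:\n"


def split_turns(text: str) -> List[Tuple[str, str]]:
    """Split a generated conversation into (speaker, utterance) pairs."""
    # With one capture group, re.split returns [prefix, spk1, text1, spk2, text2, ...].
    parts = MARKER.split(text)
    turns = []
    for speaker, utterance in zip(parts[1::2], parts[2::2]):
        utterance = utterance.strip()
        if utterance:
            turns.append((speaker, utterance))
    return turns


def first_reply(generated: str, prompt: str) -> str:
    """Keep only the model's reply: text after the prompt, up to the next speaker marker."""
    if generated.startswith(prompt):
        continuation = generated[len(prompt):]
        return MARKER.split(continuation, maxsplit=1)[0].strip()
    # Fallback (e.g. decoding changed whitespace): first utterance attributed to beta.
    for speaker, utterance in split_turns(generated):
        if speaker == SPEAKERS[1]:
            return utterance
    return ""


if __name__ == "__main__":
    prompt = build_prompt("hi, do you like jazz?")
    # Stand-in for model output: since the model generates whole conversations,
    # it may keep writing turns for both speakers after its reply.
    generated = prompt + "I do! Miles Davis is a favourite.\nperson alpha:\nme too"
    print(first_reply(generated, prompt))
    print(split_turns(generated))
```

Splitting on the marker names is simple but naive: if a speaker name ever appears inside an utterance, it will be treated as a turn boundary.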
Training and evaluation data
- A parsed version of Wizard of Wikipedia, sourced from ParlAI
Training procedure
- DeepSpeed with the Hugging Face Trainer; an example notebook is available in ai-msgbot
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 32
- eval_batch_size: 32
- seed: 42
- distributed_type: multi-GPU
- gradient_accumulation_steps: 4
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.05
- num_epochs: 30
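The learning-rate schedule implied by these settings can be reconstructed in a few lines of plain Python. This is a sketch, assuming the Trainer's standard cosine-with-warmup schedule (linear warmup, then half a cosine cycle down to zero), warmup steps rounded up from `warmup_ratio` (our understanding of how `TrainingArguments` computes them), and the 12,540 total steps shown in the results table. `lr_at` is a hypothetical helper, not a library function.

```python
import math

# Values from the hyperparameter list above.
LEARNING_RATE = 2e-05
TRAIN_BATCH_SIZE = 32
GRAD_ACCUM_STEPS = 4
WARMUP_RATIO = 0.05
# ASSUMPTION: total optimizer steps = final step in the results table (30 epochs x 418 steps).
TOTAL_STEPS = 12540

# One optimizer step consumes TRAIN_BATCH_SIZE x GRAD_ACCUM_STEPS examples.
effective_batch = TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS
# ASSUMPTION: warmup steps are rounded up from the ratio.
warmup_steps = math.ceil(TOTAL_STEPS * WARMUP_RATIO)


def lr_at(step: int) -> float:
    """Learning rate after `step` optimizer steps: linear warmup, then half-cosine decay to 0."""
    if step < warmup_steps:
        return LEARNING_RATE * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, TOTAL_STEPS - warmup_steps)
    return LEARNING_RATE * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    print(effective_batch, warmup_steps)  # 128 627
    for step in (0, warmup_steps, TOTAL_STEPS // 2, TOTAL_STEPS):
        print(step, f"{lr_at(step):.3e}")
```

The product `TRAIN_BATCH_SIZE * GRAD_ACCUM_STEPS` reproduces the card's `total_train_batch_size` of 128, and under these assumptions the peak rate of 2e-05 is reached after 627 steps, about a point and a half into the 30 epochs.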
Training results
Training Loss | Epoch | Step | Validation Loss |
---|---|---|---|
No log | 1.0 | 418 | 2.7793 |
2.9952 | 2.0 | 836 | 2.6914 |
2.7684 | 3.0 | 1254 | 2.6348 |
2.685 | 4.0 | 1672 | 2.5938 |
2.6243 | 5.0 | 2090 | 2.5625 |
2.5816 | 6.0 | 2508 | 2.5332 |
2.5816 | 7.0 | 2926 | 2.5098 |
2.545 | 8.0 | 3344 | 2.4902 |
2.5083 | 9.0 | 3762 | 2.4707 |
2.4793 | 10.0 | 4180 | 2.4551 |
2.4531 | 11.0 | 4598 | 2.4395 |
2.4269 | 12.0 | 5016 | 2.4238 |
2.4269 | 13.0 | 5434 | 2.4102 |
2.4051 | 14.0 | 5852 | 2.3945 |
2.3777 | 15.0 | 6270 | 2.3848 |
2.3603 | 16.0 | 6688 | 2.3711 |
2.3394 | 17.0 | 7106 | 2.3613 |
2.3206 | 18.0 | 7524 | 2.3516 |
2.3206 | 19.0 | 7942 | 2.3398 |
2.3026 | 20.0 | 8360 | 2.3301 |
2.2823 | 21.0 | 8778 | 2.3203 |
2.2669 | 22.0 | 9196 | 2.3105 |
2.2493 | 23.0 | 9614 | 2.3027 |
2.2334 | 24.0 | 10032 | 2.2930 |
2.2334 | 25.0 | 10450 | 2.2852 |
2.2194 | 26.0 | 10868 | 2.2754 |
2.2014 | 27.0 | 11286 | 2.2695 |
2.1868 | 28.0 | 11704 | 2.2598 |
2.171 | 29.0 | 12122 | 2.2539 |
2.1597 | 30.0 | 12540 | 2.2461 |
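Validation loss is easier to interpret as perplexity. The snippet below converts a few rows of the table, assuming the reported loss is the mean per-token cross-entropy in nats (the usual causal-LM loss in transformers), so that perplexity is simply `exp(loss)`.

```python
import math

# Validation losses copied from the table above (epoch -> loss).
val_loss = {1: 2.7793, 10: 2.4551, 20: 2.3301, 30: 2.2461}

# ASSUMPTION: loss is mean per-token cross-entropy in nats, so perplexity = exp(loss).
perplexity = {epoch: math.exp(loss) for epoch, loss in val_loss.items()}

for epoch, ppl in perplexity.items():
    print(f"epoch {epoch:>2}: loss {val_loss[epoch]:.4f} -> perplexity {ppl:.2f}")
```

Under that assumption, perplexity falls from roughly 16.1 after the first epoch to about 9.45 at epoch 30.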
Framework versions
- Transformers 4.16.1
- Pytorch 1.10.0+cu111
- Tokenizers 0.11.0