|
--- |
|
license: mit |
|
title: 'MinimalGPT: Felis Catus' |
|
sdk: gradio |
|
emoji: 😻 |
|
colorFrom: gray |
|
colorTo: blue |
|
pinned: true |
|
--- |
|
|
|
# MinimalGPT: Felis Catus |
|
|
|
[[`MinimalGPT`](https://github.com/abhaskumarsinha/MinimalGPT)] [[`Project Gutenberg Dataset`](https://www.kaggle.com/datasets/shubchat/1002-short-stories-from-project-guttenberg)] |
|
|
|
|
|
This Hugging Face Space is a demonstration of the GitHub repository [MinimalGPT](https://github.com/abhaskumarsinha/MinimalGPT), a departure from conventional GPT models that are scaled up and trained on high-performance computing clusters. The primary objective of the MinimalGPT project is to explore how small a GPT model can be made.
|
|
|
Within this Space, we introduce a tiny GPT model named [Felis Catus](https://en.wikipedia.org/wiki/Cat) (the domestic cat), with roughly 17.6 million parameters (see the model summary below). What distinguishes this model is that it was trained entirely on a standard home-computer CPU (an AMD Ryzen 5) with no GPU acceleration. Training took about 15 minutes on a dataset of roughly 150,000 tokens of text.
|
|
|
At present, the Felis Catus model can generate a short story excerpt of 70 tokens from an input of just 5 to 7 words. Its vocabulary covers roughly 12,500 words. We are working to scale the model further in a forthcoming project.
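
For illustration, sampling from such a model could look like the sketch below: greedy decoding over a sliding 10-token context window. The `tokenize`/`detokenize` helpers and the zero-padding scheme are assumptions made for this example, not MinimalGPT's actual API.

```python
import numpy as np

def generate(model, tokenize, detokenize, prompt, n_tokens=70, window=10):
    """Greedy autoregressive decoding with a fixed-size context window."""
    tokens = tokenize(prompt)                 # prompt -> list of token ids
    for _ in range(n_tokens):
        context = tokens[-window:]            # keep only the last 10 tokens
        context = [0] * (window - len(context)) + context   # left-pad short prompts
        probs = model.predict(np.array([context]), verbose=0)[0]
        tokens.append(int(np.argmax(probs)))  # greedy: most probable next token
    return detokenize(tokens)
```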
|
|
|
## Model Specifications |
|
|
|
``` |
|
Model: "model" |
|
_________________________________________________________________ |
|
Layer (type) Output Shape Param # |
|
================================================================= |
|
input_1 (InputLayer) [(None, 10)] 0 |
|
|
|
embedding (Embedding) (None, 10, 128) 1597184 |
|
|
|
positional_embedding (Posit (None, 10, 128) 0 |
|
ionalEmbedding) |
|
|
|
decoder (Decoder) (None, 10, 128) 71208 |
|
|
|
flatten (Flatten) (None, 1280) 0 |
|
|
|
dense (Dense) (None, 12479) 15985599 |
|
|
|
tf.nn.softmax (TFOpLambda) (None, 12479) 0 |
|
|
|
================================================================= |
|
Total params: 17,653,991 |
|
Trainable params: 17,653,991 |
|
Non-trainable params: 0 |
|
_________________________________________________________________ |
|
``` |
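
For orientation, the same architecture can be roughly reconstructed with stock Keras layers. The `decoder_block` and sinusoidal `positional_encoding` below are assumed stand-ins for the repo's custom `Decoder` and `PositionalEmbedding` layers, so parameter counts will differ slightly from the summary above.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

VOCAB, SEQ_LEN, D_MODEL, HEADS = 12479, 10, 128, 8

def positional_encoding(seq_len, d_model):
    # Fixed sinusoidal encoding (0 trainable params), as in Vaswani et al.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return tf.constant(np.where(i % 2 == 0, np.sin(angles), np.cos(angles)),
                       dtype=tf.float32)

def decoder_block(x):
    # Assumed stand-in for the repo's custom Decoder layer: causal
    # self-attention plus a feed-forward sublayer, each with residual + norm.
    attn = layers.MultiHeadAttention(
        num_heads=HEADS, key_dim=D_MODEL // HEADS)(x, x, use_causal_mask=True)
    x = layers.LayerNormalization()(x + attn)
    ff = layers.Dense(D_MODEL, activation="relu")(x)
    return layers.LayerNormalization()(x + ff)

inp = layers.Input(shape=(SEQ_LEN,), dtype="int32")   # (None, 10) token ids
x = layers.Embedding(VOCAB, D_MODEL)(inp)             # (None, 10, 128)
x = x + positional_encoding(SEQ_LEN, D_MODEL)         # parameter-free positions
x = decoder_block(x)                                  # decoder_stacks = 1
x = layers.Flatten()(x)                               # (None, 10 * 128) = (None, 1280)
out = layers.Dense(VOCAB, activation="softmax")(x)    # next-token distribution
model = tf.keras.Model(inp, out)
model.summary()
```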
|
|
|
## Hyperparameters |
|
|
|
``` |
|
gpt_input: 10        [Maximum input length, d_k]

d_model: 128         [Embedding size, d_model]

h: 8                 [Number of attention heads, h]

decoder_stacks: 1    [Number of decoder stacks, stack]

GPT_attention: True  [Attention layer implementation type - OpenAI style]
|
``` |
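
Read together, these values imply a per-head size of d_model / h = 128 / 8 = 16. A hypothetical config sketch follows; the actual MinimalGPT entry point and argument names may differ.

```python
# Hypothetical config mirroring the hyperparameters above; MinimalGPT's
# actual argument names and entry point may differ from this sketch.
config = {
    "gpt_input": 10,        # context window: the model attends over 10 tokens
    "d_model": 128,         # embedding width; per-head size = d_model / h = 16
    "h": 8,                 # number of attention heads
    "decoder_stacks": 1,    # a single decoder block keeps the model tiny
    "GPT_attention": True,  # OpenAI (GPT)-style attention implementation
}
```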
|
|
|
## References |
|
1. Vaswani, Ashish, et al. "Attention is all you need." Advances in Neural Information Processing Systems 30 (2017).

2. Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI Blog 1.8 (2019): 9.

3. Project Gutenberg. (n.d.). Retrieved 2023, from www.gutenberg.org.

4. Abadi, Martín, et al. "TensorFlow: Large-scale machine learning on heterogeneous systems." (2015). Software available from https://www.tensorflow.org.