---
license: mit
title: 'MinimalGPT: Felis Catus'
sdk: gradio
emoji: 😻
colorFrom: gray
colorTo: blue
pinned: true
---
# MinimalGPT: Felis Catus
[[`MinimalGPT`](https://github.com/abhaskumarsinha/MinimalGPT)] [[`Project Gutenberg Dataset`](https://www.kaggle.com/datasets/shubchat/1002-short-stories-from-project-guttenberg)]
This Hugging Face Space is an illustrative application of the GitHub repository [MinimalGPT](https://github.com/abhaskumarsinha/MinimalGPT), which departs from conventional GPT models that are scaled up and trained on high-performance computing systems and clusters. The primary objective of the MinimalGPT project is to explore how small a GPT model can be made while still producing usable text.
Within this Space, we introduce a small GPT model named [Felis Catus](https://en.wikipedia.org/wiki/Cat) (a stray cat), with roughly 17.7 million parameters (see the model summary below). What distinguishes this model is its training process: it was trained on a standard home computer CPU (an AMD Ryzen 5) with no GPU acceleration, and the training run took only about 15 minutes on a dataset of roughly 150,000 tokens of text.
At present, Felis Catus generates a short story excerpt of 70 tokens from a prompt of just 5 to 7 words, using a vocabulary of roughly 12,000 words. We are currently working on scaling the model further in a forthcoming project.
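Concretely, because the network takes a fixed window of 10 token ids and emits a single next-word distribution, text has to be generated one word at a time by sliding that window forward. Below is a minimal sketch of such a loop; `model`, `word_to_id`, and `id_to_word` are hypothetical stand-ins for the trained Keras model and its vocabulary mappings, and the actual MinimalGPT code may sample from the distribution rather than take the argmax.

```python
import numpy as np

def generate(model, word_to_id, id_to_word, prompt, n_tokens=70, window=10):
    """Greedily extend `prompt` by `n_tokens` words, feeding the model the
    last `window` token ids at every step (sketch, not the repo's own code)."""
    ids = [word_to_id.get(w, 0) for w in prompt.lower().split()]
    for _ in range(n_tokens):
        context = ids[-window:]
        context = [0] * (window - len(context)) + context   # left-pad to the fixed input size
        probs = model.predict(np.array([context]), verbose=0)[0]
        ids.append(int(np.argmax(probs)))                   # greedy next-word choice
    return " ".join(id_to_word[i] for i in ids)
```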
## Model Specifications
```
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 10)]              0         
embedding (Embedding)        (None, 10, 128)           1597184   
positional_embedding         (None, 10, 128)           0         
  (PositionalEmbedding)                                          
decoder (Decoder)            (None, 10, 128)           71208     
flatten (Flatten)            (None, 1280)               0         
dense (Dense)                (None, 12479)             15985599  
tf.nn.softmax (TFOpLambda)   (None, 12479)             0         
=================================================================
Total params: 17,653,991
Trainable params: 17,653,991
Non-trainable params: 0
_________________________________________________________________
```
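Almost all of the parameters sit in the token embedding and in the final projection from the flattened decoder output back to the vocabulary. The Dense layer's count, for example, follows directly from the shapes in the summary:

```python
# Final Dense layer: a (1280 x 12479) weight matrix plus one bias per vocabulary entry.
flatten_dim = 10 * 128                          # gpt_input x d_model, after Flatten
vocab_size = 12479
print(flatten_dim * vocab_size + vocab_size)    # 15985599, matching the summary above
```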
## Hyperparameters
```
gpt_input: 10 [Max input size, d_k]
d_model: 128 [Embedding size, d_model]
h: 8 [Number of attention heads, h]
decoder_stacks: 1 [Number of decoder stacks, stack]
GPT_attention: True [Attention Layer implementation type - OpenAI style]
```
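For readers who want to see how these pieces fit together, below is a minimal Keras sketch of an architecture with the same layer sequence and shapes as the summary above (10-token input, d_model = 128, 8 heads, one decoder stack, 12,479-word output). The `PositionalEmbedding` and the single attention-plus-feed-forward decoder block here are simplified stand-ins for MinimalGPT's own layers, so parameter counts will differ slightly from the summary.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

GPT_INPUT, D_MODEL, HEADS, VOCAB = 10, 128, 8, 12479   # values from the tables above


class PositionalEmbedding(layers.Layer):
    """Adds fixed sinusoidal position encodings (no trainable parameters)."""

    def __init__(self, length, depth, **kwargs):
        super().__init__(**kwargs)
        pos = np.arange(length)[:, None]
        i = np.arange(depth)[None, :]
        angle = pos / np.power(10000.0, (2 * (i // 2)) / depth)
        self.encoding = np.where(i % 2 == 1, np.cos(angle), np.sin(angle)).astype("float32")

    def call(self, x):
        return x + self.encoding


inputs = layers.Input(shape=(GPT_INPUT,), dtype="int32")      # a window of 10 token ids
x = layers.Embedding(VOCAB, D_MODEL)(inputs)                  # token embeddings
x = PositionalEmbedding(GPT_INPUT, D_MODEL)(x)

# One decoder stack: causal multi-head self-attention plus a feed-forward
# layer, each followed by a residual connection and layer normalization.
attn = layers.MultiHeadAttention(num_heads=HEADS, key_dim=D_MODEL // HEADS)(
    x, x, use_causal_mask=True)
x = layers.LayerNormalization()(x + attn)
ff = layers.Dense(D_MODEL, activation="relu")(x)
x = layers.LayerNormalization()(x + ff)

# Flatten the whole window and project to a single next-token distribution.
x = layers.Flatten()(x)
outputs = layers.Softmax()(layers.Dense(VOCAB)(x))

model = tf.keras.Model(inputs, outputs)
model.summary()
```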
## References
1. Vaswani, Ashish, et al. "Attention is all you need." Advances in neural information processing systems 30 (2017).
2. Radford, Alec, et al. "Language models are unsupervised multitask learners." OpenAI blog 1.8 (2019): 9.
3. Project Gutenberg. (n.d.). Retrieved 2023, from www.gutenberg.org.
4. Abadi, Martín, et al. "TensorFlow: Large-scale machine learning on heterogeneous systems." Software available from tensorflow.org (2015). URL https://www.tensorflow.org.