Update README.md
README.md CHANGED
@@ -71,6 +71,14 @@ A small 101M param (total) decoder model. This is the first version of the model
 - GQA (24 heads, 8 key-value), context length 1024
 - train-from-scratch
 
+
+## Features
+
+Some cool facts about this model:
+
+- this model was pretrained on **one GPU** for 5 compute-days. You can DIY pretrain too!
+- 0% of the training data (to our knowledge) comes from OpenAI synthetic generation
+
 ## Notes
 
 **This checkpoint** is the 'raw' pre-trained model and has not been tuned to a more specific task. **It should be fine-tuned** before use in most cases.
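For context, picking up a raw pre-trained checkpoint like this for fine-tuning typically looks like the sketch below. This assumes the model is published as a standard `transformers` causal-LM checkpoint; the repo id `your-org/your-101m-model` is a placeholder, not the actual model name:

```python
# Minimal sketch: load a raw pre-trained causal-LM checkpoint and run one
# forward pass with labels, as a starting point for fine-tuning.
# Assumption: the repo id below is a placeholder, since the diff does not
# name the published checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "your-org/your-101m-model"  # placeholder, not the real repo id

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# The README states a 1024-token context; keep fine-tuning inputs within it.
inputs = tokenizer(
    "Example fine-tuning text.",
    truncation=True,
    max_length=1024,
    return_tensors="pt",
)

# For causal-LM fine-tuning, the labels are the input ids themselves;
# the model shifts them internally when computing the loss.
outputs = model(**inputs, labels=inputs["input_ids"])
print(f"initial loss: {outputs.loss.item():.3f}")
```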