|
--- |
|
language: |
|
- en |
|
license: llama3 |
|
library_name: transformers |
|
tags: |
|
- Llama-3-6B |
|
- 6B |
|
--- |
|
|
|
# Model Summary |
|
<img src="images/llama-3-6B icon.jpeg" width="500" alt="Llama-3-6B"/> |
|
|
|
Introducing the world's first Llama-3 base model with 6B parameters. This is an untrained model created from Meta-Llama-3-8B using a technique called [downcycling](https://youtube.com/playlist?list=PLDn_JsyofyfTH5_5V1MNb8UYKxMl6IMNy&si=9hcOol4KHIgWThgt).
|
|
|
You can find the trained version of this model here:
|
https://huggingface.co/prince-canuma/Llama-3-6B-v0.1 |
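
For illustration, below is a minimal downcycling sketch in Python with `transformers`. It assumes downcycling here means initializing a shallower model by copying a contiguous block of decoder layers (plus embeddings, final norm, and LM head) from Meta-Llama-3-8B; the layer count of 24 and the output path are hypothetical, and the exact recipe used for this model may differ (see the video playlist linked below).

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

SOURCE_MODEL = "meta-llama/Meta-Llama-3-8B"
NUM_LAYERS = 24  # assumed target depth for ~6B parameters; the actual selection may differ

# Load the 8B source checkpoint.
source = AutoModelForCausalLM.from_pretrained(SOURCE_MODEL, torch_dtype=torch.bfloat16)

# Derive a shallower config from the source and build an untrained skeleton.
config = AutoConfig.from_pretrained(SOURCE_MODEL)
config.num_hidden_layers = NUM_LAYERS
target = AutoModelForCausalLM.from_config(config).to(torch.bfloat16)

# Copy embeddings, final norm, LM head, and the first NUM_LAYERS decoder layers.
target.model.embed_tokens.load_state_dict(source.model.embed_tokens.state_dict())
target.model.norm.load_state_dict(source.model.norm.state_dict())
target.lm_head.load_state_dict(source.lm_head.state_dict())
for i in range(NUM_LAYERS):
    target.model.layers[i].load_state_dict(source.model.layers[i].state_dict())

target.save_pretrained("Llama-3-6B-downcycled")  # hypothetical output path
```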
|
|
|
|
|
|
## Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
- **Developed by:** [Prince Canuma](https://huggingface.co/prince-canuma) |
|
- **Sponsored by:** General Catalyst (GC)
|
- **Model type:** Llama |
|
- **License:** [Llama-3](https://llama.meta.com/llama3/license) |
|
|
|
### Model Sources |
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
- **Repository:** https://github.com/Blaizzy/Coding-LLMs-from-scratch/tree/main/Llama-3 |
|
- **Video:** https://youtube.com/playlist?list=PLDn_JsyofyfTH5_5V1MNb8UYKxMl6IMNy&si=5Y4cm-6wrMOD1Abr |
|
|
|
|
|
## Citation |
|
|
|
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> |
|
|
|
### **BibTeX:** |
|
|
|
```bibtex |
|
@misc{prince2024downcycling, |
|
title={Efficient LLM Downcycling: Generating Diverse Model Sizes from Pretrained Giants}, |
|
author={Prince Canuma}, |
|
year={2024}, |
|
} |
|
``` |
|
|
|
# **Thank You!** |
|
|
|
I want to extend my heartfelt thanks to the community for its invaluable expertise and unwavering support.
|
|
|
Additionally, I would like to thank Viet from General Catalyst (GC) for providing me with the much-needed compute.
|
|
|
This is my most ambitious project yet, and it wouldn't have been possible without the incredible open-source ML community! |
|
|
|
Developers, I am eager to see and hear about the innovative fine-tunes and applications you create. |
|
|
|
Users, I am excited to learn about your experiences and use cases. |
|
|
|
Thank you for your interest and support! |
|
|
|
## References: |
|
|
|
```bibtex |
|
@misc{komatsuzaki2023sparse, |
|
title={Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints}, |
|
author={Aran Komatsuzaki and Joan Puigcerver and James Lee-Thorp and Carlos Riquelme Ruiz and Basil Mustafa and Joshua Ainslie and Yi Tay and Mostafa Dehghani and Neil Houlsby}, |
|
year={2023}, |
|
eprint={2212.05055}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.LG} |
|
} |
|
``` |
|
|
|
```bibtex |
|
@misc{sanyal2024pretraining, |
|
title={Pre-training Small Base LMs with Fewer Tokens}, |
|
author={Sunny Sanyal and Sujay Sanghavi and Alexandros G. Dimakis}, |
|
year={2024}, |
|
eprint={2404.08634}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL} |
|
} |
|
``` |