|
--- |
|
language: |
|
- en |
|
license: llama3 |
|
library_name: transformers |
|
tags: |
|
- Llama-3-6B |
|
- 6B |
|
--- |
|
|
|
# Model Summary |
|
<img src="images/llama-3-6B icon.jpeg" width="500" alt="Llama-3-6B"/> |
|
|
|
Introducing the world's first Llama-3 base model with 6B parameters. This is an untrained model created from Meta-Llama-3-8B using a technique called [downcycling](https://youtube.com/playlist?list=PLDn_JsyofyfTH5_5V1MNb8UYKxMl6IMNy&si=9hcOol4KHIgWThgt).
|
|
|
You can find the trained version of this model here:
|
https://huggingface.co/prince-canuma/Llama-3-6B-v0.1 |
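
For illustration, below is a minimal downcycling sketch in Python with `transformers`. It assumes downcycling here means initializing a shallower model by copying a contiguous block of decoder layers (plus embeddings, final norm, and LM head) from Meta-Llama-3-8B; the layer count of 24 and the output path are hypothetical, and the exact recipe used for this model may differ (see the video playlist linked below).

```python
import torch
from transformers import AutoConfig, AutoModelForCausalLM

SOURCE_MODEL = "meta-llama/Meta-Llama-3-8B"
NUM_LAYERS = 24  # assumed target depth for ~6B parameters; the actual selection may differ

# Load the 8B source checkpoint.
source = AutoModelForCausalLM.from_pretrained(SOURCE_MODEL, torch_dtype=torch.bfloat16)

# Derive a shallower config from the source and build an untrained skeleton.
config = AutoConfig.from_pretrained(SOURCE_MODEL)
config.num_hidden_layers = NUM_LAYERS
target = AutoModelForCausalLM.from_config(config).to(torch.bfloat16)

# Copy embeddings, final norm, LM head, and the first NUM_LAYERS decoder layers.
target.model.embed_tokens.load_state_dict(source.model.embed_tokens.state_dict())
target.model.norm.load_state_dict(source.model.norm.state_dict())
target.lm_head.load_state_dict(source.lm_head.state_dict())
for i in range(NUM_LAYERS):
    target.model.layers[i].load_state_dict(source.model.layers[i].state_dict())

target.save_pretrained("Llama-3-6B-downcycled")  # hypothetical output path
```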
|
|
|
|
|
|
## Model Description |
|
|
|
<!-- Provide a longer summary of what this model is. --> |
|
- **Developed by:** [Prince Canuma](https://huggingface.co/prince-canuma) |
|
- **Sponsored by:** General Catalyst (GC)
|
- **Model type:** Llama |
|
- **License:** [Llama-3](https://llama.meta.com/llama3/license) |
|
|
|
### Model Sources |
|
|
|
<!-- Provide the basic links for the model. --> |
|
|
|
- **Repository:** https://github.com/Blaizzy/Coding-LLMs-from-scratch/tree/main/Llama-3 |
|
- **Video:** https://youtube.com/playlist?list=PLDn_JsyofyfTH5_5V1MNb8UYKxMl6IMNy&si=5Y4cm-6wrMOD1Abr |
|
|
|
|
|
## Citation |
|
|
|
<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> |
|
|
|
### **BibTeX:** |
|
|
|
```bibtex |
|
@misc{prince2024downcycling, |
|
title={Efficient LLM Downcycling: Generating Diverse Model Sizes from Pretrained Giants}, |
|
author={Prince Canuma}, |
|
year={2024}, |
|
} |
|
``` |
|
|
|
# **Thank You!** |
|
|
|
I want to extend my heartfelt thanks to the community for its invaluable expertise and unwavering support.
|
|
|
Additionally, I would like to thank Viet from General Catalyst (GC) for providing me with the much-needed compute.
|
|
|
This is my most ambitious project yet, and it wouldn't have been possible without the incredible open-source ML community! |
|
|
|
Developers, I am eager to see and hear about the innovative fine-tunes and applications you create. |
|
|
|
Users, I am excited to learn about your experiences and use cases. |
|
|
|
Thank you for your interest and support! |
|
|
|
## References: |
|
|
|
```bibtex |
|
@misc{komatsuzaki2023sparse, |
|
title={Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints}, |
|
author={Aran Komatsuzaki and Joan Puigcerver and James Lee-Thorp and Carlos Riquelme Ruiz and Basil Mustafa and Joshua Ainslie and Yi Tay and Mostafa Dehghani and Neil Houlsby}, |
|
year={2023}, |
|
eprint={2212.05055}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.LG} |
|
} |
|
``` |
|
|
|
```bibtex |
|
@misc{sanyal2024pretraining, |
|
title={Pre-training Small Base LMs with Fewer Tokens}, |
|
author={Sunny Sanyal and Sujay Sanghavi and Alexandros G. Dimakis}, |
|
year={2024}, |
|
eprint={2404.08634}, |
|
archivePrefix={arXiv}, |
|
primaryClass={cs.CL} |
|
} |
|
``` |