prince-canuma committed on
Commit 7c2846e
1 Parent(s): 82d3500

Update README.md

Files changed (1)
  1. README.md +15 -14
README.md CHANGED
@@ -10,31 +10,23 @@ datasets:
  # Model Summary
  <img src="llama-3-6B icon.jpeg" width="500" alt="Llama-3-6B"/>

- This is world's first Llama-3 base model with 6B params, it is a pretrained version of [prince-canuma/Llama-3-6B-v0](https://huggingface.co/prince-canuma/Llama-3-6B-v0) which was, downcycled from Meta-Llama-3-8B.
- It was continually pretrained on 1B tokens of enlish only text from fineweb and achieves the following results on the evaluation set:
+ Introducing the world's first Llama-3 base model with 6B parameters. This model is a pretrained version of [prince-canuma/Llama-3-6B-v0](https://huggingface.co/prince-canuma/Llama-3-6B-v0), which was created from Meta-Llama-3-8B using a technique called [downcycling](https://youtube.com/playlist?list=PLDn_JsyofyfTH5_5V1MNb8UYKxMl6IMNy&si=9hcOol4KHIgWThgt).
+ The model was continually pretrained on 1 billion tokens of English-only text from FineWeb, achieving the following results on the evaluation set:
  - Loss: 2.4942

  <!-- Provide a longer summary of what this model is. -->

- This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.
-
- - **Developed by:** [More Information Needed]
- - **Funded by [optional]:** [More Information Needed]
- - **Shared by [optional]:** [More Information Needed]
- - **Model type:** [More Information Needed]
- - **Language(s) (NLP):** [More Information Needed]
- - **License:** [More Information Needed]
- - **Finetuned from model [optional]:** [More Information Needed]
-
  ## Model Description

  <!-- Provide a longer summary of what this model is. -->
  This is the model card of a 🤗 transformers model that has been pushed on the Hub. This model card has been automatically generated.

  - **Developed by:** [Prince Canuma](https://huggingface.co/prince-canuma)
- - **Model type:** Transformer
+ - **Sponsored by:** General
+ - **Model type:** Llama
+ - **Language(s) (NLP):** English
  - **License:** MIT
- - **Finetuned from model:** prince-canuma/Llama-3-6B-v0
+ - **Pretrained from model:** prince-canuma/Llama-3-6B-v0
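For readers who want to try the checkpoint described above, here is a minimal loading sketch with 🤗 transformers. The repo id `prince-canuma/Llama-3-6B` is an assumption based on this card's title, not a value taken from the commit.

```python
# Minimal usage sketch (not from the card): load the model and generate a completion.
# The repo id below is assumed from the card's title; adjust it to the actual checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "prince-canuma/Llama-3-6B"  # assumed repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# This is a base (non-instruct) model, so use plain text completion rather than chat templates.
inputs = tokenizer("Python 2 and Python 3 are", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```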

  ### Model Sources [optional]

@@ -89,6 +81,15 @@ Python 2 and Python 3 are two different versions of the Python language. Python

  ## Training Details

+ ### Downcycling
+
+ Downcycling is a technique that lets you create new LLMs of various sizes from checkpoints of large pretrained models.
+ You take a reference model (e.g., Llama-3-8B) and copy the weights of 24 of its 32 layers, along with the embedding and prediction heads. Then you initialize a smaller target model with 24 layers and load those pretrained weights into it.
+ This new model will most likely still produce legible output, but it needs continued pretraining to perform well.
+
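To make the downcycling recipe above concrete, here is a minimal sketch using the 🤗 transformers API. It assumes the first 24 of the 32 layers are the ones copied; the exact layer selection used for Llama-3-6B-v0 is not specified in this card.

```python
# Hedged downcycling sketch: build a 24-layer target model from Meta-Llama-3-8B weights.
# Keeping the *first* 24 layers is an assumption; the actual recipe may select layers differently.
import torch
from transformers import AutoConfig, AutoModelForCausalLM

ref_id = "meta-llama/Meta-Llama-3-8B"
reference = AutoModelForCausalLM.from_pretrained(ref_id, torch_dtype=torch.bfloat16)

# Same architecture as the reference, but with 24 hidden layers instead of 32.
config = AutoConfig.from_pretrained(ref_id)
config.num_hidden_layers = 24
target = AutoModelForCausalLM.from_config(config).to(torch.bfloat16)

# Copy the embedding and prediction (LM) head weights, plus the final norm.
target.model.embed_tokens.load_state_dict(reference.model.embed_tokens.state_dict())
target.model.norm.load_state_dict(reference.model.norm.state_dict())
target.lm_head.load_state_dict(reference.lm_head.state_dict())

# Copy 24 of the 32 transformer layers into the smaller model (here: the first 24).
for i in range(24):
    target.model.layers[i].load_state_dict(reference.model.layers[i].state_dict())

target.save_pretrained("llama-3-6b-downcycled")  # then continue pretraining
```

The downcycled checkpoint is then continually pretrained, here on the FineWeb tokens described under Training Data, to recover quality.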
  ### Training Data

  For continued pretraining, I extracted 1B tokens from the [Hugging Face FineWeb CC-Main-2024-10](https://huggingface.co/datasets/HuggingFaceFW/fineweb#breakdown-by-dumpcrawl) slice.
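As a rough illustration of how such a slice can be assembled, the sketch below streams the FineWeb CC-MAIN-2024-10 dump and stops after roughly 1B tokens. The config name, the choice of the Llama-3 tokenizer, and the JSONL output are assumptions, not the author's actual extraction script.

```python
# Hypothetical sketch: stream ~1B tokens of English text from FineWeb's CC-MAIN-2024-10 dump.
# The dump/config name and the Llama-3 tokenizer are assumptions, not the exact recipe used here.
import json
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B")
stream = load_dataset(
    "HuggingFaceFW/fineweb", name="CC-MAIN-2024-10", split="train", streaming=True
)

target_tokens = 1_000_000_000
collected = 0
with open("fineweb_1b_tokens.jsonl", "w", encoding="utf-8") as f:
    for example in stream:
        # Count tokens with the reference tokenizer and dump the raw text for later packing.
        ids = tokenizer(example["text"], add_special_tokens=False)["input_ids"]
        f.write(json.dumps({"text": example["text"]}) + "\n")
        collected += len(ids)
        if collected >= target_tokens:
            break

print(f"Collected ~{collected:,} tokens")
```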