parambharat committed on
Commit
f9cf95c
1 Parent(s): 79cf24f

chore: change heading level

Files changed (1)
  1. rag/rag.py +2 -2
rag/rag.py CHANGED
@@ -55,7 +55,7 @@ Here's an example of a question and an answer. You must use this as a template t
 What was the main mix of the training data ? How much data was used to train the model ?
 </question>
 
-### Answer
+## Answer
 The main mix of the training data for the Llama 3 405 billion parameter model is as follows:
 
 - **General knowledge**: 50%
@@ -65,7 +65,7 @@ The main mix of the training data for the Llama 3 405 billion parameter model is
 
 Regarding the amount of data used to train the model, the snippets do not provide a specific total volume of data in terms of tokens or bytes. However, they do mention that the model was pre-trained on a large dataset containing knowledge until the end of 2023[^2^]. Additionally, the training process involved pre-training on 2.87 trillion tokens before further adjustments[^3^].
 
-### Footnotes
+## Footnotes
 
 [^1^]: "Scaling Laws for Data Mix," page 6.
 [^2^]: "Pre-Training Data," page 4.