imone committed on
Commit
025fef8
1 Parent(s): a43f8a7

Update README.md

Files changed (1)
  1. README.md +10 -0
README.md CHANGED
@@ -3,3 +3,13 @@ license: other
  license_name: llama3
  license_link: LICENSE
  ---
+
+ The original Llama 3 8b (base) special token weights are zero, which might cause NaN gradients. This version re-initialized the weights of all the following special tokens to alleviate the problem.
+
+ ```
+ <|eot_id|>
+ <|start_header_id|>
+ <|end_header_id|>
+ ```
+
+ We set the weights of these tokens in `embed` and `lm_head` to be the mean of all other tokens.
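
The commit does not include the script used for the re-initialization, so the following is only a minimal sketch of the idea described in the README text: replace each special token's row with the mean of the remaining rows. It runs on a toy matrix of plain Python lists rather than on the real model. The function name `reinit_rows_to_mean`, the toy data, and the row indices are hypothetical, and reading "all other tokens" as "every row that is not one of the listed special tokens" is an assumption.

```python
def reinit_rows_to_mean(weights, special_ids):
    """Return a copy of `weights` in which every row listed in `special_ids`
    is replaced by the column-wise mean of all rows NOT in `special_ids`."""
    special = set(special_ids)
    others = [row for i, row in enumerate(weights) if i not in special]
    if not others:
        raise ValueError("need at least one non-special row to average")
    dim = len(weights[0])
    mean_row = [sum(row[j] for row in others) / len(others) for j in range(dim)]
    # Copy every row so the input matrix is left untouched.
    return [list(mean_row) if i in special else list(row)
            for i, row in enumerate(weights)]


# Toy 5-token "vocabulary" with 2-dimensional embeddings. Rows 3 and 4 stand
# in for untrained special tokens whose weights are all zero (hypothetical).
toy_embed = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
    [0.0, 0.0],
    [0.0, 0.0],
]
fixed_embed = reinit_rows_to_mean(toy_embed, special_ids=[3, 4])
print(fixed_embed[3])  # [3.0, 4.0]
```

On a real checkpoint the same operation would presumably be applied twice, once to the input embedding matrix and once to the `lm_head` matrix, using the tokenizer's ids for the three special tokens.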