update app.py
app.py
CHANGED
@@ -18,7 +18,7 @@ some cases we report the loss for specific tokens within the context.
 
 • C ≈ 6ND – an estimate of the total non-embedding training compute
 
-$$E=1.69, A=406.4, \\alpha=0.34, \\beta=0.28$$
+$$E=1.69, A=406.4, B=410.7, \\alpha=0.34, \\beta=0.28$$
 $$C\\approx6DN$$
 $$L(N,D)=E+\\frac{A}{N^\\alpha}+\\frac{B}{D^\\beta}$$
 $$N_{opt}(C),D_{opt}(C)={\\arg\\min}_{N,D\ s.t.\ FLOP/s(N,D)=C}\ L(N,D)$$
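
Note on the change: the added constant B is needed to evaluate the B/D^β term of L(N,D), which the previous line left undefined. As a minimal sketch (not taken from app.py), the snippet below evaluates the loss with the constants from the added line and solves the constrained minimisation in closed form, assuming the constraint is exactly C = 6ND. The function names `loss` and `compute_optimal` are hypothetical and chosen only for illustration.

```python
# Constants as they appear in the added line of the diff.
E, A, B, ALPHA, BETA = 1.69, 406.4, 410.7, 0.34, 0.28


def loss(n_params: float, n_tokens: float) -> float:
    """Parametric loss L(N, D) = E + A / N^alpha + B / D^beta."""
    return E + A / n_params**ALPHA + B / n_tokens**BETA


def compute_optimal(compute_flops: float) -> tuple[float, float]:
    """Minimise L(N, D) subject to 6 * N * D = compute_flops (assumed constraint).

    Substituting D = K / N with K = C / 6 and setting dL/dN = 0 gives
    N^(alpha + beta) = (alpha * A) / (beta * B) * K^beta.
    """
    k = compute_flops / 6.0
    g = (ALPHA * A / (BETA * B)) ** (1.0 / (ALPHA + BETA))
    n_opt = g * k ** (BETA / (ALPHA + BETA))
    d_opt = k / n_opt  # enforce the compute constraint exactly
    return n_opt, d_opt


if __name__ == "__main__":
    n, d = compute_optimal(1e21)
    print(f"N_opt={n:.3e}, D_opt={d:.3e}, L={loss(n, d):.4f}")
```

Under these assumptions, N_opt scales as C^(β/(α+β)) and D_opt as C^(α/(α+β)), so with α=0.34 and β=0.28 the token count grows slightly faster than the parameter count as compute increases.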