gradientai/Llama-3-70B-Instruct-Gradient-262k

forrest-gradient committed on May 3

Commit e77cde0 • 1 Parent(s): 2f636d9

Update README.md

Files changed (1)

README.md +1 -2

@@ -42,13 +42,12 @@ For training data, we generate long contexts by augmenting [SlimPajama](https://
 | Initialize From          | Llama-3-70B-Instruct             | 65K            |
 | Sequence Length 2^N      | 16              | 18              |
 | RoPE theta               | 15,296,098      | 207,112,184     |
-| Batch Size               | 1               | 1               |
 | Gradient Accumulation Steps | 1           | 1               |
 | Steps                    | 20              | 25              |
 | Total Tokens             | 83,886,080      | 104,857,600     |
 | Learning rate            | 0.00002         | 0.00002         |
 | # GPUs                   | 512             | 512             |
-| Ring parallelism         | 64              | 16              |
 | GPU Type                 | NVIDIA L40S     | NVIDIA L40S     |
 | Minutes to Train (Wall)  | 100             | 170             |

 | Initialize From          | Llama-3-70B-Instruct             | 65K            |
 | Sequence Length 2^N      | 16              | 18              |
 | RoPE theta               | 15,296,098      | 207,112,184     |
+| Batch Size               | 64               | 16               |
 | Gradient Accumulation Steps | 1           | 1               |
 | Steps                    | 20              | 25              |
 | Total Tokens             | 83,886,080      | 104,857,600     |
 | Learning rate            | 0.00002         | 0.00002         |
 | # GPUs                   | 512             | 512             |
 | GPU Type                 | NVIDIA L40S     | NVIDIA L40S     |
 | Minutes to Train (Wall)  | 100             | 170             |
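
A quick sanity check on the new `Batch Size` values: the README does not state how `Total Tokens` is derived, but assuming it is sequence length × batch size × gradient accumulation steps × steps, the updated rows (64 and 16) reproduce the reported totals exactly, while the old value of 1 does not. The sketch below is only that arithmetic; the function name and the formula itself are assumptions, not something taken from the training code.

```python
# Assumed relation (not stated in the README):
#   total_tokens = 2^N (sequence length) * batch_size * grad_accum_steps * steps

def total_tokens(seq_len_log2: int, batch_size: int, grad_accum_steps: int, steps: int) -> int:
    """Tokens processed over a training stage under the assumed relation."""
    return (2 ** seq_len_log2) * batch_size * grad_accum_steps * steps


# Values copied from the two table columns shown in the diff.
columns = [
    {"seq_len_log2": 16, "old_batch": 1, "new_batch": 64, "steps": 20, "reported": 83_886_080},
    {"seq_len_log2": 18, "old_batch": 1, "new_batch": 16, "steps": 25, "reported": 104_857_600},
]

for col in columns:
    old = total_tokens(col["seq_len_log2"], col["old_batch"], 1, col["steps"])
    new = total_tokens(col["seq_len_log2"], col["new_batch"], 1, col["steps"])
    print(
        f"2^{col['seq_len_log2']}: old batch size -> {old:,} tokens, "
        f"new batch size -> {new:,} tokens, "
        f"matches reported total: {new == col['reported']}"
    )
```

Under this assumption, the per-column batch sizes equal the values previously listed under `Ring parallelism`, which is consistent with the commit folding that row into `Batch Size`.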