nicholasKluge committed
Commit: ab03a05
Parent(s): 0794ac2

Update README.md
README.md CHANGED

@@ -92,7 +92,7 @@ These are the main arguments used in the training of this model:
 | adam epsilon            | 0.00000001 |
 | weight decay            | 0.01       |
 | scheduler type          | "cosine"   |
-| warmup
+| warmup steps            | 50000      |
 | gradient checkpointing  | false      |
 | seed                    | 42         |
 | mixed precision         | 'no'       |

@@ -101,7 +101,7 @@ These are the main arguments used in the training of this model:
 
 ## Intended Uses
 
-The primary intended use of TeenyTinyLlama is research
+The primary intended use of TeenyTinyLlama is to research the behavior, functionality, and limitations of large language models. Checkpoints saved during training are intended to provide a controlled setting for performing scientific experiments. You may also further fine-tune and adapt TeenyTinyLlama-162m for deployment, as long as your use is in accordance with the Apache 2.0 license. If you decide to use pre-trained TeenyTinyLlama-162m as a basis for your fine-tuned model, please conduct your own risk and bias assessment.
 
 ## Basic usage
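The hyperparameters in the table above, including the warmup-steps value this commit adds, can be sketched in code. This is a minimal illustration, not the model's actual training script: the dictionary keys follow common `transformers` `TrainingArguments` naming conventions, and `total_steps` in the schedule function is an assumed placeholder (the README excerpt does not state the total number of training steps).

```python
import math

# Training hyperparameters from the table above (keys follow common
# transformers TrainingArguments naming; the values mirror the README).
training_config = {
    "adam_epsilon": 1e-8,            # adam epsilon: 0.00000001
    "weight_decay": 0.01,
    "lr_scheduler_type": "cosine",
    "warmup_steps": 50_000,          # added in this commit
    "gradient_checkpointing": False,
    "seed": 42,
    "mixed_precision": "no",
}

def lr_scale(step, warmup_steps=50_000, total_steps=1_000_000):
    """Learning-rate multiplier: linear warmup followed by cosine decay.

    A sketch of the "cosine" scheduler with 50,000 warmup steps;
    total_steps is an assumption, not stated in the README excerpt.
    """
    if step < warmup_steps:
        # Linear ramp from 0 to 1 over the warmup phase.
        return step / warmup_steps
    # Cosine decay from 1 down to 0 over the remaining steps.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))
```

With these defaults, the multiplier rises linearly to 1.0 at step 50,000 and then decays along a half-cosine toward 0 at the final step.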