Update README.md
README.md
CHANGED
@@ -23,7 +23,7 @@ This agent facilitates the generation of high-quality, cohesive, and captivating
 - **Context window length:** 4096 tokens
 
 ### Training details
-Training was performed on a GPU cluster of 64xH100s
+Training was performed on a GPU cluster of 64xH100s. FSDP ZeRO-3 sharding is employed for efficient training. We instruction finetune on a dataset of 18K examples for one epoch with batch size of 336, AdamW optimizer with learning rate 1e-5.
 
 ### Learn more
 - **Blogpost:** [GOAT-Storytelling: Arbitrarily Long Story Writing Agent](https://www.blog.goat.ai/goat-st/)
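
The hyperparameters added in this change imply a fairly short run. As a rough sanity check, the sketch below derives the number of optimizer steps from the stated figures. The `FinetuneConfig` class and its field names are hypothetical and not part of the repository. It assumes "18K" means exactly 18,000 examples and that 336 is the global batch size per optimizer step; the README confirms neither.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FinetuneConfig:
    """Hypothetical container for the values quoted in the README diff."""

    num_examples: int = 18_000      # "18K examples"; exact count is an assumption
    global_batch_size: int = 336    # assumed to be the batch per optimizer step
    num_epochs: int = 1
    learning_rate: float = 1e-5     # AdamW learning rate from the README
    num_gpus: int = 64              # 64xH100 cluster from the README

    def steps_per_epoch(self, drop_last: bool = False) -> int:
        # With drop_last the final partial batch is discarded, otherwise it is kept.
        if drop_last:
            return self.num_examples // self.global_batch_size
        return math.ceil(self.num_examples / self.global_batch_size)

    def total_steps(self, drop_last: bool = False) -> int:
        return self.num_epochs * self.steps_per_epoch(drop_last)

    def per_gpu_batch(self) -> float:
        # Average number of examples per GPU per step if the batch were split evenly.
        return self.global_batch_size / self.num_gpus


cfg = FinetuneConfig()
print("optimizer steps (keep last batch):", cfg.total_steps())
print("optimizer steps (drop last batch):", cfg.total_steps(drop_last=True))
print("examples per GPU per step:", cfg.per_gpu_batch())
```

Under these assumptions the run is about 54 optimizer steps. The global batch of 336 does not divide evenly across 64 GPUs, so the README leaves open how the batch was distributed over devices.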