---
license: apache-2.0
---
# LLM360 Research Suite: K2 Loss Spike 1
We encountered two major loss spikes while [training K2](https://huggingface.co/LLM360/K2).
* The first loss spike occurred after X checkpoints and lasted for ~34 checkpoints. We restarted training at checkpoint X, and training returned to normal.
* The [second loss spike](https://huggingface.co/LLM360/K2-Spike-2/) occurred after we restarted training at checkpoint X to fix the first loss spike, and lasted for ~8 checkpoints.
We are releasing these checkpoints so others can study this interesting phenomenon in large model training.
<img src="loss_spike.png" alt="k2 loss spikes"/>
# Purpose
Loss spikes are still a poorly understood phenomenon. By making these spikes and the associated training details available, we hope others will use these artifacts to further the world's knowledge on this topic.
## First 10 Checkpoints
| Checkpoints | |
| ----------- | ----------- |
| [Checkpoint 160](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_160) | [Checkpoint 170](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_170) |
| [Checkpoint 162](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_162) | [Checkpoint 172](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_172) |
| [Checkpoint 164](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_164) | [Checkpoint 174](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_174) |
| [Checkpoint 166](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_166) | [Checkpoint 176](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_176) |
| [Checkpoint 168](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_168) | [Checkpoint 178](https://huggingface.co/LLM360/K2-Spike-1/tree/spike_ckpt_178) |
Each checkpoint is stored on its own branch. To list all branches, clone the repository and run `git branch -a`.
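As a minimal sketch of working with these branches from Python: the table links follow a `spike_ckpt_<N>` branch naming pattern, so a checkpoint can be selected by passing that name as the `revision`. The helper names below (`revision_for`, `list_spike_branches`, `load_checkpoint`) are hypothetical and not part of any LLM360 tooling. The sketch also assumes that `huggingface_hub` and `transformers` are installed and that the weights load through `AutoModelForCausalLM`, which this README does not confirm.

```python
REPO_ID = "LLM360/K2-Spike-1"

# Checkpoint numbers listed in the table above (160, 162, ..., 178).
LISTED_CHECKPOINTS = list(range(160, 180, 2))


def revision_for(checkpoint):
    """Return the branch name for a checkpoint, following the
    spike_ckpt_<N> pattern used in the table links."""
    return f"spike_ckpt_{checkpoint}"


def list_spike_branches(repo_id=REPO_ID):
    """List every spike_ckpt_* branch in the repo, sorted lexicographically.

    Requires network access and `pip install huggingface_hub`.
    """
    from huggingface_hub import list_repo_refs  # lazy import

    refs = list_repo_refs(repo_id)
    return sorted(
        branch.name
        for branch in refs.branches
        if branch.name.startswith("spike_ckpt_")
    )


def load_checkpoint(checkpoint, repo_id=REPO_ID):
    """Load one checkpoint by branch name.

    Assumption: the weights are loadable via AutoModelForCausalLM.
    K2 is a 65B-parameter model, so this is a very large download.
    """
    from transformers import AutoModelForCausalLM  # lazy import

    return AutoModelForCausalLM.from_pretrained(
        repo_id, revision=revision_for(checkpoint)
    )
```

For example, `revision_for(160)` gives `"spike_ckpt_160"`, and `list_spike_branches()` is the programmatic counterpart of `git branch -a` for checkpoints beyond the ten listed in the table.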
## Loss Spikes on the LLM360 Evaluation Suite
View all evaluations in our [Weights & Biases project](https://wandb.ai/llm360/K2?nw=7bxe4sz0vv).
## About the LLM360 Research Suite
The LLM360 Research Suite is a comprehensive set of large language model (LLM) artifacts from Amber, CrystalCoder, and K2 for academic and industry researchers to explore LLM training dynamics. Additional resources can be found at llm360.ai.
## Citation
**BibTeX:**
```bibtex
@misc{llm360k2,
title={LLM360-K2-65B: Scaling Up Open and Transparent Language Models},
author={The LLM360 Team},
year={2024},
}
```