victormiller committed
Commit
0ae9d99
1 Parent(s): 407f850

Update README.md

Files changed (1): README.md (+5 -5)
README.md CHANGED
@@ -26,15 +26,15 @@ The following data mix was used to train K2 and achieve results in line with Lla
  | uspto | 4.77B | 3x | 14.3B | 1.1% |
  | pubmed-central | 26B | 1x | 26B | 2% |
  | [redpajama.arxiv](https://huggingface.co/datasets/cerebras/SlimPajama-627B) | 27.3B | 1x | 27.3B | 2.1% |
- | starcoder.spm | 67.6B | 0.5x | 33.8B | 2.6% |
- | starcoder.fim | 67.6B | 0.5x | 33.8B | 2.6% |
+ | [starcoder.spm](https://huggingface.co/datasets/bigcode/starcoderdata) | 67.6B | 0.5x | 33.8B | 2.6% |
+ | [starcoder.fim](https://huggingface.co/datasets/bigcode/starcoderdata) | 67.6B | 0.5x | 33.8B | 2.6% |
  | [redpajama.stackexchange](https://huggingface.co/datasets/cerebras/SlimPajama-627B) | 61.1B | 1x | 61.1B | 4.7% |
- | starcoder | 132.6B | 0.5x | 66.3B | 5.1% |
- | pile-of-law | 76.7B | 1x | 76.7B | 5.9% |
+ | [starcoder](https://huggingface.co/datasets/bigcode/starcoderdata) | 132.6B | 0.5x | 66.3B | 5.1% |
+ | [pile-of-law](https://huggingface.co/datasets/pile-of-law/pile-of-law) | 76.7B | 1x | 76.7B | 5.9% |
  | [redpajama.book](https://huggingface.co/datasets/cerebras/SlimPajama-627B) | 80.6B | 1x | 80.6B | 6.2% |
  | s2orc | 107.9B | 1x | 107.9B | 8.3% |
  | [redpajama.wikipedia](https://huggingface.co/datasets/cerebras/SlimPajama-627B) | 22.1B | 6x | 132.6B | 10.2% |
- | refinedweb | 612.3B | 1x | 612.3B | 47.1% |
+ | [refinedweb](https://huggingface.co/datasets/tiiuae/falcon-refinedweb) | 612.3B | 1x | 612.3B | 47.1% |
  | Totals | - | - | 1.3T | 100% |
 
  ## First 10 Checkpoints
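
For reference, the two derived columns in the table are just raw tokens × repetition multiplier, with the percentage taken against the 1.3T-token total in the `Totals` row. Below is a minimal sketch reproducing them; it is not part of the README, and since this hunk starts at line 26 of the table, the rows above it are omitted here and the visible rows sum to slightly under 100%.

```python
# Minimal sketch: reproduce the "effective tokens" and mix-percentage columns
# from the raw token counts and repetition multipliers in the diff above.
# Only the rows visible in this hunk are included (the README table starts
# earlier), so percentages are computed against the stated 1.3T total.

data_mix = {
    # dataset: (raw tokens, in billions; repetition multiplier)
    "uspto": (4.77, 3.0),
    "pubmed-central": (26.0, 1.0),
    "redpajama.arxiv": (27.3, 1.0),
    "starcoder.spm": (67.6, 0.5),
    "starcoder.fim": (67.6, 0.5),
    "redpajama.stackexchange": (61.1, 1.0),
    "starcoder": (132.6, 0.5),
    "pile-of-law": (76.7, 1.0),
    "redpajama.book": (80.6, 1.0),
    "s2orc": (107.9, 1.0),
    "redpajama.wikipedia": (22.1, 6.0),
    "refinedweb": (612.3, 1.0),
}

TOTAL_B = 1300.0  # 1.3T tokens, from the Totals row

for name, (raw_b, mult) in data_mix.items():
    effective_b = raw_b * mult             # e.g. starcoder: 132.6 * 0.5 = 66.3
    share = 100.0 * effective_b / TOTAL_B  # e.g. 66.3 / 1300 ≈ 5.1%
    print(f"{name:26s} {effective_b:6.1f}B  {share:4.1f}%")
```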