Update README.md
README.md
CHANGED
@@ -68,7 +68,7 @@ Around 100B tokens from a mixture of the following corpora were used for the con
 - [Japanese mc4](https://huggingface.co/datasets/mc4)
 - [Japanese CC-100](http://data.statmt.org/cc-100/ja.txt.xz)
 - [Japanese OSCAR](https://oscar-project.github.io/documentation/)
-- [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B)
+- [SlimPajama](https://huggingface.co/datasets/cerebras/SlimPajama-627B) without the Books3 subset
 
 
 ## Use and Limitations