Text Generation
Transformers
Safetensors
English
Chinese
llama
conversational
text-generation-inference
Simingh committed on
Commit cc9efae · verified · 1 Parent(s): 877e1a6

Update README.md

Files changed (1)
  1. README.md +14 -3
README.md CHANGED
@@ -46,13 +46,24 @@ datasets:
 | OpenCoder-1.5B-Instruct | 4K | 🤗 [HuggingFace](https://huggingface.co/infly/OpenCoder-1.5B-Instruct) |
 | OpenCoder-8B-Instruct | 8K | 🤗 [HuggingFace](https://huggingface.co/infly/OpenCoder-8B-Instruct) |
 
-
 ## 3. Datasets
 
+### Pre-training
+
+| Dataset | Size | Download |
+|:---------------------:|:---------------:|:-----------------------------------------------------------------------:|
+| fineweb-code-corpus | 148 GB | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/fineweb-code-corpus) |
+| fineweb-math-corpus | 10 GB | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/fineweb-math-corpus) |
+
+
+### Post-training
+
 | Dataset | Num | Download |
 |:---------------------:|:---------------:|:-----------------------------------------------------------------------:|
-| OpenCoder-SFT-Stage1 | 4.21 M | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/opencoder-sft-stage1) |
-| OpenCoder-SFT-Stage2 | 375 K | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/opencoder-sft-stage2) |
+| opencoder-sft-stage1 | 4.21 M | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/opencoder-sft-stage1) |
+| opencoder-sft-stage2 | 375 K | 🤗 [HuggingFace](https://huggingface.co/datasets/OpenCoder-LLM/opencoder-sft-stage2) |
+
+**This is not the end; we are organizing the remaining data and uploading it progressively.**
 
 
 ## 4. Benchmarks