JosephusCheung commited on
Commit
5275e6e
1 Parent(s): 7fbd7b5

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +19 -0
README.md CHANGED
@@ -5,6 +5,25 @@ language:
5
  - zh
6
  - ja
7
  - de
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
8
  ---
9
  Tokenizer is different from cohere - and chat template is ChatML - fully fine-tuned at 128K+ ~ 30M entries long, web crawl input, GPT-4-32k/3.5-16k output, synthetic dataset - 1 epoch
10
 
 
5
  - zh
6
  - ja
7
  - de
8
+ datasets:
9
+ - JosephusCheung/GuanacoDataset
10
+ - meta-math/MetaMathQA
11
+ - jondurbin/airoboros-3.1
12
+ - WizardLM/WizardLM_evol_instruct_V2_196k
13
+ - RyokoAI/ShareGPT52K
14
+ - RyokoAI/Fandom23K
15
+ - milashkaarshif/MoeGirlPedia_wikitext_raw_archive
16
+ - wikipedia
17
+ - wiki_lingua
18
+ - garage-bAInd/Open-Platypus
19
+ - LDJnr/Puffin
20
+ - BAAI/COIG
21
+ - TigerResearch/tigerbot-zhihu-zh-10k
22
+ - liwu/MNBVC
23
+ - teknium/openhermes
24
+ - CausalLM/Refined-Anime-Text
25
+ - microsoft/orca-math-word-problems-200k
26
+ - m-a-p/CodeFeedback-Filtered-Instruction
27
  ---
28
  Tokenizer is different from cohere - and chat template is ChatML - fully fine-tuned at 128K+ ~ 30M entries long, web crawl input, GPT-4-32k/3.5-16k output, synthetic dataset - 1 epoch
29