Tianhua committed on
Commit
f692d0d
1 Parent(s): 36ae419

Update README.md

  1. README.md +2 -3
README.md CHANGED
@@ -99,7 +99,7 @@ gen_tokens = model.generate(input_ids, do_sample=True, max_length=400)
  print("-"*20 + "Output for model" + 20 * '-')
  print(tokenizer.batch_decode(gen_tokens)[0])
  ```
- ## CrystalChat DataMix
+ <!-- ## CrystalChat DataMix
  | Subset | Tokens (Billion) |
  | ----------- | ----------- |
  | OASST1-guanaco | 4.46 |
@@ -114,13 +114,12 @@ print(tokenizer.batch_decode(gen_tokens)[0])
  | HTML Instruction | 43.67 |
  | General Textbooks | 85.59 |
  | Programming Books | 395.63 |
- | Total | 1102.52 |
+ | Total | 1102.52 | -->
 
  # Evaluation
 
  Coming Soon!
 
-
  # Bias, Risks, and Limitations
  CrystalChat has not been aligned to human preferences for safety within the RLHF phase or deployed with in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). The training data is known and made available [here](https://huggingface.co/datasets/LLM360/CrystalCoderDatasets). It primarily consists of SlimPajama, StarCoder, and WebCrawl dataset.