myeongho-jeong committed on
Commit
7c69004
1 Parent(s): f04eba6

Update README.md

Files changed (1)
  1. README.md +0 -5
README.md CHANGED
@@ -63,11 +63,6 @@ Keep in mind that this model hasn't been fine-tuned with instruction-based train
  Our model’s training was comprehensive and diverse:

- - **Data Sources:**
- - English to Korean paragraph pairs: 5.86%
- - Multi-lingual corpus (primarily English): 10.69%
- - Korean web content: 83.46%
-
  - **Vocabulary Expansion:**
  We meticulously selected 8,960 Korean tokens based on their frequency in our Korean web corpus. This process involved multiple rounds of tokenizer training, manual curation, and token frequency analysis, ensuring a rich and relevant vocabulary for our model.
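
The frequency-based selection mentioned under Vocabulary Expansion can be sketched roughly as follows. This is only a minimal illustration, not the authors' pipeline: the function name `select_new_tokens`, the toy token list, and the base vocabulary are all hypothetical, and the tokenizer training and manual curation rounds described in the README are not modeled here.

```python
from collections import Counter


def select_new_tokens(corpus_tokens, base_vocab, k):
    """Return the k most frequent corpus tokens that are not already in base_vocab.

    Hypothetical helper for illustration; the README does not describe
    the actual selection code.
    """
    # Count only tokens that the existing vocabulary does not cover.
    counts = Counter(tok for tok in corpus_tokens if tok not in base_vocab)
    # most_common(k) returns (token, count) pairs sorted by descending count.
    return [tok for tok, _ in counts.most_common(k)]


# Toy, made-up data standing in for a tokenized Korean web corpus.
corpus_tokens = ["한국", "모델", "한국", "the", "언어", "모델", "한국"]
base_vocab = {"the"}  # assumed pre-existing vocabulary

new_tokens = select_new_tokens(corpus_tokens, base_vocab, k=2)
print(new_tokens)
```

In a real setting `k` would correspond to the number of tokens to add (8,960 in the README), and the candidate list would presumably be reviewed by hand before being added to the tokenizer.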