yanolja/EEVE-Korean-10.8B-v1.0

myeongho-jeong committed on Feb 23

Commit 7c69004 (1 parent: f04eba6)

Update README.md

Files changed (1)

README.md +0 -5


@@ -63,11 +63,6 @@ Keep in mind that this model hasn't been fine-tuned with instruction-based train
 Our model’s training was comprehensive and diverse:
-- **Data Sources:**
-  - English to Korean paragraph pairs: 5.86%
-  - Multi-lingual corpus (primarily English): 10.69%
-  - Korean web content: 83.46%
 - **Vocabulary Expansion:**
   We meticulously selected 8,960 Korean tokens based on their frequency in our Korean web corpus. This process involved multiple rounds of tokenizer training, manual curation, and token frequency analysis, ensuring a rich and relevant vocabulary for our model.

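
The frequency-based selection described under **Vocabulary Expansion** can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `select_new_tokens`, the toy corpus, and the use of whitespace splitting in place of a trained tokenizer are all assumptions made for illustration, and the manual curation step mentioned in the README is not modeled.

```python
from collections import Counter


def select_new_tokens(corpus, base_vocab, k):
    """Return the k most frequent candidate tokens not already in base_vocab.

    Hypothetical helper: whitespace splitting stands in for the
    tokenizer-training rounds described in the README.
    """
    counts = Counter()
    for line in corpus:
        # Count only candidates that the base vocabulary does not cover.
        counts.update(tok for tok in line.split() if tok not in base_vocab)
    # Rank by descending frequency; break ties alphabetically so the
    # result is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [tok for tok, _ in ranked[:k]]


# Toy data, invented for this example.
corpus = ["안녕하세요 여행 예약", "여행 숙소 예약", "여행 hello"]
base_vocab = {"hello"}

selected = select_new_tokens(corpus, base_vocab, k=2)
print(selected)  # ['여행', '예약']
```

In the setting the README describes, `k` would correspond to the 8,960 tokens that were kept, and the ranked list would be reviewed by hand before being added to the tokenizer.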