Commit f04eba6 (committed by myeongho-jeong)
1 Parent(s): 05009de
Update README.md
README.md CHANGED
@@ -35,7 +35,8 @@ To adapt foundational models from English to Korean, we use subword-based embedd
 This approach progressively trains from input embeddings to full parameters, efficiently extending the model's vocabulary to include Korean.
 Our method enhances the model's cross-linguistic applicability by carefully integrating new linguistic tokens, focusing on causal language modeling pre-training.
 We leverage the inherent capabilities of foundational models trained on English to efficiently transfer knowledge and reasoning to Korean, optimizing the adaptation process.
-
+
+For detail, please refer our technical report(TBU) - [Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models](https://arxiv.org).
 
 Here’s an simplified code for our key approach:
 
@@ -91,8 +92,8 @@ This rigorous approach ensured a comprehensive and contextually rich Korean voca
 ## Citation
 
 ```
-@misc{
-title={Efficient and Effective Vocabulary Expansion
+@misc{kim2024efficient,
+title={Efficient and Effective Vocabulary Expansion Towards Multilingual Large Language Models},
 author={Seungduk Kim, Seungtaek Choi, Myeongho Jeong},
 year={2024},
 eprint={2402.XXXXX},