The license is cc-by-nc-sa-4.0.
- Commercial use is not allowed.
This model is not based on the Synatra model; we pre-trained and fully fine-tuned Mixtralx2 to enhance its Korean abilities.
Developer
Seungyoo Lee (DopeorNope), Kyujin Han (kyujinpy)
Dataset
Continuous pre-training was performed using the AI Hub corpus, and we applied instruction tuning using AI Hub datasets.
Using a self-supervised approach, we converted the raw corpus into instruction-tuning data.
We used text-mining techniques to create the training data.
Here are some examples:
- Mask prediction Task
#Mask prediction
text='지능(智能) 또는 인텔리전스(intelligence)는 인간의 <MASK> 능력을 말한다.'
response='지적'
complete_text='지능(智能) 또는 인텔리전스(intelligence)는 인간의 지적 능력을 말한다.'
#English gloss: "Intelligence refers to the <MASK> ability of humans."; the response is "intellectual".
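As an illustration of how a mask-prediction sample with these three fields could be derived from a raw sentence, here is a minimal sketch. It assumes word-level masking on whitespace and a random choice of position; the helper name `make_mask_example` and the English sample sentence are hypothetical, and the procedure the authors actually used is not described in this card.

```python
import random

MASK_TOKEN = "<MASK>"  # same placeholder string as in the example above


def make_mask_example(sentence: str, rng: random.Random) -> dict:
    """Hide one whitespace-delimited word and keep it as the target.

    Hypothetical helper: word-level masking is an assumption, not the
    documented method.
    """
    words = sentence.split()
    if len(words) < 2:
        raise ValueError("need at least two words to build a mask example")
    idx = rng.randrange(len(words))  # position of the word to hide
    masked = words[:idx] + [MASK_TOKEN] + words[idx + 1:]
    return {
        "text": " ".join(masked),
        "response": words[idx],
        "complete_text": " ".join(words),
    }


example = make_mask_example(
    "Intelligence refers to the intellectual ability of humans.",
    random.Random(0),
)
print(example)
```

Substituting `response` back into `text` reproduces `complete_text`, which gives a cheap consistency check for generated samples.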
- Text align Task
#Text-allign Task
text_list=['๋ณต์๋ช
๋ น-๋ณต์์๋ฃ(MIMD,Multiple Instruction, Multiple Data)์ ์ ์ฐ์์ ๋ณ๋ ฌํ์ ํ ๊ธฐ๋ฒ์ด๋ค.',
'๋ถ์ฐ ๋ฉ๋ชจ๋ฆฌ์ ์๋ MPP(massively parallel processors)์ COW (Clusters of Workstations)์ด๋ค.',
'MIMD๊ธฐ๊ณ๋ ๊ณต์ ๋ฉ๋ชจ๋ฆฌ์ด๊ฑฐ๋ ๋ถ์ฐ ๋ฉ๋ชจ๋ฆฌ์ด๋ฉฐ ์ด๋ฌํ ๋ถ๋ฅ๋ MIMD๊ฐ ์ด๋ป๊ฒ ๋ฉ๋ชจ๋ฆฌ๋ฅผ ์ด์ฉํ๋๋์ ๋ฐ๋ผ ๋๋๋ค.']
response='๋ณต์๋ช
๋ น-๋ณต์์๋ฃ(MIMD,Multiple Instruction, Multiple Data)์ ์ ์ฐ์์ ๋ณ๋ ฌํ์ ํ ๊ธฐ๋ฒ์ด๋ค. \
MIMD๊ธฐ๊ณ๋ ๊ณต์ ๋ฉ๋ชจ๋ฆฌ์ด๊ฑฐ๋ ๋ถ์ฐ ๋ฉ๋ชจ๋ฆฌ์ด๋ฉฐ ์ด๋ฌํ ๋ถ๋ฅ๋ MIMD๊ฐ ์ด๋ป๊ฒ ๋ฉ๋ชจ๋ฆฌ๋ฅผ ์ด์ฉํ๋๋์ ๋ฐ๋ผ ๋๋๋ค. \
๋ถ์ฐ ๋ฉ๋ชจ๋ฆฌ์ ์๋ MPP(massively parallel processors)์ COW (Clusters of Workstations)์ด๋ค.'
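A text-align sample of this shape (shuffled sentences in, correctly ordered passage out) could be built roughly as follows. This is a sketch under assumptions: the input is already split into sentences, the helper name `make_align_example` is hypothetical, and the card does not say how the authors shuffled or joined sentences.

```python
import random


def make_align_example(sentences: list[str], rng: random.Random) -> dict:
    """Shuffle sentences for the prompt; the target is the original order.

    Hypothetical helper illustrating the task format only.
    """
    if len(sentences) < 2:
        raise ValueError("need at least two sentences to reorder")
    shuffled = list(sentences)  # copy so the caller's list is untouched
    for _ in range(10):  # retry a few times to avoid an unchanged order
        rng.shuffle(shuffled)
        if shuffled != sentences:
            break
    return {"text_list": shuffled, "response": " ".join(sentences)}


source = [
    "MIMD is a parallelization technique in computing.",
    "MIMD machines use shared or distributed memory.",
    "MPP and COW are examples of distributed memory.",
]
print(make_align_example(source, random.Random(0)))
```

Because the target is just the original passage, no labeling is needed, which is what makes the task self-supervised.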
- Text completion Task
#Text Completion
text= '๊ทธ๋ฆฐ๋ธ๋ผ์ฐ์ (GreenBrowser)๋ ์ธํฐ๋ท ์ต์คํ๋ก๋ฌ์์ ์ฌ์ฉํ๋ ํธ๋ผ์ด๋ํธ ๋ ์ด์์ ์์ง์ ๋ฐํ์ผ๋ก ํ๋ฉฐ ์ค๊ตญ์ ๊ธฐ๋ฐ์ ๋ ์ํํธ์จ์ด ํ์ฌ์ธ ๋ชจ์ดํต(morequick)์์ ๋ง๋ ๋ฌด๋ฃ ์น ๋ธ๋ผ์ฐ์ ๋ค. ๊ฐ์ฒด์ ์ค๊ตญ์ด๊ฐ ์น ๋ธ๋ผ์ฐ์ ์ ๋ด์ฅ๋์ด ์๋ค.
๋งฅ์คํค ์น ๋ธ๋ผ์ฐ์ ์ ๋น์ทํ์ฌ MyIE์ ๋ฐ์ ํ๊ฒ ๊ด๋ จ๋์ด ์๋ค. ๋งฅ์คํค์ฉ์ ์ผ๋ถ ํ๋ฌ๊ทธ์ธ์ด ๊ทธ๋ฆฐ๋ธ๋ผ์ฐ์ ์์๋ ์๋ํ ๊ฒ์ด๋ค.'
response= '์๋ ์คํฌ๋กค, ์๋ ๋ฆฌํ๋ ์, ์๋ ์ ์ฅ, ์๋ ํผ ์ฑ์ฐ๊ธฐ์ ๊ฐ์ ๋ง์ ์๋ํ ๊ธฐ๋ฅ์ด ์๋ค.'
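One way to produce a text-completion pair like this from a raw passage is to hold out the final sentence as the response. The sketch below assumes a naive punctuation-based sentence splitter and a hypothetical helper name `make_completion_example`; the authors' actual splitting rule is not stated in this card.

```python
import re


def split_sentences(passage: str) -> list[str]:
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", passage.strip())
    return [p for p in parts if p]


def make_completion_example(passage: str) -> dict:
    """Use all but the last sentence as the prompt and the last as the target.

    Hypothetical helper illustrating the task format only.
    """
    sentences = split_sentences(passage)
    if len(sentences) < 2:
        raise ValueError("need at least two sentences to hold one out")
    return {"text": " ".join(sentences[:-1]), "response": sentences[-1]}


passage = (
    "GreenBrowser is a free web browser. "
    "It is based on the Trident layout engine. "
    "It has many automation features."
)
print(make_completion_example(passage))
```

A real pipeline for Korean text would likely need a more careful sentence splitter, since abbreviations and parenthesized terms can contain periods.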
Acknowledgement
Markr AI is in constant communication with numerous open-source developers and researchers. We would also like to express our gratitude to Beomi and Maywell, who have provided many insights through extensive discussions in the development of the model.