language:
- ko
- en
pipeline_tag: text-generation
inference: false
tags:
- solar
- mistral
- pytorch
- solar-ko
library_name: transformers
license: apache-2.0
base_model: upstage/SOLAR-10.7B-v1.0
Update Log
- 2024.07.01: Released Solar-Ko-Recovery & Uploaded Benchmark scores
- 2024.05.16: Released preview of Solar-Ko-Recovery
Solar-Ko-Recovery-11B 🌟❤️🩹
Solar-Ko-Recovery-11B aims to recover Solar's Korean capability by re-arranging the embeddings and LM head. It features an expanded vocabulary and incorporates a Korean+English corpus for enhanced representation.
Model Details
Model Developers: Junbum Lee (Beomi)
Variations: Solar-Ko-Recovery is available in one parameter size: 11B (10.99B 🤣).
Input: The model accepts only text input.
Output: The model produces text output exclusively.
Model Architecture:
Solar-Ko-Recovery is an auto-regressive language model that leverages an optimized transformer architecture derived from Llama-2.
Model | Training Data | Parameters | Context Length | GQA | Tokens | Learning Rate |
---|---|---|---|---|---|---|
Solar-Ko-Recovery | A curated mix of Korean+English Corpora | 11B(10.99B) | 4k | O | >100B* | 5e-5 |
NOTE: Training was done in two steps:
- Step 1: only the embedding layer and LM head are trained
- Step 2: all parameters are trained
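The two-step schedule in the note above can be sketched as a name-based filter over model parameters. This is a minimal illustration, not the author's training code: the parameter names follow the usual Llama-style layout in `transformers` (`model.embed_tokens.weight`, `lm_head.weight`), and the helper `trainable_names` is hypothetical.

```python
# Substrings that identify the embedding layer and LM head in a
# Llama-style checkpoint (assumed naming, not confirmed by this card).
STEP1_TRAINABLE_KEYS = ("embed_tokens", "lm_head")


def trainable_names(param_names, step):
    """Return the parameter names that should receive gradients in a step."""
    if step == 1:
        # Step 1: only the embedding layer and LM head are trained.
        return [n for n in param_names if any(k in n for k in STEP1_TRAINABLE_KEYS)]
    if step == 2:
        # Step 2: all parameters are trained.
        return list(param_names)
    raise ValueError("step must be 1 or 2")


names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.layers.0.mlp.gate_proj.weight",
    "model.norm.weight",
    "lm_head.weight",
]
step1 = trainable_names(names, step=1)
step2 = trainable_names(names, step=2)
print(step1)
```

With a PyTorch model, the same selection could be applied by iterating over `model.named_parameters()` and setting `param.requires_grad` to `True` only for the selected names.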
Vocab Expansion
Vocabulary expansion was performed on an edited version of upstage/solar-1-mini-tokenizer, which is a superset of the Solar tokenizer.
Model Name | Vocabulary Size | Description |
---|---|---|
Original Solar | 32000 | Sentencepiece BPE |
solar-1-mini-tokenizer | 64000 | Sentencepiece BPE. Added Ko/JP vocabs |
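One way to picture re-arranging an embedding table for an expanded vocabulary is shown below. It is only a sketch under stated assumptions: rows for tokens shared by both vocabularies are copied to their new ids, and rows for new tokens start from the mean of the old rows. The mean initialization, the helper `rearrange_embeddings`, and the toy vocabularies are all illustrative assumptions. This card does not state how the new rows were actually initialized.

```python
def rearrange_embeddings(old_vocab, new_vocab, old_rows):
    """Build an embedding table for new_vocab, reusing rows of shared tokens.

    old_vocab / new_vocab map token string -> id (ids are 0..n-1).
    old_rows[i] is the embedding vector of old token id i.
    """
    dim = len(old_rows[0])
    # Assumed initialization for unseen tokens: mean of all old rows.
    mean_row = [sum(row[d] for row in old_rows) / len(old_rows) for d in range(dim)]
    new_rows = [None] * len(new_vocab)
    for token, new_id in new_vocab.items():
        old_id = old_vocab.get(token)
        if old_id is not None:
            new_rows[new_id] = list(old_rows[old_id])  # copy the existing row
        else:
            new_rows[new_id] = list(mean_row)  # new token: start from the mean
    return new_rows


# Toy vocabularies: the new one is a superset with different ids.
old_vocab = {"<s>": 0, "▁Meet": 1, "▁Solar": 2}
new_vocab = {"<s>": 0, "▁Solar": 1, "▁안녕하세요": 2, "▁Meet": 3}
old_rows = [[0.0, 0.0], [1.0, 2.0], [3.0, 4.0]]

new_rows = rearrange_embeddings(old_vocab, new_vocab, old_rows)
print(new_rows)
```

The same mapping would apply to the LM head, since its rows are also indexed by token id.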
Tokenizing "안녕하세요, 오늘은 날씨가 좋네요."
- SOLAR-10.7B: 26 tokens
- Solar-Ko-Recovery: 7 tokens
Model | Tokens |
---|---|
SOLAR-10.7B | ['▁', '안', '<0xEB>', '<0x85>', '<0x95>', '하', '세', '요', ',', '▁', '오', '<0xEB>', '<0x8A>', '<0x98>', '은', '▁', '날', '<0xEC>', '<0x94>', '<0xA8>', '가', '▁', '좋', '네', '요', '.'] |
Solar-Ko-Recovery | ['▁안녕하세요', ',', '▁오늘은', '▁날씨가', '▁좋', '네요', '.'] |
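The `<0x..>` entries in the SOLAR-10.7B row are byte-fallback tokens: a syllable missing from the vocabulary is emitted as its three UTF-8 bytes. The sketch below takes the two token lists from the table above and joins them back into text to show why the counts differ. The helper `detokenize` is a simplified, hypothetical stand-in for the tokenizer's own decoding, not the SentencePiece implementation.

```python
import re

# Matches SentencePiece-style byte-fallback tokens such as "<0xEB>".
BYTE_TOKEN = re.compile(r"^<0x([0-9A-Fa-f]{2})>$")


def detokenize(tokens):
    """Join tokens into text, merging byte-fallback tokens into UTF-8."""
    buf = bytearray()
    for tok in tokens:
        match = BYTE_TOKEN.match(tok)
        if match:
            buf.append(int(match.group(1), 16))  # raw byte of a split character
        else:
            buf.extend(tok.encode("utf-8"))
    # "▁" marks a word boundary; turn it into a space and drop the leading one.
    return buf.decode("utf-8").replace("▁", " ").lstrip(" ")


solar_tokens = ['▁', '안', '<0xEB>', '<0x85>', '<0x95>', '하', '세', '요', ',',
                '▁', '오', '<0xEB>', '<0x8A>', '<0x98>', '은', '▁', '날',
                '<0xEC>', '<0x94>', '<0xA8>', '가', '▁', '좋', '네', '요', '.']
ko_tokens = ['▁안녕하세요', ',', '▁오늘은', '▁날씨가', '▁좋', '네요', '.']

print(len(solar_tokens), len(ko_tokens))
print(detokenize(solar_tokens))
print(detokenize(ko_tokens))
```

Both lists decode to the same sentence, but the original tokenizer spends three tokens on each of the syllables 녕, 늘, and 씨.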
Tokenizing "Meet 10.7B Solar: Elevating Performance with Upstage Depth UP Scaling!"
- SOLAR-10.7B: 22 tokens
- Solar-Ko-Recovery: 22 tokens
Model | Tokens |
---|---|
SOLAR-10.7B | ['▁Meet', '▁', '1', '0', '.', '7', 'B', '▁Solar', ':', '▁E', 'lev', 'ating', '▁Performance', '▁with', '▁Up', 'stage', '▁Dep', 'th', '▁UP', '▁Scal', 'ing', '!'] |
Solar-Ko-Recovery | ['▁Meet', '▁', '1', '0', '.', '7', 'B', '▁Solar', ':', '▁E', 'lev', 'ating', '▁Performance', '▁with', '▁Up', 'stage', '▁Dep', 'th', '▁UP', '▁Scal', 'ing', '!'] |
LICENSE
Apache 2.0
Model Benchmark
LM Eval Harness - Korean
- Used EleutherAI's lm-evaluation-harness
- 5-shot scores
Tasks | Metric | Value | | Stderr |
---|---|---|---|---|
haerae | acc_norm | 0.7874 | ± | 0.0118 |
- haerae_general_knowledge | acc | 0.5000 | ± | 0.0378 |
- haerae_history | acc | 0.8723 | ± | 0.0244 |
- haerae_loan_word | acc | 0.8402 | ± | 0.0283 |
- haerae_rare_word | acc | 0.8346 | ± | 0.0185 |
- haerae_standard_nomenclature | acc | 0.8301 | ± | 0.0305 |
kmmlu_direct | exact_match | 0.4205 | ± | 0.0026 |
- kmmlu_direct_accounting | exact_match | 0.3700 | ± | 0.0485 |
- kmmlu_direct_agricultural_sciences | exact_match | 0.3140 | ± | 0.0147 |
- kmmlu_direct_aviation_engineering_and_maintenance | exact_match | 0.3870 | ± | 0.0154 |
- kmmlu_direct_biology | exact_match | 0.3510 | ± | 0.0151 |
- kmmlu_direct_chemical_engineering | exact_match | 0.3910 | ± | 0.0154 |
- kmmlu_direct_chemistry | exact_match | 0.4000 | ± | 0.0200 |
- kmmlu_direct_civil_engineering | exact_match | 0.4010 | ± | 0.0155 |
- kmmlu_direct_computer_science | exact_match | 0.6520 | ± | 0.0151 |
- kmmlu_direct_construction | exact_match | 0.3080 | ± | 0.0146 |
- kmmlu_direct_criminal_law | exact_match | 0.3100 | ± | 0.0328 |
- kmmlu_direct_ecology | exact_match | 0.4660 | ± | 0.0158 |
- kmmlu_direct_economics | exact_match | 0.5385 | ± | 0.0439 |
- kmmlu_direct_education | exact_match | 0.6200 | ± | 0.0488 |
- kmmlu_direct_electrical_engineering | exact_match | 0.3000 | ± | 0.0145 |
- kmmlu_direct_electronics_engineering | exact_match | 0.4740 | ± | 0.0158 |
- kmmlu_direct_energy_management | exact_match | 0.3560 | ± | 0.0151 |
- kmmlu_direct_environmental_science | exact_match | 0.2980 | ± | 0.0145 |
- kmmlu_direct_fashion | exact_match | 0.4470 | ± | 0.0157 |
- kmmlu_direct_food_processing | exact_match | 0.3690 | ± | 0.0153 |
- kmmlu_direct_gas_technology_and_engineering | exact_match | 0.3000 | ± | 0.0145 |
- kmmlu_direct_geomatics | exact_match | 0.3820 | ± | 0.0154 |
- kmmlu_direct_health | exact_match | 0.5700 | ± | 0.0498 |
- kmmlu_direct_industrial_engineer | exact_match | 0.3830 | ± | 0.0154 |
- kmmlu_direct_information_technology | exact_match | 0.6090 | ± | 0.0154 |
- kmmlu_direct_interior_architecture_and_design | exact_match | 0.5440 | ± | 0.0158 |
- kmmlu_direct_korean_history | exact_match | 0.3800 | ± | 0.0488 |
- kmmlu_direct_law | exact_match | 0.4670 | ± | 0.0158 |
- kmmlu_direct_machine_design_and_manufacturing | exact_match | 0.3960 | ± | 0.0155 |
- kmmlu_direct_management | exact_match | 0.5030 | ± | 0.0158 |
- kmmlu_direct_maritime_engineering | exact_match | 0.4283 | ± | 0.0202 |
- kmmlu_direct_marketing | exact_match | 0.7460 | ± | 0.0138 |
- kmmlu_direct_materials_engineering | exact_match | 0.4020 | ± | 0.0155 |
- kmmlu_direct_math | exact_match | 0.2867 | ± | 0.0262 |
- kmmlu_direct_mechanical_engineering | exact_match | 0.3490 | ± | 0.0151 |
- kmmlu_direct_nondestructive_testing | exact_match | 0.3760 | ± | 0.0153 |
- kmmlu_direct_patent | exact_match | 0.3700 | ± | 0.0485 |
- kmmlu_direct_political_science_and_sociology | exact_match | 0.5300 | ± | 0.0289 |
- kmmlu_direct_psychology | exact_match | 0.4470 | ± | 0.0157 |
- kmmlu_direct_public_safety | exact_match | 0.3520 | ± | 0.0151 |
- kmmlu_direct_railway_and_automotive_engineering | exact_match | 0.3220 | ± | 0.0148 |
- kmmlu_direct_real_estate | exact_match | 0.4350 | ± | 0.0351 |
- kmmlu_direct_refrigerating_machinery | exact_match | 0.3240 | ± | 0.0148 |
- kmmlu_direct_social_welfare | exact_match | 0.4970 | ± | 0.0158 |
- kmmlu_direct_taxation | exact_match | 0.3800 | ± | 0.0344 |
- kmmlu_direct_telecommunications_and_wireless_technology | exact_match | 0.5480 | ± | 0.0157 |
kobest_boolq | acc | 0.9202 | ± | 0.0072 |
kobest_boolq | f1 | 0.9202 | ± | N/A |
kobest_copa | acc | 0.8680 | ± | 0.0107 |
kobest_copa | f1 | 0.8678 | ± | N/A |
kobest_hellaswag | acc | 0.5560 | ± | 0.0222 |
kobest_hellaswag | f1 | 0.5520 | ± | N/A |
kobest_hellaswag | acc_norm | 0.6540 | ± | 0.0213 |
kobest_sentineg | acc | 0.9824 | ± | 0.0066 |
kobest_sentineg | f1 | 0.9824 | ± | N/A |
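To read the Stderr column, a score and its standard error can be turned into an approximate 95% interval. The sketch below assumes a normal approximation (value ± 1.96 × stderr), which is a common convention and not something the harness output itself states. The helper `confidence_interval` is hypothetical.

```python
def confidence_interval(value, stderr, z=1.96):
    """Approximate interval around a score, assuming normally distributed error."""
    half_width = z * stderr
    return value - half_width, value + half_width


# haerae acc_norm and its standard error, taken from the table above.
low, high = confidence_interval(0.7874, 0.0118)
print(f"haerae acc_norm: {low:.4f} to {high:.4f}")
```

Subtasks with few examples have wider intervals. For example, kmmlu_direct_education has a standard error of 0.0488, so differences of a few points there should be read with caution.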
Citation
TBD
Acknowledgements
- Training support was provided by the TPU Research Cloud program.