---
language:
- ko
tags:
- llama-2
- instruct
- instruction
pipeline_tag: text-generation
license: llama2
datasets:
- squarelike/OpenOrca-gugugo-ko
---
# Llama-2-ko-OpenOrca-gugugo-13B
This model was trained as a proof of concept (PoC). It is part of an experiment to test whether model performance improves when fine-tuning on a large dataset of about 1 million samples.
[Note] Many people and customers still mistakenly believe that "more data is always better," so I set out to test this directly with experimental data.
In fine-tuning, data quality matters far more than sheer data volume, and the distribution of keywords within the dataset is also important!
For example, when searching the kkullm dataset for the keywords "process" and "comparison," each accounts for only about 1% of the entire dataset.
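A keyword share like the one above can be measured by counting the fraction of samples whose text contains the keyword. The sketch below assumes plain case-insensitive substring matching over a list of strings; the function names and toy samples are hypothetical and not the method used for the kkullm figures.

```python
from typing import Iterable


def keyword_share(texts: list[str], keyword: str) -> float:
    """Fraction of texts containing `keyword` (case-insensitive substring match)."""
    if not texts:
        return 0.0
    needle = keyword.lower()
    hits = sum(1 for t in texts if needle in t.lower())
    return hits / len(texts)


def keyword_distribution(texts: list[str], keywords: Iterable[str]) -> dict[str, float]:
    """Map each keyword to the fraction of samples that mention it."""
    return {kw: keyword_share(texts, kw) for kw in keywords}


if __name__ == "__main__":
    # Hypothetical toy instructions; in practice, pass the dataset's instruction fields.
    samples = [
        "Explain the process of photosynthesis.",
        "Write a poem about autumn.",
        "Give a comparison of Python and Go.",
        "Summarize this article.",
    ]
    print(keyword_distribution(samples, ["process", "comparison"]))
    # {'process': 0.25, 'comparison': 0.25}
```

Substring matching is a rough proxy; for Korean text, a morphological analyzer would give more reliable counts.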
### Model Details
- Base Model: [beomi/llama-2-koen-13b](https://huggingface.co/beomi/llama-2-koen-13b)
### Datasets
Trained on 1 million samples from the dataset below. Training ran on 2 AWS g5.12xlarge instances (8 NVIDIA A10G GPUs in total).
- [OpenOrca-gugugo-ko](https://huggingface.co/datasets/squarelike/OpenOrca-gugugo-ko)
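Combining the 8 GPUs and 1 million samples above with `batch_size` and `gradient_accumulation_steps` from the hyperparameters gives a rough sense of the training length. This is a minimal sketch that assumes `batch_size` is per GPU and all GPUs run plain data parallelism; the card does not state the parallelism strategy or the number of epochs, so only steps per epoch are computed.

```python
import math


def effective_batch_size(per_device_batch: int, grad_accum_steps: int, num_gpus: int) -> int:
    """Samples consumed per optimizer step under plain data parallelism."""
    return per_device_batch * grad_accum_steps * num_gpus


def optimizer_steps_per_epoch(num_samples: int, global_batch: int) -> int:
    """Optimizer steps needed to see every sample once (a final partial batch counts as a step)."""
    return math.ceil(num_samples / global_batch)


if __name__ == "__main__":
    # Assumption: batch_size is per GPU and all 8 A10Gs (2 x g5.12xlarge) are data-parallel.
    global_batch = effective_batch_size(per_device_batch=1, grad_accum_steps=8, num_gpus=8)
    steps = optimizer_steps_per_epoch(num_samples=1_000_000, global_batch=global_batch)
    print(global_batch, steps)  # 64 15625
```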
### Hyperparameters
The hyperparameters are simply heuristic values. For reference only:
```python
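learning_rate = 3e-5
lr_scheduler = "constant_with_warmup"  # linear warmup, then a constant learning rate
batch_size = 1
gradient_accumulation_steps = 8
# LoRA adapter settings
lora_alpha = 16
lora_r = 16
lora_dropout = 0.1
# Projection layers that receive LoRA adapters (a list of module names, not a single string)
lora_target_modules = ["gate_proj", "down_proj", "up_proj", "q_proj", "k_proj", "o_proj", "v_proj"]
use_flash_attention_2 = True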
```
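Two quantities follow from these settings. In standard LoRA (and PEFT's default behavior), the low-rank update is scaled by `lora_alpha / lora_r`, so these values give a scale of 1.0. The `constant_with_warmup` schedule ramps the learning rate linearly from 0 during warmup and then holds it constant. The sketch below reimplements both in plain Python for illustration; it is not library code, and `num_warmup_steps` is a hypothetical value because the card does not state the warmup length.

```python
def lora_scaling(lora_alpha: float, lora_r: int) -> float:
    """Scale applied to the low-rank update B @ A in standard LoRA: alpha / r."""
    return lora_alpha / lora_r


def constant_with_warmup_lr(step: int, base_lr: float, num_warmup_steps: int) -> float:
    """Learning rate at `step`: linear ramp from 0 during warmup, then constant base_lr."""
    if step < num_warmup_steps:
        return base_lr * step / max(1, num_warmup_steps)
    return base_lr


if __name__ == "__main__":
    print(lora_scaling(lora_alpha=16, lora_r=16))  # 1.0
    # num_warmup_steps=100 is hypothetical; the actual warmup length is not given.
    for step in (0, 50, 100, 10_000):
        print(step, constant_with_warmup_lr(step, base_lr=3e-5, num_warmup_steps=100))
```

Setting `lora_alpha` equal to `lora_r` keeps the adapter update unscaled, so the learning rate alone controls its step size.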
### License
- Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International Public License, under LLAMA 2 COMMUNITY LICENSE AGREEMENT
This model was created as a personal experiment, unrelated to the organization I work for.