---
datasets:
- danielpark/gorani-100k-llama2-13b-instruct
language:
- en
library_name: transformers
pipeline_tag: text-generation
---

# Project is in progress. Do not use the weights or dataset.

## Status: 19.7k checkpoint weights released; waiting for results on the Open LLM Leaderboard.

# GORANI 100k

- Model: [danielpark/gorani-100k-llama2-13b-instruct](https://huggingface.co/danielpark/gorani-100k-llama2-13b-instruct)
- Dataset: [danielpark/gorani-100k](https://huggingface.co/danielpark/gorani-100k)

## Template
I use llama2-13b with LFM, but without a default system message. If a dataset specifies a system message, I use that content instead.
```
### System:
{System}

### User:
{New_User_Input}

### Input:
{New User Input}

### Response:
{New_Assistant_Answer}
```
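
Once the weights are curated and released, a prompt built with this template could be passed to the model roughly as follows. This is a minimal sketch: the `build_prompt` helper and the example strings are illustrative only and not part of this repository.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "danielpark/gorani-100k-llama2-13b-instruct"

def build_prompt(system: str, user: str, extra_input: str = "") -> str:
    """Illustrative helper that fills in the template above."""
    prompt = ""
    if system:  # the default system message is omitted unless the dataset provides one
        prompt += f"### System:\n{system}\n\n"
    prompt += f"### User:\n{user}\n\n"
    if extra_input:
        prompt += f"### Input:\n{extra_input}\n\n"
    prompt += "### Response:\n"
    return prompt

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

inputs = tokenizer(build_prompt("", "Explain what GORANI 100k is."), return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```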

## Caution
The model weights and dataset have not yet been properly curated, and their use is strictly prohibited under any license. The developers assume no responsibility, implicit or explicit, in relation to this.

## Updates
| Revision       | Commit Hash                                                 | Updated   | Train Process   | Status        |
| ---------------|------------------------------------------------------------|------------|------------------|---------------|
| Revision 01     | [6d30494fa8da84128499d55075eef57094336d03](https://huggingface.co/danielpark/gorani-100k-llama2-13b-instruct/commit/6d30494fa8da84128499d55075eef57094336d03) | 23.10.04  | 19,740/100,000     | On Training   |

## Training Plan
- After checking the 19.7k model's performance on the Open LLM Leaderboard, proceed with the following steps:
- Compare max sequence lengths of 512 and 1024 (experiment with a 10k model).
- Implement a setup closer to the Llama 2 paper, which is more than 20 times slower than the initial stage.
- Modify the code to use Flash Attention 2 (see the sketch after this list).
- Refine the dataset and add hashes to freeze it.
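
As a rough illustration of the Flash Attention 2 item above: recent `transformers` releases expose it through the `attn_implementation` argument. This is a minimal sketch under that assumption, using the base Llama 2 13B checkpoint, not the project's actual training code.

```python
import torch
from transformers import AutoModelForCausalLM

# Minimal sketch: load the base Llama 2 13B model with Flash Attention 2 enabled.
# Requires the flash-attn package and an Ampere-or-newer GPU.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
```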
<br>

## Revision Information

### Revision 01: [6d30494fa8da84128499d55075eef57094336d03](https://huggingface.co/danielpark/gorani-100k-llama2-13b-instruct/commit/6d30494fa8da84128499d55075eef57094336d03)
- 19.74k-step fine-tuned model weights
- max_seq_length = 2048; partially modified tokenizer (related to the pad token; a common workaround is sketched below); default training parameters; the tokenizer still needs to be fixed (refer to the 10k tokenizer)
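
The pad-token change mentioned above is not spelled out in this card. A common pattern for Llama 2 tokenizers, shown here only as an assumption, is to reuse the EOS token for padding:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-13b-hf")
# Llama 2 tokenizers ship without a pad token; one common workaround is:
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"  # avoids issues with fp16 training in some setups
```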

<details>
  <summary>See details</summary>
  
| **Training Process**        |                     |
|-----------------------------|---------------------|
| Tokenizer Used              | LlamaTokenizerFast  |
| Training Progress           | Epoch 3.15/16       |
| Step                        | 19,740/100,000      |
| Google Colab Resource Usage | 150 tokens used     |




| **System Information** |            |            |
|------------------------|------------|------------|
|                        | **Used**   | **Total**  |
| System RAM             | 5.8 GB     | 83.5 GB    |
| GPU RAM                | 26.6 GB    | 40.0 GB    |
| Disk                   | 74.0 GB    | 166.8 GB   |




| **Basic Training Settings** |                                 |
|-----------------------------|---------------------------------|
| local_rank                  | -1                              |
| per_device_train_batch_size | 4                               |
| per_device_eval_batch_size  | 1                               |
| gradient_accumulation_steps | 4                               |
| learning_rate               | 2e-4                            |
| max_grad_norm               | 0.3                             |
| weight_decay                | 0.001                           |
| max_seq_length              | 2048                            |
| num_train_epochs            | 1                               |
| max_steps                   | 100000                          |
| warmup_ratio                | 0.03                            |
| save_steps                  | 500000                          |
| logging_steps               | 10000                           |

| **4-bit Precision Settings** |                                 |
|-----------------------------|---------------------------------|
| use_4bit                    | True                            |
| use_nested_quant            | False                           |
| bnb_4bit_compute_dtype      | "bfloat16"                      |
| bnb_4bit_quant_type         | "nf4"                           |

| **LoRA Settings**           |                                 |
|-----------------------------|---------------------------------|
| lora_alpha                  | 16                              |
| lora_dropout                | 0.1                             |
| lora_r                      | 64                              |

| **Advanced Training Flags** |                                 |
|-----------------------------|---------------------------------|
| fp16                        | False                           |
| bf16                        | False                           |
| packing                     | False                           |
| gradient_checkpointing       | True                            |
| optim                       | "paged_adamw_32bit"             |
| lr_scheduler_type           | "constant"                      |
| group_by_length             | True                            |

| **GPU Configuration**       |                                 |
|-----------------------------|---------------------------------|
| device_map                  | {"": 0}                         |


</details>
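
The tables above read as a standard QLoRA configuration. Below is a minimal, hedged sketch of how those values might map onto `bitsandbytes`, `peft`, and `transformers` config objects; it is a reconstruction from the listed hyperparameters, not the project's actual training script, and the output directory name is an assumption.

```python
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit precision settings from the table above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # use_4bit
    bnb_4bit_quant_type="nf4",              # bnb_4bit_quant_type
    bnb_4bit_compute_dtype=torch.bfloat16,  # bnb_4bit_compute_dtype
    bnb_4bit_use_double_quant=False,        # use_nested_quant
)

# LoRA settings from the table above
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.1,
    bias="none",
    task_type="CAUSAL_LM",
)

# Basic training settings and advanced flags from the tables above.
# max_seq_length=2048 and packing=False would typically be passed to a
# supervised fine-tuning trainer (e.g. trl's SFTTrainer) rather than here.
training_args = TrainingArguments(
    output_dir="./gorani-100k-checkpoints",  # assumed; not specified in the card
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_grad_norm=0.3,
    weight_decay=0.001,
    num_train_epochs=1,
    max_steps=100000,
    warmup_ratio=0.03,
    save_steps=500000,
    logging_steps=10000,
    fp16=False,
    bf16=False,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    lr_scheduler_type="constant",
    group_by_length=True,
)
```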