---
datasets:
- danielpark/gorani-100k-llama2-13b-instruct
language:
- en
library_name: transformers
pipeline_tag: text-generation
---
# Project is in progress. Do not use the weights or dataset.
## Status: 19.7k checkpoint weights are open; waiting for results on the LLM leaderboard.
| Update Schedule | Task Description | Status |
|-----------------|----------------------------|--------|
| 23-10-5 | Completed training - 20k 13b weight | Done |
| 23-10-6 | Submitted hf model weights | Done |
| 23-10-20 | QC | In progress |
| 23-10-13 | Completed training - 50k 13b weight | |
| 23-10-14 | Submitted hf model weights | |
| 23-10-18 | Completed training - 100k 13b weight | |
| 23-10-20 | QA | |
| 23-10-21 | Official weight release | |
# GORANI 100k
- Model: [danielpark/gorani-100k-llama2-13b-instruct](https://huggingface.co/danielpark/gorani-100k-llama2-13b-instruct)
- Dataset: [danielpark/gorani-100k](https://huggingface.co/danielpark/gorani-100k)
## Template
I use llama2-13b with LFM, without a default system message. When a dataset specifies a system message, I use that content.
```
### System:
{System}
### User:
{New_User_Input}
### Input:
{New User Input}
### Response:
{New_Assistant_Answer}
```
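The template above can be filled programmatically. The following is a minimal sketch, not the author's preprocessing code: the function name `build_prompt`, the single-newline separators, and the choice to omit the `### System:` and `### Input:` sections when they are empty are all assumptions made for illustration.

```python
def build_prompt(user: str, input_text: str = "", system: str = "") -> str:
    """Assemble a prompt in the '### Section:' layout shown in the template.

    Assumption: empty sections are dropped entirely, since the card states
    that no default system message is used.
    """
    parts = []
    if system:
        parts.append(f"### System:\n{system}")
    parts.append(f"### User:\n{user}")
    if input_text:
        parts.append(f"### Input:\n{input_text}")
    # The response header is left open so the model can continue from it.
    parts.append("### Response:\n")
    return "\n".join(parts)


prompt = build_prompt("Summarize the text.", input_text="GORANI is a fine-tuned model.")
print(prompt)
```

If the training data always included every section header, the two `if` checks would need to be removed to match it.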
## Caution
The model weights and dataset have not been properly curated yet, and their use is strictly prohibited under any license. The developers assume no responsibility, express or implied, for any such use.
## Updates
| Revision | Commit Hash | Updated | Train Process | Status |
| ---------------|------------------------------------------------------------|------------|------------------|---------------|
| Revision 01 | [6d30494fa8da84128499d55075eef57094336d03](https://huggingface.co/danielpark/gorani-100k-llama2-13b-instruct/commit/6d30494fa8da84128499d55075eef57094336d03) | 23.10.04 | 19,740/100,000 | In training |
## Training Plan
- After checking the 19.7k model's performance on the Open LLM Leaderboard, proceed with the following steps:
- Compare max sequence lengths of 512 and 1024 (experiment with a 10k model).
- Implement training similar to what the llama2 paper describes, which is more than 20 times slower than the initial stage.
- Modify the code to use flash attention 2.
- Refine the dataset and add a hash to freeze it.
<br>
## Revision Information
### # Revision 01: [6d30494fa8da84128499d55075eef57094336d03](https://huggingface.co/danielpark/gorani-100k-llama2-13b-instruct/commit/6d30494fa8da84128499d55075eef57094336d03)
- 19.74k fine-tuned model weight
- max_seq_length = 2048, partially modified tokenizer (pad-token related), default training parameters; the tokenizer still needs to be fixed (refer to the 10k tokenizer)
<details>
<summary>See details</summary>
| **Training Process** | |
|----------------------------------------------|-------------------------------|
| Tokenizer Used | LlamaTokenizerFast |
| Training Progress (Epoch 3.15/16) | |
| Step | 19740/100000 |
| Google Colab Resource Usage | 150 tokens used |
| **System Information** | | |
|------------------------|------------|------------|
| | **Used** | **Total** |
| System RAM | 5.8 GB | 83.5 GB |
| GPU RAM | 26.6 GB | 40.0 GB |
| Disk | 74.0 GB | 166.8 GB |
| **Basic Training Settings** | |
|-----------------------------|---------------------------------|
| local_rank | -1 |
| per_device_train_batch_size | 4 |
| per_device_eval_batch_size | 1 |
| gradient_accumulation_steps | 4 |
| learning_rate | 2e-4 |
| max_grad_norm | 0.3 |
| weight_decay | 0.001 |
| max_seq_length | 2048 |
| num_train_epochs | 1 |
| max_steps | 100000 |
| warmup_ratio | 0.03 |
| save_steps | 500000 |
| logging_steps | 10000 |
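
The batch settings above, together with the step and epoch counts reported under Training Process, allow a quick consistency check. This is a sketch under one assumption: training ran on a single device (suggested by `local_rank = -1` and the single-GPU `device_map`, but not stated outright), so `num_devices = 1` is an illustrative value.

```python
# Values taken from the tables in this card.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
steps_done = 19_740
epochs_done = 3.15
max_steps = 100_000

# Assumption: single-device training.
num_devices = 1

# Samples consumed per optimizer step.
effective_batch = per_device_train_batch_size * gradient_accumulation_steps * num_devices

# Samples seen so far, and the dataset size this implies.
samples_seen = steps_done * effective_batch
implied_dataset_size = samples_seen / epochs_done

# Epochs that max_steps would cover at this dataset size.
implied_total_epochs = max_steps * effective_batch / implied_dataset_size

print(effective_batch, samples_seen, round(implied_dataset_size), round(implied_total_epochs, 2))
```

Under that assumption the implied dataset size is roughly 100k samples and `max_steps` corresponds to about 16 epochs, which agrees with the "Epoch 3.15/16" figure in the progress table.
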
| **4-bit Precision Settings** | |
|-----------------------------|---------------------------------|
| use_4bit | True |
| use_nested_quant | False |
| bnb_4bit_compute_dtype | "bfloat16" |
| bnb_4bit_quant_type | "nf4" |
| **LoRA Settings** | |
|-----------------------------|---------------------------------|
| lora_alpha | 16 |
| lora_dropout | 0.1 |
| lora_r | 64 |
| **Advanced Training Flags** | |
|-----------------------------|---------------------------------|
| fp16 | False |
| bf16 | False |
| packing | False |
| gradient_checkpointing | True |
| optim | "paged_adamw_32bit" |
| lr_scheduler_type | "constant" |
| group_by_length | True |
| **GPU Configuration** | |
|-----------------------------|---------------------------------|
| device_map | {"": 0} |
</details>
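
The 4-bit and LoRA flag names in the tables look like script-level options rather than library arguments. The sketch below shows one plausible way to translate them into the keyword names used by `BitsAndBytesConfig` in transformers and `LoraConfig` in peft. The mapping itself (for example `use_nested_quant` to `bnb_4bit_use_double_quant`) is an assumption about the training script, which the card does not show, and the helper `to_library_kwargs` is hypothetical.

```python
# Settings copied from the "4-bit Precision Settings" and "LoRA Settings" tables.
card_settings = {
    "use_4bit": True,
    "use_nested_quant": False,
    "bnb_4bit_compute_dtype": "bfloat16",
    "bnb_4bit_quant_type": "nf4",
    "lora_alpha": 16,
    "lora_dropout": 0.1,
    "lora_r": 64,
}


def to_library_kwargs(settings: dict) -> tuple[dict, dict]:
    """Translate card flag names into library keyword names (assumed mapping)."""
    quant_kwargs = {
        "load_in_4bit": settings["use_4bit"],
        "bnb_4bit_use_double_quant": settings["use_nested_quant"],
        "bnb_4bit_quant_type": settings["bnb_4bit_quant_type"],
        "bnb_4bit_compute_dtype": settings["bnb_4bit_compute_dtype"],
    }
    lora_kwargs = {
        "r": settings["lora_r"],
        "lora_alpha": settings["lora_alpha"],
        "lora_dropout": settings["lora_dropout"],
    }
    return quant_kwargs, lora_kwargs


quant_kwargs, lora_kwargs = to_library_kwargs(card_settings)
print(quant_kwargs)
print(lora_kwargs)
```

The two dictionaries are intended to be unpacked into `BitsAndBytesConfig(**quant_kwargs)` and `LoraConfig(**lora_kwargs)`. Any further arguments, such as a LoRA task type or target modules, are not listed in the card and would have to be chosen separately.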