---
datasets:
- TEAMGORANI/gorani-100k
language:
- en
library_name: transformers
pipeline_tag: text-generation
---
# GORANI 100k
- Model: [danielpark/gorani-100k-llama2-13b-instruct](https://huggingface.co/danielpark/gorani-100k-llama2-13b-instruct)
- Dataset: [TEAMGORANI/gorani-100k](https://huggingface.co/datasets/TEAMGORANI/gorani-100k)
## Caution
The model weights and dataset have not been properly curated yet, and any use of them is strictly prohibited under any license. The developers assume no responsibility, implicit or explicit, in relation to this.
## Updates
| Revision | Commit Hash | Updated | Training Steps | Status |
|------------|-------------|----------|----------------|-------------|
| Revision 1 | [6d30494](https://huggingface.co/danielpark/gorani-100k-llama2-13b-instruct/commit/6d30494fa8da84128499d55075eef57094336d03) | 23.10.04 | 19740/100000 | In training |
### Revision 1 details
| **Training Process** | |
|------------------------------|-------------------------------|
| Tokenizer | LlamaTokenizerFast |
| Training progress | Epoch 3.15/16 |
| Step | 19740/100000 |
| Google Colab resource usage | 150 compute units used |
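For reference, the tokenizer can be loaded as in the minimal sketch below. It assumes the tokenizer files are published in this repository; if not, load them from the base Llama-2-13B checkpoint instead. The padding settings are common QLoRA conventions, not values confirmed by this card.

```python
from transformers import AutoTokenizer

# Resolves to LlamaTokenizerFast for Llama-2 repositories.
tokenizer = AutoTokenizer.from_pretrained("danielpark/gorani-100k-llama2-13b-instruct")
tokenizer.pad_token = tokenizer.eos_token  # assumption: reuse EOS as the pad token
tokenizer.padding_side = "right"           # assumption: right padding, as in common QLoRA scripts
```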
| **System Information** | **Used** | **Total** |
|------------------------|------------|------------|
| System RAM | 5.8 GB | 83.5 GB |
| GPU RAM | 26.6 GB | 40.0 GB |
| Disk | 74.0 GB | 166.8 GB |
| **Basic Training Settings** | |
|-----------------------------|---------------------------------|
| local_rank | -1 |
| per_device_train_batch_size | 4 |
| per_device_eval_batch_size | 1 |
| gradient_accumulation_steps | 4 |
| learning_rate | 2e-4 |
| max_grad_norm | 0.3 |
| weight_decay | 0.001 |
| max_seq_length | 2048 |
| num_train_epochs | 1 |
| max_steps | 100000 |
| warmup_ratio | 0.03 |
| save_steps | 500000 |
| logging_steps | 10000 |
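As a rough guide, the basic settings above map onto `transformers.TrainingArguments` as sketched below. The output directory is a placeholder, and `max_seq_length` is not a `TrainingArguments` field (it is passed to the SFT trainer instead).

```python
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",        # placeholder, not taken from the original run
    per_device_train_batch_size=4,
    per_device_eval_batch_size=1,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    max_grad_norm=0.3,
    weight_decay=0.001,
    num_train_epochs=1,
    max_steps=100000,              # a positive max_steps overrides num_train_epochs
    warmup_ratio=0.03,
    save_steps=500000,
    logging_steps=10000,
)
```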
| **4-bit Precision Settings** | |
|-----------------------------|---------------------------------|
| use_4bit | True |
| use_nested_quant | False |
| bnb_4bit_compute_dtype | "bfloat16" |
| bnb_4bit_quant_type | "nf4" |
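In `transformers`/`bitsandbytes` terms, these settings correspond roughly to the `BitsAndBytesConfig` below; this is a sketch, not the exact configuration object used for the run.

```python
import torch
from transformers import BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # use_4bit
    bnb_4bit_quant_type="nf4",              # bnb_4bit_quant_type
    bnb_4bit_compute_dtype=torch.bfloat16,  # bnb_4bit_compute_dtype
    bnb_4bit_use_double_quant=False,        # use_nested_quant
)
```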
| **LoRA Settings** | |
|-----------------------------|---------------------------------|
| lora_alpha | 16 |
| lora_dropout | 0.1 |
| lora_r | 64 |
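These values correspond to a `peft.LoraConfig` along the lines below. The table does not record `target_modules`, `bias`, or `task_type`, so those are assumptions typical of Llama-2 QLoRA recipes.

```python
from peft import LoraConfig

peft_config = LoraConfig(
    lora_alpha=16,
    lora_dropout=0.1,
    r=64,
    bias="none",                          # assumption, not recorded above
    task_type="CAUSAL_LM",                # assumption, not recorded above
    target_modules=["q_proj", "v_proj"],  # assumption: a common Llama-2 default
)
```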
| **Advanced Training Flags** | |
|-----------------------------|---------------------------------|
| fp16 | False |
| bf16 | False |
| packing | False |
| gradient_checkpointing | True |
| optim | "paged_adamw_32bit" |
| lr_scheduler_type | "constant" |
| group_by_length | True |
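These flags also belong to `TrainingArguments`; a sketch extending the earlier hypothetical configuration is shown below. Note that `packing` is not a `TrainingArguments` field; it is passed to the SFT trainer together with `max_seq_length`.

```python
from transformers import TrainingArguments

training_arguments = TrainingArguments(
    output_dir="./results",            # placeholder
    fp16=False,
    bf16=False,
    gradient_checkpointing=True,
    optim="paged_adamw_32bit",
    lr_scheduler_type="constant",
    group_by_length=True,
    # ...plus the basic settings listed in the table further above
    # (batch sizes, learning rate, max_steps, and so on).
)
```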
| **GPU Configuration** | |
|-----------------------------|---------------------------------|
| device_map | {"": 0} |
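`device_map = {"": 0}` places every module on GPU 0. A hedged loading sketch combining it with the 4-bit configuration above is shown below; the base checkpoint name is an assumption inferred from the model name, not something stated in this card.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

base_model = "meta-llama/Llama-2-13b-hf"  # assumption: the exact base checkpoint is not recorded here

model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    ),
    device_map={"": 0},  # pin the entire model to GPU 0
)
model.config.use_cache = False  # commonly disabled when gradient checkpointing is on
```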
## Check
- After checking the 19.7k-step checkpoint's performance on the Open LLM Leaderboard, proceed with the following step:
- Compare max sequence lengths of 512 and 1024 (experiment with a 10k-step model).